Starting from the hypothesis that printed word identification initially involves the parallel mapping of visual features onto location-specific letter identities, we analyze the type of information that would be involved in optimally mapping this location-specific orthographic code onto a location-invariant lexical code. We assume that some intermediate level of coding exists between individual letters and whole words, and that this involves the representation of letter combinations. We then investigate the nature of this intermediate level of coding given the constraints of optimality. This intermediate level of coding is expected to compress data while retaining as much information as possible about word identity. Information conveyed by letters is a function of how much they constrain word identity and how visible they are. Optimization of this coding is a combination of minimizing resources (using the most compact representations) and maximizing information. We show that in a large proportion of cases, non-contiguous letter sequences contain more information than contiguous sequences, while at the same time requiring less precise coding. Moreover, we found that the best predictor of human performance in orthographic priming experiments was within-word ranking of conditional probabilities, rather than average conditional probabilities. We conclude that from an optimality perspective, readers learn to select certain contiguous and non-contiguous letter combinations as information that provides the best cue to word identity.