What is the origin of our ability to learn orthographic knowledge? We use deep convolutional networks to emulate the primate's ventral visual stream and explore the recent finding that baboons can be trained to discriminate English words from nonwords . The networks were exposed to the exact same sequence of stimuli and reinforcement signals as the baboons in the experiment, and learned to map real visual inputs (pixels) of letter strings onto binary word/nonword responses. We show that the networks' highest levels of representations were indeed sensitive to letter combinations as postulated in our previous research. The model also captured the key empirical findings, such as generalization to novel words, along with some intriguing inter-individual differences. The present work shows the merits of deep learning networks that can simulate the whole processing chain all the way from the visual input to the response while allowing researchers to analyze the complex representations that emerge during the learning process.