Man and Machine during Natural Language Processing: A Neurocognitive Approach

Date: 
Friday, 2 December 2016 - 16:00
End date: 
Friday, 2 December 2016 - 18:00

4 p.m., salle des Voûtes
Chris Biemann and Markus J. Hofmann
Language Technology, Universität Hamburg
General and Biological Psychology, University of Wuppertal

While state-of-the-art NLP models lack a theory that systematically accounts for human performance at all levels of linguistic analysis, Neurocognitive Simulation Models of orthographic and phonological memory have so far lacked a level of implemented semantic representations. To overcome these limitations, the authors of this talk decided to start a long-term cooperation.
In part 1 of this talk, we introduce unsupervised methods from language technology that capture semantic information. We present a range of methods that extract semantic representations from corpora, as opposed to using manually created norms. We show how we applied language models based on n-grams, topic modelling, and the word2vec neural model across three different corpora to account for behavioral, brain-electric, and eye-movement data. We used a benchmark that has become standard for Neurocognitive Simulation Models in psychology: we reproducibly accounted for half of the item-level variance in cloze-completion-based word predictability from sentence context, and in the resulting N400 and single-fixation duration data of the Potsdam Sentence Corpus.
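As a rough illustration of how an n-gram language model can yield a corpus-based predictability estimate, the sketch below trains a bigram model on a toy corpus. The abstract does not state the n-gram order, the smoothing scheme, or the corpora, so the bigram order, the add-one smoothing, the toy sentences, and the function names `train_bigram` and `predictability` are all assumptions made for illustration only.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams in whitespace-tokenised sentences."""
    context_counts = Counter()   # how often each word occurs as a context
    bigram_counts = Counter()    # how often each (previous, word) pair occurs
    vocab = set()
    for sentence in sentences:
        # "<s>" is a hypothetical sentence-start marker.
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def predictability(prev, word, context_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


# Toy corpus (invented for this example, not data from the talk).
corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]
model = train_bigram(corpus)
p_dog = predictability("the", "dog", *model)
p_cat = predictability("the", "cat", *model)
```

In a study of the kind described above, estimates like these would be compared item by item with cloze-completion probabilities; that comparison is not shown here.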
In part 2, we discuss how relatively straightforward NLP methods can be used to define semantic processes in a neurocognitive simulation model. To extend an interactive activation model with a semantic layer, we used the log likelihood that two words occur together in the sentences of a large corpus more often than predicted by their single-word frequencies. The resulting Associative Read-Out Model (AROM) is an extension of the Multiple Read-Out Model. Here, we use it to account for association ratings and semantically induced false memories in human performance and P200/N400 brain-electric data. Then, we present a sequential version of the AROM that accounts for primed lexical decision and the resulting semantic competition in the left (and right!) inferior frontal gyrus of the human brain. Finally, we envision two routes of reading, complementing the form-based aspects of linguistic representations with one of the most defining features of words: they carry meaning.
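The co-occurrence measure mentioned in part 2 can be sketched as follows. The abstract does not give the exact formula, so this example assumes a standard log-likelihood ratio (G²) over a 2x2 contingency table of sentence counts, set to zero when the words co-occur no more often than expected by chance. The function names `log_likelihood` and `cooccurrence_score` and the toy sentences are hypothetical.

```python
import math


def log_likelihood(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of sentence counts.

    k11: sentences with both words   k12: first word only
    k21: second word only            k22: neither word
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def cooccurrence_score(word_a, word_b, sentences):
    """Score how much more often two words share a sentence than chance predicts."""
    token_sets = [set(s.lower().split()) for s in sentences]
    k11 = sum(1 for t in token_sets if word_a in t and word_b in t)
    k12 = sum(1 for t in token_sets if word_a in t and word_b not in t)
    k21 = sum(1 for t in token_sets if word_a not in t and word_b in t)
    k22 = len(token_sets) - k11 - k12 - k21
    n = len(token_sets)
    # Assumption: only above-chance co-occurrence counts as an association.
    if n == 0 or k11 <= (k11 + k12) * (k11 + k21) / n:
        return 0.0
    return log_likelihood(k11, k12, k21, k22)


# Toy sentences (invented for this example).
sentences = ["salt and pepper", "salt and pepper please",
             "the dog sleeps", "a cat sleeps"]
related = cooccurrence_score("salt", "pepper", sentences)
unrelated = cooccurrence_score("salt", "sleeps", sentences)
```

How such scores are thresholded and wired into the semantic layer of an interactive activation model is not specified in the abstract and is left out of this sketch.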