Sentence frequency norms for psycholinguistic studies

authors

Dufau Stéphane
Armando Marjorie
Grainger Jonathan

document type

POSTER

abstract

Capitalizing on Google's Ngram corpus, we examined the possibility of establishing frequency norms for sentences in a format suitable for psycholinguistics. The Ngram corpus is based on over 8 million digitized books and reflects how often sequences of N words are used in a particular language (8 languages available to date). Although publicly available, the raw data are presented in a form that is difficult to use as is: frequency is split by year, and the corpus is split into multiple files. In addition, sequences tagged with part-of-speech are mixed with non-tagged ones. Here, we propose a simplified and curated version of the Ngram frequency norms that will help in stimulus selection for psycholinguistic studies. These curated norms are used in a pilot study to assess whether sentence frequency plays a role in sentence recognition.
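As an illustration of the kind of curation described above, the following Python sketch sums per-year counts into one frequency per sequence and drops sequences that carry part-of-speech tags. It is not the authors' pipeline. It assumes that each raw line holds four tab-separated fields (sequence, year, match count, volume count) and that tags appear either as a suffix such as `cat_NOUN` or as a standalone placeholder such as `_NOUN_`. The tag inventory, the function names `is_tagged` and `aggregate`, and the `first_year` parameter are all hypothetical choices made for this example.

```python
from collections import Counter

# Assumed tag inventory and special markers; adjust to the corpus release in use.
TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
        "ADP", "NUM", "CONJ", "PRT", "X", "."}
SPECIAL = {"_START_", "_END_", "_ROOT_"}


def is_tagged(token):
    """Return True if a token looks part-of-speech tagged (assumed conventions)."""
    if token in SPECIAL:
        return True
    # Standalone placeholder such as "_NOUN_".
    if token.startswith("_") and token.endswith("_") and token[1:-1] in TAGS:
        return True
    # Suffix form such as "cat_NOUN".
    word, sep, tag = token.rpartition("_")
    return bool(sep) and bool(word) and tag in TAGS


def aggregate(lines, first_year=None):
    """Sum match counts over years for each untagged sequence.

    `lines` is any iterable of raw text lines, so the lines of several
    files can be chained and passed in together.
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip lines that do not fit the assumed format
        ngram, year, match_count, _volume_count = fields
        if first_year is not None and int(year) < first_year:
            continue
        if any(is_tagged(tok) for tok in ngram.split(" ")):
            continue
        totals[ngram] += int(match_count)
    return totals


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "the cat sat\t1990\t10\t5",
        "the cat sat\t2000\t7\t3",
        "the cat_NOUN sat\t2000\t4\t2",
        "the _NOUN_ sat\t2000\t9\t4",
        "a dog ran\t1985\t2\t1",
    ]
    for ngram, count in aggregate(sample).items():
        print(ngram, count)
```

Because `aggregate` accepts any iterable of lines, the multiple corpus files can be read one after another and merged into a single table, and `first_year` restricts the norms to recent usage if that is what a study requires.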
