Stefania Degaetano-Ortlieb: "Modeling and Interpreting Variation in Language Use"

Monday, 18 November 2019, 6:30 p.m.

Dr. Stefania Degaetano-Ortlieb. Copyright: © Deborah de Muijnck

This is the fourth talk in the lecture series.

This talk takes place in lecture hall H09 in C.A.R.L. (Claßenstraße 11).


In this talk, I will introduce an empirical approach for the analysis of linguistic variation based on information-theoretic principles and discuss implications for research within literary studies.

Linguistic variation offers a set of encoding options from the linguistic system that allow us to modulate information according to linguistic and extra-linguistic (e.g. time, register, social variables) contextual needs. Through language use, we can directly observe linguistic choices in their linguistic and extra-linguistic context and analyze variation across contexts.

Adopting an information-theoretic approach, we ask (1) how much two different language uses diverge from one another, (2) what is typical of a particular language use in comparison to another, and (3) how variation allows for the modulation of informational content at different linguistic levels and across contexts.

To answer (1) and (2), we use the notion of divergence, measured by relative entropy (Kullback and Leibler 1951, Fankhauser et al. 2014). Divergence allows us to compare probability distributions of linguistic features (words, grammatical units, etc.) across extra-linguistic contexts (time, authors, registers). Features typical of one language use are identified by how much each feature contributes to the overall divergence in a given comparison.
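As an illustration of the divergence measure described above, the following minimal sketch computes relative entropy between the word distributions of two small token lists and keeps the per-word contributions so that typical features can be ranked. The function name `relative_entropy`, the toy token lists, and the use of additive smoothing are assumptions made for this example; they are not taken from the cited work, which may handle unseen features differently.

```python
import math
from collections import Counter


def relative_entropy(tokens_a, tokens_b, smoothing=0.5):
    """Return (D(A || B) in bits, per-feature contributions to D).

    Additive smoothing is an assumption of this sketch; it keeps every
    probability non-zero so that the logarithm is always defined.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    total_a = len(tokens_a) + smoothing * len(vocab)
    total_b = len(tokens_b) + smoothing * len(vocab)

    contributions = {}
    for feature in vocab:
        p = (counts_a[feature] + smoothing) / total_a
        q = (counts_b[feature] + smoothing) / total_b
        # Pointwise term of the Kullback-Leibler sum for this feature.
        contributions[feature] = p * math.log2(p / q)
    return sum(contributions.values()), contributions


# Toy data (hypothetical): two tiny "language uses".
science = "the experiment shows the result".split()
fiction = "the heart feels the sorrow".split()

divergence, contributions = relative_entropy(science, fiction)
typical = sorted(contributions, key=contributions.get, reverse=True)[:3]
print(f"D(science || fiction) = {divergence:.3f} bits")
print("most typical of science:", typical)
```

Features with large positive contributions are more probable in the first language use than in the second, which is the sense of "typical" used here. Relative entropy is asymmetric, so the comparison is usually run in both directions.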

Moreover, if variation is beneficial for communication, linguistic choices should allow us to modulate the informational content of a message according to particular contextual needs. But how is this achieved (see (3) above)? Evidence suggests that the amount of information in a message, as well as processing and production effort, may influence a particular choice. Assuming language users to be rational, as producers they aim to get a particular message across with adequate effort (i.e. without excessive production effort). As comprehenders, they want a message to be informative in a way that is understandable to them (i.e. without exceeding a certain limit of their processing capacity) (cf. Grice 1975 on the maxim of quantity, De Beaugrande and Dressler 1981 on informativity). In fact, several studies considering different linguistic levels have shown that language producers aim to distribute information evenly over a message (Jaeger and Levy 2007, Genzel and Charniak 2002, Aylett and Turk 2004).

Informativity is measured by surprisal, i.e. the (un)predictability of a linguistic unit (e.g. a word) given its linguistic context (e.g. the preceding words). Surprisal is measured in bits: the less predictable a unit is in a particular context, the higher its surprisal and the more information it carries. By analyzing surprisal profiles of particular linguistic units in linguistic context (cf. Degaetano-Ortlieb and Teich 2019 on scientific articles) or across extra-linguistic contexts (cf. Degaetano-Ortlieb and Piper 2019 on literary research articles), we aim to better understand how variation supports the modulation of informational content according to contextual needs.
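To make the surprisal measure concrete, here is a minimal sketch that estimates the surprisal of each word given only the single preceding word, using a bigram model with additive smoothing. The function name `bigram_surprisal`, the one-word context, the smoothing parameter `alpha`, and the toy training text are all assumptions chosen for brevity; the studies cited above may use longer contexts and different estimation methods.

```python
import math
from collections import Counter


def bigram_surprisal(train_tokens, sentence, alpha=1.0):
    """Return [(word, surprisal in bits)] for each word after the first.

    Surprisal is -log2 P(word | previous word). The bigram context and
    the additive smoothing (alpha) are simplifying assumptions.
    """
    vocab_size = len(set(train_tokens) | set(sentence))
    # How often each word occurs as a context (i.e. has a successor).
    context_counts = Counter(train_tokens[:-1])
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))

    profile = []
    for prev, word in zip(sentence, sentence[1:]):
        prob = (bigram_counts[(prev, word)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        profile.append((word, -math.log2(prob)))
    return profile


# Toy training text (hypothetical).
train = "the cell divides and the cell grows and the cell divides".split()

for word, bits in bigram_surprisal(train, "the cell divides".split()):
    print(f"{word}: {bits:.2f} bits")
```

In this toy setting, a continuation seen more often after a given word receives lower surprisal than a rarer one, which mirrors the statement that less predictable units carry more information.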

In the concluding discussion, I would like to highlight possible applications of the information-theoretic analysis of linguistic variation to research within literary studies.