ICAME46: Per Corpora ad Astra: Exploring the Past, Mapping the Future
Welcome to the 46th ICAME (International Computer Archive of Modern and Medieval English) Conference!
Hosted by the Faculty of Philology at Vilnius University, this year’s conference brings together researchers from 25 countries to explore corpora, English, and the latest in corpus linguistics.
Over five days in June, participants will engage in keynote talks, pre-conference workshops, software demonstrations, and presentations of full papers and work-in-progress reports.
Beyond academia, the social program offers a walking tour of Vilnius Old Town, a welcome reception, a boat trip around the magnificent Trakai Island Castle, and a conference dinner with a disco at the iconic 1960s-style “Neringa” restaurant.
Vilnius University’s motto is “Hinc Itur ad Astra”—while we can’t literally take you to the stars, we promise an inspiring experience!
On behalf of the ICAME46 organising committee,
Prof. Jolanta Šinkūnienė
Keynote speakers

Sebastian Hoffmann
Universität Trier
Rosa Lorés
Universidad de Zaragoza
Rūta Petrauskaitė
Vytautas Magnus University
Lukas Sönning
Universität Bamberg
Sebastian Hoffmann
Universität Trier
Sebastian Hoffmann (PhD, University of Zurich) is Professor of English Linguistics at Trier University. Before moving to Trier, he spent three years at Lancaster University (UK) as Lecturer in English Linguistics (2006–2009). His research has predominantly focused on the application of usage-based approaches to the study of language and includes topics such as syntactic change/grammaticalization, the use and pragmatics of tag questions, and the lexico-grammar of New Englishes. He is a co-author of BNCweb, a user-friendly web interface to the British National Corpus (BNC), which also forms the basis for his textbook Corpus Linguistics with BNCweb – a Practical Guide (Peter Lang, 2008; with S. Evert, N. Smith, D. Lee and Y. Berglund-Prytz). In recent years, his interests have expanded to include corpus phonetics, most prominently on the basis of the Audio BNC.
Abstract
Audio data in corpus linguistics – challenges and opportunities
Corpus linguistic analysis of audio data has traditionally led a comparatively shadowy existence at ICAMEs, and this is particularly the case with respect to phonetic and phonological research questions. There is relatively little "conference attendance overlap" with the research community of Corpus Phonetics, for example, and few of the speech corpora (e.g. the Switchboard Corpus or the Buckeye Corpus) used by that community feature in studies by traditional ICAMErs, as these corpora tend to be smaller and more specialised collections of data that are often sampled in controlled contexts to ensure sufficient audio quality. In turn, spoken corpora used by the ICAME community such as ICE-GB or the Santa Barbara Corpus typically do not meet the requirements of corpus phoneticians, for example because they lack phonetic/phonological annotation or sufficient audio quality, or in fact because audio data is not available in the first place (e.g. the London-Lund Corpus or BNC2014).
A major source of audio data for corpus analysis became available when about 5.4 million words of the original BNC were first digitized and phonemically transcribed (see Coleman et al. 2011, Coleman et al. 2012) and later integrated into BNCweb (see Hoffmann & Arndt-Lappe 2021). The data has since been used for a range of corpus(-phonetic) analyses, including my own joint research on intrusive /r/ (Hoffmann & Arndt-Lappe 2021) and stress shift (Arndt-Lappe & Hoffmann 2022). This work has highlighted both the great potential of the data (e.g. size, variety, naturalness) and its drawbacks (e.g. audio quality, issues with forced alignment).
In my talk, I will return to the topic of stress shift, using the Audio BNC for an investigation of the Principle of Rhythmic Alternation ("PRA", Sweet 1876) and the theoretical questions that arise from such an undertaking. In particular, I will present some findings that significantly extend what was presented in Arndt-Lappe & Hoffmann (2022). In doing so, I will provide further evidence that the concept of stress shift must indeed at least partially be questioned, which in turn suggests that some basic tenets of phonological theory may require re-evaluation. I will also discuss some of the methodological challenges that researchers face when using the Audio BNC, but – probably not surprisingly – come to the conclusion that the advantages by far outweigh these difficulties and that venturing into corpus-phonetic territory is a very worthwhile undertaking for a long-term ICAMEr, too.
References
Arndt-Lappe, Sabine & Sebastian Hoffmann. 2022. "Comparing approaches to phonological and orthographic corpus formats: Revisiting the Principle of Rhythmic Alternation." In Ole Schützler & Julia Schlüter (eds.), Comparative Approaches to Data and Methods in Corpus Linguistics. Cambridge: Cambridge University Press. 46–72.
Coleman, John, Ladan Baghai-Ravary, John Pybus & Sergio Grau. 2012. "Audio BNC: The Audio Edition of the Spoken British National Corpus." Oxford: Phonetics Laboratory, University of Oxford.
Coleman, John, Mark Y. Liberman, Greg Kochanski, Lou Burnard & Jiahong Yuan. 2011. "Mining a Year of Speech." Paper presented at VLSP 2011: New Tools and Methods for Very-Large-Scale Phonetics Research, University of Pennsylvania, 29–31 January 2011.
Hoffmann, Sebastian & Sabine Arndt-Lappe. 2021. "Better Data for More Researchers: Using the Audio Features of BNCweb." ICAME Journal 45. 125–54.
Sweet, Henry. 1876. "Words, Logic, and Grammar." In Transactions of the Philological Society, 1875–1876. 470–503.