ICAME46: Per Corpora ad Astra: Exploring the Past, Mapping the Future

17-21 June 2025, Faculty of Philology

Welcome to the 46th ICAME (International Computer Archive of Modern and Medieval English) Conference!

Hosted by the Faculty of Philology at Vilnius University, this year’s conference brings together researchers from 25 countries to explore corpora, English, and the latest in corpus linguistics.

Over five days in June, participants will engage in keynote talks, pre-conference workshops, software demonstrations, and presentations of full papers and work-in-progress reports.

Beyond academia, the social program offers a walking tour of Vilnius Old Town, a welcome reception, a boat trip around magnificent Trakai Island Castle, and a conference dinner with a disco at the iconic 1960s-style “Neringa” restaurant.

Vilnius University’s motto is “Hinc Itur ad Astra”—while we can’t literally take you to the stars, we promise an inspiring experience!

On behalf of the ICAME46 organising committee,

Prof. Jolanta Šinkūnienė

Keynote speakers

Sebastian Hoffmann

Universität Trier

Rosa Lorés

Universidad de Zaragoza

Rūta Petrauskaitė

Vytautas Magnus University

Lukas Sönning

Universität Bamberg

Rūta Petrauskaitė

Vytautas Magnus University

Rūta Petrauskaitė is a professor at the Department of Lithuanian Studies. She is currently the director of the Institute of the Digital Resources and Interdisciplinary Research (SITTI) at Vytautas Magnus University.

Over the last decade she has served as a vice-president of the Research Council of Lithuania and as the Chair of the Committee of Social Sciences and Humanities. Internationally, she has been involved in the activities of the Common Language Resources and Technology Infrastructure (CLARIN), the Science Europe Research Data working group, and the European Open Science Cloud (EOSC).

Her research interests cover a range of topics from linguistics to discourse analysis. She initiated and supervised the compilation of the first large corpora of the Lithuanian language, as well as corpus-based research in several fields of linguistics.

Rūta Petrauskaitė is a proponent of data-driven research, Open Science and data sharing initiatives.

https://www.vdu.lt/cris/entities/person/ruta-petrauskaite

Abstract

Corpora and Data. From John Sinclair to Artificial Intelligence

Last year we celebrated the thirtieth anniversary of corpus linguistics in Lithuania. The advent of the new trend was gradual but nevertheless groundbreaking. Our participation in the EU projects TELRI I and TELRI II sped up the compilation of the first corpora of the Lithuanian language, which was followed by corpus-based research. To deal with corpora we badly needed new methodological approaches; happily, by that time they were already available in the publications of John Sinclair and in his work on COBUILD. TELRI was beneficial because of the co-operation with linguists from other countries, but most of all because of the revolutionary ideas and personality of John Sinclair.

John Sinclair was ahead of his time in his attempts to describe how meaning is created in human language. His holistic approach rests on a few key concepts: lexical items, as opposed to orthographic words, or extended units of meaning, which comprise elements of lexis (collocation), grammar (colligation), semantics (semantic preference) and pragmatics (attitudinal meaning). His effort, more than thirty years ago, to do away with the historical split between lexis and grammar and to show the close relation between the two types of pattern was truly astonishing.

The main cornerstones of his language theory were: a) the reunification of grammar as structure and lexis as vocabulary in the creation of meaning in text, i.e., form and meaning in language cannot be separated; b) the importance of co-text and context for generating and understanding meaning; c) reliance on corpora as large amounts of language data for detecting patterns rather than testing hypotheses, i.e., a corpus-driven instead of a corpus-based approach; d) a reluctance to trust annotation based on man-made consensus grammar.

John Sinclair passed away in 2007, before the neural revolution, so he did not witness its main developments, which went along the same lines as he had suggested for corpus linguistics. The major steps in the direction of AI were as follows. 1990 marked the shift from rule-based to statistics-based methods and machine learning. 2014 brought neural language technologies, which caused a major paradigm shift in natural language processing, specifically the shift from rule-based approaches to data-driven approaches. The focus has increasingly moved toward high-quality corpus modelling rather than reliance on explicit grammar rules or predefined linguistic annotations. Large language models like GPT, released three years ago, represent this evolution: they are fundamentally data-driven but increasingly incorporate techniques to inject linguistic knowledge and structure where it is beneficial. High-quality corpora enabled models to learn language patterns effectively, and this is how AI learned languages: by encompassing broad co-text and capturing the richness and complexity of natural language.
