15th Teaching and Language Corpora (TaLC) Conference

13th – 16th July 2022 

University of Limerick, Ireland


Pre-conference workshops will take place on 13th July.

Workshop Descriptions:

Workshop 1) Analysing learner performance in terms of errors and communicative strategies (Full Day Workshop)
Convenor: Martin Weisser
Based on the concepts discussed in my TaLC 2019 paper, this workshop will introduce you to two different approaches to analysing learner performance in transcribed and annotated spoken interactional data: pragmatic analysis and error analysis.

Of the two types of linguistic annotation discussed here, pragmatic annotation is arguably the more complex and less well understood, because it involves bringing together information from different linguistic levels in order to infer the communicative intentions and strategies of a speaker. The first part of the workshop hence aims to help you understand the complexities of pragmatic annotation, its potential uses in creating communicative profiles for individual learners or learner groups, and how to carry it out with relative ease in a tool that I’ve developed over the course of more than 15 years, the Dialogue Analysis & Research Tool (DART).

The second part will introduce you to a novel error analysis scheme that I created so that the two approaches can be combined to give a more comprehensive picture of learner strategies and performance. Although, on the surface, classifying errors would seem much easier than interpreting communicative intentions, we’ll still need to interpret the type and potential severity of the errors thoroughly, and to develop suitable target hypotheses that, in turn, may allow us to judge the learners’ performance better in terms of strengths and difficulties.

Befitting the nature of the conference, throughout the workshop, we’ll of course also consider implications of our potential analysis results for teaching. And if you’d like me to integrate some of your own data into the workshop, feel free to contact me at weissermar@hotmail.com as early as possible so we can discuss options.

Target audience:

This workshop is intended for advanced-level researchers with an interest in both corpus pragmatics and error analysis for learner corpora.

Workshop Outline

Part 1

· DART: History & Genesis

· Refresher on Linguistic Annotation

· The DART XML Format

· Converting Data to the DART XML Format

· Pre-processing Dialogue Data for Annotation

· Creating Lexica for Analysis

· Creating Semantic Resources

· Annotating Dialogue Data in DART

· Understanding the Annotation Results

· Post-processing & Correcting Annotated Data

· Analysing Speech Acts & Creating Pragmatic Profiles

Part 2

· Introduction to the Error Analysis Scheme

· Interpreting & Annotating Errors

· Analysing Errors to Create Error Profiles

Technical Requirements:

Participants should bring their own computers to the workshop. Unfortunately, the current version of DART still only works on Windows, so if you own a Mac, you may need to run it through Wine.


Workshop 2) A Multimodal Analysis of Virtual Workplace Discourse: From corpus to classroom applications (Half Day Workshop - MORNING)

Geraldine Mark, Chris Fitzgerald, Justin McNamara, Dawn Knight, Anne O’Keeffe, Gerard O’Hanlon, Deborah Tobin, Svenja Adolphs, Leigh Clark, Benjamin Cowan, Tania Fahey Palma, Fiona Farr, Sandrine Peraldi

This half-day workshop will provide attendees with insights into virtual workplace discourse and how examples of it may be exploited for use in the language classroom. At the core of this workshop will be data gathered by the Interactional Variation Online (IVO) project, which aims to examine virtual workplace communication to gain deeper insight into the potential barriers to effective communication.

The workshop will be contextualised with survey data on working from home generally and on working online from the perspective of language teachers. This will prompt discussion of strategies and techniques for successful classroom interaction online. It will be followed by the presentation of a number of authentic examples from a variety of virtual meetings, showcasing specific linguistic strategies employed in this context. One of these examples will subsequently be used to demonstrate how multimodal corpus software (ELAN) can facilitate the analysis of this data. The final phase of the workshop will show how this data may be used for the benefit of language teaching and learning, with examples of classroom tasks targeted at improving students’ communicative competence and intercultural communication in the virtual environment.

To facilitate this, the workshop will be composed of four stages:

1) Perspectives on the virtual workplace

2) Samples of virtual workplace exchanges

3) Demonstration of how to construct and analyse a multimodal corpus of virtual workplace discourse

4) Discussion and examples of classroom applications


Workshop 3) Teaching with CLARIN (Half Day Workshop - AFTERNOON)

Francesca Frontini (CLARIN ERIC), Darja Fiser (Faculty of Arts, University of Ljubljana), Michaela Mahlberg (University of Birmingham), Kristina Pahor de Maiti (CY Cergy Paris University), Iuliana van der Lek (CLARIN ERIC), Martin Wynne (University of Oxford)

The workshop (approximately 4 hours) will introduce participants to the CLARIN European Research Infrastructure Consortium, which offers teachers opportunities to access and use digital language resources and teaching materials, as well as support for making teaching resources more sustainable.

Tutorial, demo and hands-on sessions will give participants the opportunity to see, evaluate and try out a number of different resources. The planning, preparation and execution of the workshop will play an important role in developing, testing, refining and improving reusable training materials. Selections will be made from among the existing materials identified in the UPSKILLS project, and trainers who present in the workshop will be encouraged and supported to develop their materials and, if relevant, to take up CLARIN Training funding opportunities. The tutorials developed for this workshop can be included in the UPSKILLS best-practice guidelines for integrating research infrastructures into teaching and in the learning content, and further disseminated via the Teaching with CLARIN platform and the Social Sciences and Humanities Training Discovery Toolkit.

Provisional agenda

Presentation: A Short Introduction to CLARIN (Francesca Frontini)

An introduction to the CLARIN European Research Infrastructure Consortium, with a focus on its existing and planned activities and services relating to teaching and learning with digital language resources.

Tutorial 1: UPSKILLS: Integration of Research Infrastructures into Teaching (Iuliana van der Lek and Martin Wynne)

The UPSKILLS project is an Erasmus+ strategic partnership for higher education that seeks to identify and tackle the gaps and mismatches in skills for linguistics and language students through the development of a new curriculum component and supporting materials to be embedded in existing programmes of study. The role of CLARIN in UPSKILLS is to provide guidelines and learning content that help teachers and trainers integrate the language resources, tools and services distributed via the research infrastructure into their teaching. The goal of this tutorial is to show how teachers can use the CLARIN infrastructure to create an annotated corpus from scratch using the following web services:

The Virtual Language Observatory (VLO)

The VLO contains references to more than 700,000 resources, the majority of which are hosted at CLARIN centres, but it also contains references to relevant resource collections maintained by other organisations. Metadata from repositories in the 25 CLARIN member and observer countries are harvested on a weekly basis, covering many languages, both national and regional. Each centre may use different but interoperable metadata profiles. Through advanced metadata searches, the VLO enables fast identification of relevant resources, allowing researchers, lecturers and students to reuse resources that already exist, rather than having to produce their own from scratch.

The Language Resource Switchboard

Due to the abundance of online resources, researchers and students may have difficulty identifying suitable data, tools or software that match their specific research requirements. The general need for guidance and recommendation is addressed by catalogues, such as the EOSC Marketplace or the SSH Open Marketplace. Successful searching in catalogues relies, to a large extent, on the ability of the users to abstract away from their specific information needs and to understand the relevance of resources with a wider applicability. It is more convenient for users to start the search for suitable tools with what they easily have at hand: representative snippets or fragments of content (text, audio, video). This functionality is offered by the Language Resource Switchboard.


WebLicht

WebLicht (“Web-based Linguistic Chaining Tool”) is a user-friendly web service for the automatic annotation of text corpora, hosted by the CLARIN centre at the University of Tübingen. NLP tools (e.g. for sentence splitting, tokenization, lemmatization, POS tagging, morphological analysis, named entity recognition, dependency parsing, constituency parsing) are made interoperable and encapsulated as web services, which the user can combine into custom processing chains. The resulting chain can then be visualised. WebLicht is tightly integrated into the CLARIN infrastructure: it uses information from the Centre Registry to harvest tool metadata from all CLARIN centre repositories. The tool metadata are harvested automatically several times each day, ensuring that all tool information is up to date. WebLicht also supports login with CLARIN Federated Identity, which allows users to log in through their academic institutions.
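To make the idea of a custom processing chain more concrete, here is a minimal sketch in Python. It does not use or reproduce WebLicht or any of its services: the function names (split_sentences, tokenize, build_chain) and the dictionary-based document are assumptions made purely for illustration, and the two steps are deliberately naive stand-ins for real NLP tools.

```python
import re
from functools import reduce
from typing import Callable, Dict

# A "document" is a plain dict; each step returns a copy with a new layer added.
# This mirrors the general idea of chaining tools, not WebLicht's actual format.
Doc = Dict[str, object]
Step = Callable[[Doc], Doc]


def split_sentences(doc: Doc) -> Doc:
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    text = str(doc["text"]).strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    return {**doc, "sentences": sentences}


def tokenize(doc: Doc) -> Doc:
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    tokens = [re.findall(r"\w+|[^\w\s]", s) for s in doc["sentences"]]
    return {**doc, "tokens": tokens}


def build_chain(*steps: Step) -> Step:
    """Compose the given steps, left to right, into a single callable."""
    return lambda doc: reduce(lambda acc, step: step(acc), steps, doc)


# Hypothetical chain: sentence splitting followed by tokenization.
chain = build_chain(split_sentences, tokenize)
result = chain({"text": "Corpora help teachers. Do they help learners too?"})

for sentence_tokens in result["tokens"]:
    print(sentence_tokens)
```

Each step only relies on the layers produced by the steps before it, which is why the order of the chain matters: tokenize here expects the "sentences" layer to exist already.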

Tutorial 2: ParlaMint – parliamentary discourse in the classroom (Darja Fiser and Kristina Pahor de Maiti)

In this demo session we will present the award-winning tutorial Voices of the Parliament, which teaches key corpus linguistics techniques through the research problem of women's representation in parliament. We will also present the ParlaMint family of 17 comparable corpora of parliamentary proceedings, which contain rich metadata and linguistic annotations suited to answering sociolinguistic research questions. The practical examples will be performed on the ParlaMint-GB corpus, which contains proceedings of the UK parliament from 2015 to March 2021. We will explore the data with the help of the NoSketch Engine concordancer, which provides free and easy access to all of the ParlaMint corpora for analysis.

The showcase will combine quantitative and qualitative analysis to explore the role and representation of women in parliament and the impact of the pandemic on their activity. In the demo, we will use frequency data, keywords and collocation candidates, together with metadata on MPs' gender and the time of the sittings and linguistic annotations of syntactic relations, to investigate MPs' speech production, the prevalent topics addressed by female and male MPs, and the characterisation of women and women-related issues.

The final part of the session will be dedicated to valuable lessons learned during the production of the tutorial, and to an open discussion of the potential for, and obstacles to, reusing and adapting the tutorial and the ParlaMint corpora for specific classroom settings (e.g. other languages, research topics).
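As a rough illustration of what combining frequency data with speaker metadata can look like, the following Python sketch computes the relative frequency of a word per metadata group. It is not part of the tutorial and does not use NoSketch Engine or ParlaMint data: the function name relative_frequency, the group labels and the three toy utterances are all invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def relative_frequency(
    speeches: Iterable[Tuple[str, str]], term: str, per: int = 1000
) -> Dict[str, float]:
    """Frequency of `term` per `per` tokens, for each metadata group.

    `speeches` yields (group, text) pairs, e.g. a gender label and an utterance.
    Tokenization is a naive whitespace split on lowercased text.
    """
    totals: Counter = Counter()
    hits: Counter = Counter()
    for group, text in speeches:
        tokens = text.lower().split()
        totals[group] += len(tokens)
        hits[group] += tokens.count(term.lower())
    return {g: hits[g] * per / totals[g] for g in totals if totals[g]}


# Invented toy utterances, NOT taken from any ParlaMint corpus.
toy_speeches = [
    ("F", "the bill supports women and families"),
    ("M", "the bill supports business"),
    ("F", "women deserve equal pay"),
]

freqs = relative_frequency(toy_speeches, "women")
for group, value in sorted(freqs.items()):
    print(f"{group}: {value:.1f} per 1000 tokens")
```

Normalising by the number of tokens in each group matters because groups rarely contribute the same amount of text, so raw counts alone would not be comparable.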

Tutorial 3: CLiC (Corpus Linguistics in Context) - between close and distant reading (Michaela Mahlberg and Michele McIntosh)  

The web application CLiC is a tool for reading and analysing narrative fiction. Unlike general concordancers, CLiC has been optimised to run searches across full texts as well as within particular sections of texts, i.e. across direct speech, narration and narratorial interruptions of character speech (‘suspensions’). CLiC is therefore particularly suited to research questions and educational applications concerning properties of narrative fiction, e.g. characterisation. The CLiC corpora mainly contain texts from the nineteenth century, including a corpus of Dickens’s novels, a corpus of children’s fiction, and a collection of African American writers. CLiC currently offers access to over 150 books and 16 million words. The CLiC interface has been designed to be user-friendly and is aimed at both research and educational applications. It has a mobile-friendly version, too, for ‘concordancing on the go’. CLiC has been successfully used in secondary school contexts for the teaching of language and literature, and it has direct applications in second language teaching through fiction. The tutorial will give users hands-on experience of basic functionalities, as well as of the KWICGrouper and the annotation tool that supports the analysis of concordance lines through user-defined categories. CLiC comes with a downloadable Activity Book (Mahlberg et al., 2017), and the CLiC blog illustrates example applications through a range of guest posts written by researchers and educators. Workshop participants will have the opportunity to submit a guest post.
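For readers new to concordancing, the following Python sketch shows the basic idea behind a keyword-in-context (KWIC) display. It is only an illustration and not how CLiC is implemented: the function name kwic and its width parameter are assumptions, and the sketch searches a whole text rather than restricting the search to subsets such as direct speech.

```python
import re
from typing import List, Tuple


def kwic(text: str, node: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, node word, right context) for every hit of `node`.

    Matching is case-insensitive; `width` is the number of context words
    kept on each side of the node word.
    """
    # Words, optionally with internal apostrophes (e.g. "don't").
    tokens = re.findall(r"\w+(?:['’]\w+)*", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Opening words of Dickens's A Tale of Two Cities, used as a tiny sample text.
sample = "It was the best of times, it was the worst of times"

concordance = kwic(sample, "times", width=3)
for left, node, right in concordance:
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>15}  {node}  {right}")
```

Aligning the node word in a single column is what makes recurring patterns to its left and right easy to spot, which is the kind of pattern the KWICGrouper is designed to help with.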