
DDM Workshop: Theme Group 2 Summary


Summary of activities of Theme Group 2, “Working with multimodal multilingual methods and data”

The facilitators were:

  • Miguel Escobar Varela, National University of Singapore
  • Darja Fišer, University of Ljubljana


The invited speakers were:

  • Alvin Eng Hui Lim, National University of Singapore
  • Ciara R. Wigham, Laboratoire de Recherche sur le Langage, Université Clermont Auvergne
  • Costanza Navarretta, Centre for Language Technology, Department of Nordic Studies and Linguistics, University of Copenhagen
  • David Moeljadi, Palacký University Olomouc
  • Ernesto Priani Saisó, Universidad Nacional Autónoma de México, Facultad de Filosofía y Letras
  • Erzsébet Tóth-Czifra, DARIAH
  • František Kratochvíl, Palacký University Olomouc

Many research communities use multimodal corpora, with a variety of research goals. While we share the same research object, we explore it through a wide array of conceptual frameworks, methods and approaches, and this diversity is already reflected in how such corpora are compiled. Since compiling multimodal corpora is highly resource- and time-consuming, corpus developers are encouraged to consider the concepts, objectives and needs of fellow researchers from other communities and disciplines. Doing so already at the decision-making stage of corpus compilation and annotation maximizes reuse of the corpus, avoids data and metadata loss, and helps ensure that the whole process is transparent and well documented.

Format of Theme Group 2

In line with the overall context of the DDM workshop, the focus of this theme group was disrupting digital monolingualism in multimodal corpora. We aimed to bring together researchers and practitioners with interest and experience in the following areas:

  • Technical, legal and ethical issues in building non-English multimodal corpora
  • Quantitative, qualitative and mixed methods for researching multimodal corpora
  • Using multimodal corpora for pedagogical purposes, therapeutic/medical purposes (e.g. sign language, language impairments, brain damage, etc.)

The 2-hour virtual session was held on June 22, 2020 and was facilitated by Miguel Escobar Varela (National University of Singapore) and Darja Fišer (CLARIN ERIC and University of Ljubljana), with support from the Disrupting Digital Monolingualism workshop, represented by Kristen Schuster and Paul Spence.

The invited speakers were asked to structure their 5-minute presentations in the following way:

  • Your experience
  • Problems you’ve identified
  • Solutions you’ve developed
  • Three strategies you think are key to research communities
  • Tools and readings to contribute to a shared resource document

After a round of individual presentations, a plenary discussion of the presented problems and solutions was held. It was moderated by Darja and Miguel in order to identify common obstacles and propose future steps for the community.

Summary of individual presentations

Alvin Eng Hui Lim is Deputy Director and Technology and Online Editor (Mandarin) and Translation Editor (Chinese) of the Asian Shakespeare Intercultural Archive (A|S|I|A, http://a-s-i-a-web.org/) and works on the intersections of theatre and religion with an emphasis on digital media. The A|S|I|A archive provides its users with translations of video recordings of Shakespeare’s plays produced in East and Southeast Asia. Data is also collected about the live performance event itself and is intended to supply contextual information from the ground level of theatre practice to the understanding of the videos and scripts. Both the video recordings and contextual data need to be accessible to all users, including native speakers, and therefore cannot be completely subsumed into a monolingual (English) environment but are translated into three or more languages. This is a tedious process and the multilingual interface has deep implications for the overall archiving approach and work processes. As a result, the team has developed technical solutions as well as guidelines for multilingual archiving and is currently working on a blended e-learning module to complement the archive. Based on his experience, Alvin calls for greater diversity, as well as best practices for collaborative work between local and non-local collaborators in cross-cultural projects, to encourage intercultural exchange. He also believes that the community needs standards for structuring multilingual databases that allow interoperability between (performance) archives or databases in different languages, as well as standards for video playback of intangible cultural heritage and live performance events that capture or provide their respective cultural and linguistic context.

Ciara R. Wigham works on Computer-Assisted Language Learning and examines the contribution of multimodality to pedagogical interactions that are mediated by technology. Her research focus is on multimodality in French and English as L2 within telecollaboration (where language learners interact and collaborate with other learners from different locations in the target language using online or digital tools), hybrid Content and Language Integrated Learning courses employing multimodal 3D environments (virtual worlds) and online language teacher training via synchronous tools. Ciara has identified four common obstacles in her line of work as well as recommended some solutions. First, data collection from a telecollaboration context involves participants from at least two different countries where legal and institutional policies applying to data consent procedures, data collection and data diffusion may differ. To help with this, she and her colleagues have developed a data management planning toolkit, specifically tailored to multimodal data. Second, in order to ensure that the corpus can also be used by researchers who were not involved in the learning design or data collection, emphasis on detailed data description is required when interaction data are collected and structured. To overcome this issue, she recommends using the language-independent LETEC methodology for structuring multimodal pedagogical data. Third, working with L2 interaction data does not necessarily allow for the automatic transcription of verbal contributions due to contributions not meeting the ‘target’ model and the use of code-switching, meaning corpus structuration is very time-consuming and requires collaboration of several researchers. 
And finally, since computer-mediated communication (CMC) in pedagogical contexts can include genres such as chats, forums, text messaging, tweets, social network sites and multimodal 3D environments, there is a need for standards for the representation of those genres and their structural and linguistic peculiarities in order to foster interoperability between language resources. Such standards, e.g. the CMC-Core TEI schema, would help scholars build interoperable CMC corpora for different languages and enhance the empirical basis for doing CMC research across languages and cultures. Given the time required to structure multimodal corpora, Ciara advocates recognition of corpora within professional evaluation practices as a key research output, and not simply a by-product. She further calls for adherence to standards for structuring multimodal CMC corpora that allow interoperability between different language resources and facilitate comparative studies between different pedagogical instantiations (which could be in different L2s). She also promotes Open Access corpus diffusion, which allows, from an epistemological viewpoint, analysis of the corpus from different research perspectives, theoretical standpoints and methodological approaches.

Costanza Navarretta’s area of research is multimodal communication in human-human and human-machine interactions from a natural language processing point of view. She has been involved in the creation, annotation and exploitation of a very broad range of monolingual and multilingual multimodal corpora. Apart from studies on how to annotate gestures in multimodal corpora, her research has comprised studies of specific communicative functions, such as feedback, turn management, and other phenomena. She has also been investigating automatic prediction of specific communicative functions, of emotions from the transcriptions, and of annotations in corpora. She is currently addressing automatic identification of head movements from videos, and multimodal language acquisition in bi-/trilingual children. Based on her experience, she points out three key unresolved issues. First and foremost, free access to multimodal corpora is typically not possible because they contain sensitive data or are restricted by copyright. Second, automatic identification and classification of gestures (as well as automatic transcription of speech for some languages) in unrestricted videos is still very much an unsolved problem: some gestures cannot be seen clearly; video quality changes; manual transcriptions and annotations are time-consuming; and there are many theories and ways to perform the annotations depending on aims, tasks, definition of gestures etc. And third, because video standards change frequently, some conversions to other standards can be problematic (synchronization of audio and video, picture quality). She argues that the necessity of open use (and sharing) of some types of multimodal data, such as broadcast transmissions, should be part of the common European research agenda. She also advocates the use of a formal standard annotation framework, such as MUMIN (Allwood et al. 2007), to which she contributed, for annotating co-speech gestures (form and function) and their relation to speech. In addition, she calls for accurate description of research data and for ensuring that all collections of non-sensitive spontaneous data can be opened up.

Ernesto Priani Saisó is a philosopher and digital humanist who has been involved with the Oceanic Exchanges project, an international partnership among 8 national libraries from Mexico, the US, and Europe for analysing historical newspapers in multiple languages. In the project, they have digitized and linked historical newspapers to study the spread of news across contexts. He remarked on the ease with which international digital humanities projects become monolingual projects, in spite of the availability of multilingual skills in the research team and/or the affordances of their corpus. There are many reasons why this happens: English is the lingua franca of academia, it is hard to work with languages that are less common, and institutions attach more weight to academic production in English. He then proposed a variety of strategic interventions for tackling these problems, especially when designing novel research agendas on the increasingly available multilingual multimodal research datasets:

  1. Subgroups of the research team could work on other languages.
  2. In situations when translation into English is absolutely necessary, it is important that materials in the original languages remain accessible throughout the project.
  3. When possible, use intermediary translations in other languages.
  4. Use websites to publish textual and audio-visual notes about the project in the original languages.
  5. Publish resulting papers in as many languages as the team can manage.

David Moeljadi is a computational linguist and lexicographer who has been working in Indonesian and Old Javanese. There are very few openly accessible large annotated corpora for Malay languages (including Indonesian, and variants of Malay used in Singapore, Malaysia and Brunei). Furthermore, there are very few good and reliable tools for pre-processing (normalization, tokenization, stemming, morphological analysis) available for these languages and extensive work must be done manually. However, the process needs to be automated in the future to handle larger datasets with less time and cost. The automation will involve machine learning, which itself requires annotated corpora based on which learning can take place. Therefore, automation requires a continuous cycle. An annotated corpus is created by annotating an existing corpus, and the resulting corpus – after necessary manual corrections – can be used to improve the annotation process, which will in turn be used for annotating another corpus. Linguists and natural language processing researchers need to work closely together, especially given that the number of researchers is much smaller in Malay/Indonesian compared to languages such as English, Mandarin, and Japanese. Most work in this area is limited to text, and extending current tools to multimedia corpora will require even more collaboration and openly available resources. Moeljadi proposed three strategies for the future:

  1. Addressing the need for open resources
  2. Enhancing cooperation between linguists and natural language processing researchers
  3. Additional support from government agencies and private companies
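
The annotation cycle Moeljadi describes (annotate a corpus automatically, correct it by hand, then retrain on the corrected result) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the most-frequent-tag lookup stands in for a real machine learning model, the `correct` function stands in for manual correction by a linguist, and the Indonesian tokens and coarse part-of-speech tags are hypothetical toy data, not material from the presentation.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Learn the most frequent tag per token from (token, tag) pairs."""
    counts = defaultdict(Counter)
    for token, tag in annotated:
        counts[token][tag] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}


def annotate(model, tokens, unknown="UNK"):
    """Tag tokens with the model; unseen tokens get a placeholder tag."""
    return [(tok, model.get(tok, unknown)) for tok in tokens]


def correct(auto_annotated, corrections):
    """Stand-in for manual correction: override tags by token position."""
    return [
        (tok, corrections.get(i, tag))
        for i, (tok, tag) in enumerate(auto_annotated)
    ]


# Hypothetical seed corpus: a few hand-annotated tokens.
seed = [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")]
model = train(seed)

# One pass of the cycle: annotate new text, correct it, retrain.
draft = annotate(model, ["saya", "minum", "kopi"])
gold = correct(draft, {1: "VERB", 2: "NOUN"})
seed = seed + gold
model = train(seed)

# The retrained model now covers tokens it could not tag before.
second = annotate(model, ["saya", "minum", "nasi"])
print(draft)
print(second)
```

Each pass enlarges the training data, so the share of tokens needing manual correction should shrink over time. With a real statistical tagger the same loop applies, but the corrected output of each round would also need quality checks before it is fed back in.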

František Kratochvíl does research in Descriptive Linguistics, Pragmatics and Historical Linguistics. He described a variety of projects, including his own work on the documentation and analysis of the languages of Papua, as well as the large-scale Palacký University project Sinophone Borderlands – Interaction at the Edges, which explores how the Sinophone world interacts with the Turco-Persophone, Slavophone, Tibetophone, Hispanophone and Austroasiatophone worlds. Kratochvíl emphasized the need for cultural specificity in the process of documenting, analysing and communicating linguistic research. Multimedia documentation, for example, can include cultural performances. When dealing with this kind of data it is important to document multiple performances to track variations, and then recreate them in formats that are culturally significant. He echoed the thoughts of other participants and identified the lack of training data as a key obstacle to automating annotation. Automation would make the process faster and significantly cheaper, allowing researchers to collect much larger datasets and to spend more of their time on research rather than data annotation. He also highlighted the importance of creating and maintaining living archives.

Erzsébet Tóth-Czifra spoke in her capacity as Open Science Officer at DARIAH-EU. She further emphasized the themes of open access and open science that were part of the discussions and stressed the role that supranational organizations can play in facilitating access.

Summary of joint discussion points

The discussion highlighted several areas of convergence and divergence.

Many concerns are shared, especially in relation to ethical and legal considerations. Open access appears to be a widely shared ideal, but one that is not always possible to implement.

In some instances concepts are shared, while in others the same terminology refers to slightly different concepts across fields of practice. One such concept is annotation, which is indeed a key concern of working with multimodal corpora. In linguistics and computational linguistics, annotations are meant to enable quantitative analysis of the data as well as comparisons across corpora and are standardized and machine-readable. In these cases, calculating error rates for inter-coder reliability and devising automatic systems for assigning annotations are feasible. In other disciplines, such as communities that deal with intangible cultural heritage, annotations are meant to introduce human readers to the cultural complexities of a corpus. This difference is felt more strongly in the communities that deal with intangible cultural heritage than in those interested in comparative historical research or research of first- and second-language acquisition, which is becoming increasingly data-driven and would therefore benefit greatly from adopting methodological approaches and technological practices that have been successfully tested in corpus and computational linguistics.
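
To make the notion of inter-coder reliability concrete, the following minimal sketch computes Cohen's kappa, one common agreement measure for two annotators. The choice of kappa is an assumption here, since the discussion did not name a specific measure, and the gesture labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement expected from each coder's own label distribution.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels assigned by two annotators to six video segments.
coder_a = ["nod", "nod", "shake", "nod", "none", "shake"]
coder_b = ["nod", "shake", "shake", "nod", "none", "nod"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. Such a measure presupposes a closed, standardized label set, which is exactly what interpretive annotations written for human readers typically lack.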

Objectives and use cases vary, but a strong recommendation is to enable conversations across disciplinary boundaries. Indeed, the disruption of “digital monolingualism” applies not only to distinct language communities, but also to epistemic communities that coalesce around shared methods and assumptions. The invitation is to make these practices more transparent and to enable conversations and exchange across communities of academic practice. One example is enabling the reuse of corpora across different objectives, such as historical, second-language acquisition (SLA), and cultural heritage research.

Selected tools and readings

  • Bay-Cheng, Sarah. (2012). “Theater Is Media: Some Principles for a Digital Historiography of Performance”. Theatre 42 (2): 27–41. https://doi.org/10.1215/01610775-1507775.
  • Featherstone, Mike. (2000). “Archiving Cultures”. British Journal of Sociology 51 (1):161–184.
  • Whatley, S., Cisneros, R. K., & Sabiescu, A. (Eds.). (2018). Digital Echoes: Spaces for Intangible and Performance-based Cultural Heritage. Cham: Springer.
  • Aranha, S. & Wigham, C.R. (2020, in print). Virtual exchanges as complex research environments: facing the data management challenge. A case study of Teletandem Brasil. Journal of Virtual Exchange. https://journal.unicollaboration.org/
  • Chanier, T. & Wigham, C.R. (2015). Archi21 corpus: collaborative language and architectural learning in Second Life. CoMeRe Corpus. Ortolang.fr : Nancy. https://hdl.handle.net/11403/comere/cmr-archi21
  • Chanier, T. & Wigham, C.R. (2016). A scientific methodology for researching CALL interaction data : Multimodal LEarning and TEaching Corpora, In Hamel, M-J. & Caws, C. (Eds.). Learner Computer Interactions : New insights on CALL theories and applications. Amsterdam : John Benjamins. ⟨edutice-01332625v1⟩.
  • Chanier, T., & Ciekanski, M. (2010). Utilité du partage des corpus pour l'analyse des interactions en ligne en situation d'apprentissage : un exemple d'approche méthodologique autour d'une base de corpus d'apprentissage. ALSIC - Apprentissage des Langues et Systèmes d'Information et de Communication, 13, ⟨10.4000/alsic.1666⟩.
  • Mulce Corpus Repository (2020). Laboratoire de Recherche sur le Langage : Clermont-Ferrand. http://lrl-diffusion.univ-bpclermont.fr/mulce2/accesCorpus/accesCorpusMulce.php
  • Reffay, C., Betbeder, M-L. & Chanier,T. (2012). Multimodal learning and teaching corpora exchange: Lessons learned in five years by the Mulce project. International Journal of Technology Enhanced Learning, Datasets and Data Supported Learning in Technology-Enhanced Learning, pp.1-20. ⟨edutice-00718392⟩
  • Wigham, C.R. & A. Bayle (2013). Enjeux, outils et méthodologie de constitution de corpus d’apprentissage, In Damiani M., Dolar K., Florez-Pulido C., Magnier J. & Loth R. (Dir.) Actes de Coldoc 2012. Paris : Modyco. pp. 36-52. ⟨edutice-00710698⟩.
  • Praat (doing phonetics by computer) and Parselmouth – Praat in Python, Forced alignment: e.g. http://linguistics.berkeley.edu/plab/guestwiki/index.php?title=Forced_alignment
  • Multimodal annotation: ELAN https://archive.mpi.nl/tla/elan or Anvil https://www.anvil-software.org/
  • Automatic identification of gestures: OpenPose https://github.com/CMU-Perceptual-Computing-Lab/openpose, OpenCV
  • Automatic identification of emotions from speech: emovoice https://github.com/hcmlab/emovoice
  • Jens Allwood, Loredana Cerrato, Kristiina Jokinen, Costanza Navarretta and Patrizia Paggio. The MUMIN coding scheme for the annotation of feedback in multimodal corpora: a prerequisite for behavior simulation. In Language Resources and Evaluation. Special Issue. J.-C. Martin, P. Paggio, P. Kuehnlein, R. Stiefelhagen, F. Pianesi (Eds.) Multimodal Corpora for Modeling Human Multimodal Behavior, Volume 41, Nr. 3-4:273-287, 2007, Springer, www.springerlink.com.
  • Costanza Navarretta, Elisabeth Ahlsén, Jens Allwood, Kristiina Jokinen, Patrizia Paggio. Feedback in Nordic First-Encounters: a Comparative Study. In Proceedings of LREC 2012, May 2012, Istanbul, Turkey, pp. 2494-2499.
  • Costanza Navarretta. Annotating and Analyzing Emotions in a Corpus of First Encounters. In Proceedings of the 3rd IEEE International Conference on Cognitive Infocommunications, Kosice, Slovakia, 2-5 December 2012, pp. 433-438.
  • Patrizia Paggio and Costanza Navarretta The Danish NOMCO corpus: multimodal interaction in first acquaintance conversations. In Language Resources and Evaluation. Vol.51, Issue 2, pp. 463-494, 2017.
  • Costanza Navarretta Prediction of audience response from spoken sequences, speech pauses and co-speech gestures in humorous discourse by Barack Obama. In 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, 2017, pp. 327-332.
  • Costanza Navarretta and Lucretia Oemig. Big Data and Multimodal Communication: A Perspective View. Intelligent Systems Reference Library, Vol. 159:167-184, 2019, Springer.