Summary of activities of theme group 4 “Artificial intelligence, machine learning and NLP in language worlds”
Facilitators
- Kalika Bali, Microsoft Research India
- Quinn Dombrowski, Stanford University
Members
- Alan Black, CMU, Pittsburgh
- Andiswa Bukula, South African Centre for Digital Language Resources (SADiLaR)
- Monojit Choudhury, Microsoft Research Labs India
- Marelie Davel, MuST (Multilingual Speech Technologies), North-West University, South Africa
- Andrew Janco, Haverford College
- Hoyt Long, University of Chicago
- Raoul Nanavati, Navana Tech
- Cecilia Piaggio, RWS Moravia, Argentina
- Gabriele Salciute Civiliene, King's College London
- David Wrisley, NYU Abu Dhabi
This workshop consisted of a series of daily prompts to which participants responded over the course of a week. The prompts were designed to draw out shared challenges in working with or developing NLP tools for languages other than English, as well as issues unique to particular kinds of languages (e.g. due to morphology, writing system, or the availability of experts who can do annotation). The group discussed the importance of understanding the sociocultural context of language use when developing NLP tools, and the challenges of industry/academic collaboration (or even NLP/humanities collaboration within academia) that arise from different incentive structures. The next step from this workshop is to write a white paper that draws together the contributions of all participants, enumerating key issues facing multilingual NLP, the stakes of this technology for humanities research, and opportunities for scholars and industry professionals to collaborate and engage with these challenges moving forward.