Andrew Janco - Haverford College. Cadet: A Tool to Add New Language Models to spaCy
This demo session will feature a simple web application designed to add new languages to spaCy (a fast and extensible open-source natural language processing library for Python). Using existing TEI annotations and text, Cadet facilitates the creation of the linguistic data needed to train a new language model. Using automated suggestions and active learning, we're working to allow a small team to produce sufficient data in a reasonable amount of time. Our team views the means to create NLP tools for new languages as an important contribution to linguistic diversity in the digital humanities and an opportunity for DH scholars to contribute to linguistic data and NLP research. This demonstration will share our progress and future directions of the project.
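As a rough illustration of how existing TEI annotations might be turned into token-level training data, here is a minimal sketch. It is not Cadet's actual pipeline: it assumes the TEI source marks words with `<w>` elements carrying `lemma` and `pos` attributes and punctuation with `<pc>`, and the sample document and the `tei_to_tokens` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample; a real corpus may encode annotations differently.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <s><w lemma="dog" pos="NOUN">Dogs</w> <w lemma="bark" pos="VERB">bark</w><pc>.</pc></s>
  </p></body></text>
</TEI>"""


def tei_to_tokens(tei_xml):
    """Return one list of token dicts (text, lemma, pos) per <s> element."""
    root = ET.fromstring(tei_xml)
    sentences = []
    for s in root.iterfind(".//tei:s", TEI_NS):
        tokens = []
        for el in s:  # direct children of the sentence, in document order
            tag = el.tag.split("}")[-1]  # strip the namespace prefix
            if tag == "w":
                tokens.append(
                    {"text": el.text, "lemma": el.get("lemma"), "pos": el.get("pos")}
                )
            elif tag == "pc":
                # Assumption: punctuation is its own lemma and tagged PUNCT.
                tokens.append({"text": el.text, "lemma": el.text, "pos": "PUNCT"})
        sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    for sentence in tei_to_tokens(SAMPLE_TEI):
        print(sentence)
```

Token lists of this shape could then be converted into whatever training format the language model requires.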
Ethem Mandić and Milan Marković - Faculty of Montenegrin Language and Literature (FCJK). Digitization of Montenegrin language skills in the global multicultural society
crnogorskijezik.me was created through cooperation between the Directorate for Diaspora of Montenegro and the Faculty of Montenegrin Language and Literature in Cetinje as a modern tool for learning the standard Montenegrin language. The idea for the site grew out of its creators' previous research and teaching practice and their direct communication with the Montenegrin diaspora, which needed reliable access to information about the Montenegrin language, its history and tradition, and standard models of its use.
The course editors therefore tried to offer material that provides reliable information and proven methodological guidance for acquiring language skills. Drawing on their experience of teaching Montenegrin to foreigners and of learning foreign languages themselves, the editors have tried to describe the basics of the standard Montenegrin language in a methodologically valid, transparent and contemporary manner. In this respect, and given that similar learning platforms are available today for almost every language in the world, this course is not a novelty. What can be noted as a step forward, however, is that the Montenegrin online course presents the language material with its meanings in four other languages: English, Spanish, Turkish and Albanian, since Montenegrin communities are most numerous in these language areas.
Designed as a basic introduction to the standard Montenegrin language for foreigners and non-native speakers, this online course lets users gradually learn general communication patterns and schemes and gain insight into the inherent structure of the Montenegrin language. Because of the language's diffuse nature and variants, and because textbooks and online courses already exist for other standard languages of the Stokavian system, the material presented in the course is based primarily on the Grammar and the Orthography of the Montenegrin language, the first official sources and foundations of the standard language. The course was created because no similar tools for teaching standard Montenegrin to foreigners were available online, which makes it the first learning module of its kind. Since textbooks for teaching Montenegrin to foreigners are a recent phenomenon in Montenegro, the course starts from elementary information that gives each beginner a gradual introduction to the language and its orthoepic and orthographic features. For that reason the course begins with the official scripts of the Montenegrin language and the pronunciation of phonemes and basic words; each new lesson builds on what users already know, with the aim of giving them the communication skills to use the language independently. All this allows future teachers and students of Montenegrin to use the course as a textbook that is in line with official Montenegrin language learning programs for foreigners. The central target group, however, is the Montenegrin emigrant community, whose members can use the service, from a spatial and cultural distance, to prepare in advance for a stay in Montenegro.
The course consists of ten multimedia, interactive lessons that are divided into basic grammar and general communication categories. Through them, users can gain insight into the standard Montenegrin language, its structure and standards. The lessons provide gradual learning of the pronunciation, grammar and spelling of the Montenegrin language, along with supporting texts on exploring Montenegro: its cities, culture, traditions, geography and other relevant information that would help future visitors enjoy a pleasant stay and find their way and communicate in Montenegro. The supporting texts are encyclopedic in type and contain general information on the geographical location, demography, climate, history and culture of the locality described, so they can serve as useful tourist material and as a guide for exploring the highlighted parts of Montenegro. Their primary function is therefore informative, and for that reason all the supporting texts are complementary to the lessons. In addition, through this platform, users can also find information about the characteristics of the Montenegrin state and important contact details for essential national services and institutions.
The grammar covered in the course is mostly presented using structuralist methodology and reduced to basic indicators, enabling users to freely connect the things they have learned and to approach the use of the language creatively and independently. The guidance for selecting and presenting the grammatical material was set by the course editors, who tried to cover all the lexical and grammatical material necessary for understanding and mastering the structure of the language. One of the methodological aims of the course is therefore to provide insight into the general lexical and syntactic features of the Montenegrin language, so that, beyond the basic grammatical categories, the lessons deal with verb tenses (present, perfect, future I), all grammatical cases, possessive pronouns, comparison of adjectives, etc. The added value of the grammatical material is reflected in the separate vocabulary lists of adjectives with their forms of comparison and of verbs with their possible forms in different tenses. These lists give an insight into the most frequent words in everyday speech and, as stated, are classified according to the traditional division of words into parts of speech.
All lessons also include audio examples of the pronunciation of words, together with their meanings in English, Spanish, Turkish and Albanian. There is a short revision of the grammar covered at the end of each lesson, and at the end of the course a revision of the whole grammar lets users test the knowledge they have acquired. In addition, each lesson contains a specially designed dialogue with a list of unfamiliar words, which serves as an introduction to the grammar covered in that lesson.
Since this is a basic course whose primary function is to introduce the Montenegrin language and its structure, it is neither certified nor designed according to the adopted curriculum for teaching Montenegrin to foreigners. Users are therefore not offered the opportunity to take exams or earn certificates; instead they have free and unobstructed access to language content that should, above all, be of practical use to them. Likewise, the general and group nature of the course is first and foremost aligned with the requirements of its users, with care taken to ensure that the material suits all ages. Nevertheless, the material covers almost all of levels A1, A2 and B1 of the Common European Framework of Reference for Languages, which opens the prospect of expanding the course with other modules, such as advanced Montenegrin or business Montenegrin. For these reasons, the Montenegrin online course is designed as a free service platform on which each user can continuously monitor their own progress in learning and mastering the language simply by registering a free account.
Because of their convenience, online courses are nowadays one of the most attractive learning models, offered by almost every relevant academic and educational institution in the world. This online course has been built to support all devices and platforms, so users can easily access the material from their computers or from portable and mobile devices.
Caoimhín Ó Dónaill - Ulster University. CLILSTORE: An open online platform for multimedia language learning
This demo will showcase CLILSTORE, an online educational platform currently being developed by the EU-funded CLIL Open Online Learning Project (www.languages.dk). CLILSTORE is both an authoring and a sharing platform that enables educators to create multimedia learning units combining audio, video, text, images and Web 3.0 applications. Learner autonomy is underpinned by the way the software treats embedded texts: for example, verbatim typescripts of audio or audiovisual recordings are automatically linked, word for word, to a nexus of online dictionaries, which facilitates deep transcultural learning by enabling content users to gradually immerse themselves in the unknown or unfamiliar.
The platform also promotes the curation of created or selected digital artefacts that become the focal point of a learning unit. Unit metadata steers learners by indicating a Common European Framework of Reference (CEFR)-based learner level, the recording length, the typescript word count, a descriptive summary of the content, and a technical description of the language (e.g. speaker speed, speaker dialect and the type of vocabulary or terminology to be encountered). Learners also have the option of registering with the platform in order to avail of additional functionality, e.g. a vocabulary recording tool that automatically builds a vocabulary list from the words they have clicked on for bilingual dictionary consultations as they read through the typescripts.
User analytics also allow learning unit creators to monitor the level of exploitation of a given unit, including the number of visits and the number of times words in the typescript have been clicked on.
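The click-driven vocabulary list and the per-unit analytics described above can be pictured with a small sketch. This is not CLILSTORE's implementation; the `ClickLog` class, its method names and the identifiers used in the example are all hypothetical, and the sketch assumes that anonymous learners are represented by a user id of `None`.

```python
from collections import Counter


class ClickLog:
    """Hypothetical in-memory record of dictionary look-up clicks."""

    def __init__(self):
        self.clicks = Counter()  # (unit_id, word) -> total clicks by all visitors
        self.vocab = {}          # user_id -> distinct clicked words, in click order

    def record(self, user_id, unit_id, word):
        """Register one click; user_id is None for unregistered learners."""
        word = word.lower()
        self.clicks[(unit_id, word)] += 1
        if user_id is not None:
            words = self.vocab.setdefault(user_id, [])
            if word not in words:
                words.append(word)

    def unit_word_counts(self, unit_id):
        """Per-word click totals for one unit, as a unit creator might view them."""
        return {w: n for (u, w), n in self.clicks.items() if u == unit_id}


if __name__ == "__main__":
    log = ClickLog()
    log.record("learner1", "unit42", "Teach")
    log.record("learner1", "unit42", "teach")
    log.record(None, "unit42", "baile")
    print(log.vocab["learner1"])           # the registered learner's vocabulary list
    print(log.unit_word_counts("unit42"))  # click counts for the unit
```

Keeping the learner's list separate from the aggregate counts means unregistered visitors still contribute to unit analytics without having a personal list.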
CLILSTORE is a genuinely disruptive resource; it challenges educators who deal with intercultural learning to think about creating bespoke materials for their learners while simultaneously providing them with the means and international exemplars of how to achieve that aim.
Jessica Green - British Library. Around the British Library in 40 Languages: Engaging with a Different Community Each Week #AToUnknown
Throughout 2019, the Heritage Made Digital team at the British Library led a Twitter campaign to promote awareness of and engagement with BL digitised collections through the lens of world languages. This coincided with the International Year of Indigenous Languages #IYIL19 and the BL's landmark Writing exhibition. Each week they focussed on a different language represented in collections digitised and made available online by the British Library, including those physically held at other organisations in the case of the Endangered Archives Programme. They specifically chose languages written (at least sometimes) in non-Latin scripts in order to promote the lesser-known language collections and to use those scripts to create weekly sudoku. In addition to the sudoku, which garnered engagement and a sense of playfulness, they shared relevant collection guides, blogs, curator videos, quizzes, direct links to digitised collections available online, and search tips to find even more. They also directed their followers to organisations and digital resources around the world that make collections in or about these languages available online.
One of the major focusses of this campaign was to connect people around the world to digitised cultural heritage collections by and about their communities. The platform they used was Twitter, which provided a range of opportunities and challenges. Owing to the language abilities of staff and a desire to reach as wide an audience as possible, they mainly tweeted in English, but were aware that this was not always the best method for reaching target audiences. When possible they added hashtags or phrases in the featured language, or they retweeted translations of their tweets done by generous Twitter followers. They also tried to tweet in secondary languages used by the target communities, such as translating their tweets into French for Inuktitut week. They also aimed for more meaningful engagement than retweets and likes, trying whenever possible to ask questions that would start conversations among themselves and their followers.
Now that the year-long Twitter campaign has ended, they are looking back to better understand and learn from their experiences through a range of analytics. This includes analysing the raw Twitter analytics and weighting comments above retweets and likes. They also took screenshots of examples of engagement through quote retweets, mentions, and comments for all 40 featured languages. They are also sharing this analytics data with stakeholders around the BL, including the International Team, who are interested in how the BL's digitised collections are being used by individuals in their countries of origin.
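Weighting comments above retweets and likes could be expressed as a simple weighted sum, sketched below. The weights, function names and weekly counts are purely illustrative assumptions; the abstract does not state the weighting the team actually used, and the numbers are not campaign data.

```python
# Hypothetical weights: a comment counts for more than a retweet or a like.
WEIGHTS = {"comments": 3.0, "retweets": 1.0, "likes": 0.5}


def engagement_score(counts, weights=WEIGHTS):
    """Weighted sum of interaction counts; missing interaction types count as 0."""
    return sum(weights[kind] * counts.get(kind, 0) for kind in weights)


def rank_weeks(weekly_counts):
    """Return week labels ordered from highest to lowest weighted engagement."""
    return sorted(
        weekly_counts,
        key=lambda week: engagement_score(weekly_counts[week]),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up counts for two placeholder weeks.
    weekly = {
        "week_a": {"comments": 2, "retweets": 10, "likes": 20},
        "week_b": {"comments": 8, "retweets": 3, "likes": 4},
    }
    for week in rank_weeks(weekly):
        print(week, engagement_score(weekly[week]))
```

With these illustrative weights, a week with fewer total interactions but more comments can outrank a week dominated by likes, which is the effect the weighting is meant to capture.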
Stuart Prior and Lucie-Aimée Kaffee - Wikimedia UK. Wikidata and Languages: Building a multilingual internet
Wikidata is Wikimedia's most important sister project to Wikipedia. Its ability to disambiguate and its multilingual approach are key to the creation of searchable, linked databases that cross multiple languages and enable people to search in their own language.
We've moved from just things to words in recent years, matching concepts and the language for those concepts in machine-readable form.
But what are the potential impacts of this database? Are we doing this in a way that works beyond European languages? Who *owns* this language data, and how can academics and linguists engage with it, improve it and make use of it?
We propose a session to present and discuss the current state of things with Wikidata, and how universities can engage with its new format of Lexemes.