
Title | A word-and-paradigm workflow for fieldwork annotation |
Publication Type | Conference Paper |
Year of Conference | 2022 |
Authors | Copot, Maria, Sara Court, Noah Diewald, Stephanie Antetomaso, and Micha Elsner |
Conference Name | ComputEL |
Volume | Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages |
Publisher | Association for Computational Linguistics |
Conference Location | Dublin, Ireland |
Abstract | There are many challenges in morphological fieldwork annotation: it heavily relies on segmentation and feature labeling (which have both practical and theoretical drawbacks), it’s time-intensive, and the annotator needs to be linguistically trained and may still annotate things inconsistently. We propose a workflow that relies on unsupervised and active learning grounded in Word-and-Paradigm morphology (WP). Machine learning has the potential to greatly accelerate the annotation process and allow a human annotator to focus on problematic cases, while the WP approach makes for an annotation system that is word-based and relational, removing the need to make decisions about feature labeling and segmentation early in the process and allowing speakers of the language of interest to participate more actively, since linguistic training is not necessary. We present a proof-of-concept for the first step of the workflow: in a realistic fieldwork setting, annotators can process hundreds of forms per hour. |