Computational strategies for reducing annotation effort in language documentation
A case study in creating interlinear texts for Uspanteko
DOI:
https://doi.org/10.33011/lilt.v3i.1217Keywords:
language description, UspantekoAbstract
With the urgent need to document the world's dying languages, it is important to explore ways to speed up language documentation efforts. One promising avenue is to use techniques from computational linguistics to automate some of the process. Here we consider unsupervised morphological segmentation and active learning for creating interlinear glossed text (IGT) for the Mayan language Uspanteko. The practical goal is to produce a totally annotated corpus that is as accurate as possible given limited time for manual annotation. We discuss results from several experiments that suggest there is indeed much promise in these methods but also show that further development is necessary to make them robustly useful for a wide range of conditions and tasks. We also provide a detailed discussion of how two documentary linguists perceived machine support in IGT production and how their annotation performance varied with different levels of machine support.
Downloads
Published
How to Cite
Issue
Section
License
This work is licensed under CC BY 4.0, which permits you to use, share, adapt, distribute, and reproduce it in any medium or format, provided you credit the original author(s) and source.