Computational strategies for reducing annotation effort in language documentation: A case study in creating interlinear texts for Uspanteko

Alexis Palmer; Taesun Moon; Jason Baldridge; Katrin Erk; Eric Campbell; Telma Can

doi:10.33011/lilt.v3i.1217

Authors

Alexis Palmer, The University of Texas at Austin
Taesun Moon, The University of Texas at Austin
Jason Baldridge, The University of Texas at Austin
Katrin Erk, The University of Texas at Austin
Eric Campbell, The University of Texas at Austin
Telma Can, The University of Texas at Austin

Keywords:

language description, Uspanteko

Abstract

With the urgent need to document the world's dying languages, it is important to explore ways to speed up language documentation efforts. One promising avenue is to use techniques from computational linguistics to automate some of the process. Here we consider unsupervised morphological segmentation and active learning for creating interlinear glossed text (IGT) for the Mayan language Uspanteko. The practical goal is to produce a fully annotated corpus that is as accurate as possible given limited time for manual annotation. We discuss results from several experiments that suggest there is indeed much promise in these methods but also show that further development is necessary to make them robustly useful for a wide range of conditions and tasks. We also provide a detailed discussion of how two documentary linguists perceived machine support in IGT production and how their annotation performance varied with different levels of machine support.

Published

2010-02-01

How to Cite

Palmer, A., Moon, T., Baldridge, J., Erk, K., Campbell, E., & Can, T. (2010). Computational strategies for reducing annotation effort in language documentation: A case study in creating interlinear texts for Uspanteko. Linguistic Issues in Language Technology, 3. https://doi.org/10.33011/lilt.v3i.1217

Issue

Vol. 3 (2010): Implementation of Linguistic Analyses Against Data

Section

Articles

License

This work is licensed under CC BY 4.0, which permits you to use, share, adapt, distribute, and reproduce it in any medium or format, provided you credit the original author(s) and source.