Computational strategies for reducing annotation effort in language documentation

A case study in creating interlinear texts for Uspanteko

Authors

  • Alexis Palmer The University of Texas at Austin
  • Taesun Moon The University of Texas at Austin
  • Jason Baldridge The University of Texas at Austin
  • Katrin Erk The University of Texas at Austin
  • Eric Campbell The University of Texas at Austin
  • Telma Can The University of Texas at Austin

DOI:

https://doi.org/10.33011/lilt.v3i.1217

Keywords:

language description, Uspanteko

Abstract

With the urgent need to document the world's dying languages, it is important to explore ways to speed up language documentation efforts. One promising avenue is to use techniques from computational linguistics to automate some of the process. Here we consider unsupervised morphological segmentation and active learning for creating interlinear glossed text (IGT) for the Mayan language Uspanteko. The practical goal is to produce a totally annotated corpus that is as accurate as possible given limited time for manual annotation. We discuss results from several experiments that suggest there is indeed much promise in these methods but also show that further development is necessary to make them robustly useful for a wide range of conditions and tasks. We also provide a detailed discussion of how two documentary linguists perceived machine support in IGT production and how their annotation performance varied with different levels of machine support.

Published

2010-02-01

How to Cite

Palmer, A., Moon, T., Baldridge, J., Erk, K., Campbell, E., & Can, T. (2010). Computational strategies for reducing annotation effort in language documentation: A case study in creating interlinear texts for Uspanteko. Linguistic Issues in Language Technology, 3. https://doi.org/10.33011/lilt.v3i.1217