Translating Fieldwork into Datasets

The Development of a Corpus for the Quantitative Investigation of Grammatical Phenomena in Eibela

Authors

  • Grant Aiton, Centre of Excellence for the Dynamics of Language, Australian National University

DOI:

https://doi.org/10.33011/computel.v2i.973

Abstract

This extended abstract details the process of constructing an annotated XML corpus suitable for quantitative analysis of morphosyntactic and phonetic phenomena in Eibela, a language of Papua New Guinea. It also presents preliminary results on the semantic, phonetic, and discourse correlates of argument realization. The goal of the paper is to illustrate how legacy materials can be enriched and investigated using computational methods: forced alignment of phonetic segments, carried out with bulk data processing in Python and R and the Montreal Forced Aligner (MFA), and the morphosyntactic annotation scheme developed as part of the Multilingual Corpus of Annotated Spoken Texts (Multi-CAST).
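
As a rough illustration of the kind of query an annotated XML corpus makes possible, the sketch below counts how arguments are realized and averages the duration of overt forms. It is not the author's schema or code: the element name `arg`, the attributes `role`, `form`, `start`, and `end`, the form labels, and all sample values are hypothetical placeholders invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical corpus fragment. Element names, attribute names, labels,
# and timings are illustrative assumptions, not the Eibela corpus format.
SAMPLE = """
<corpus>
  <utterance id="u1">
    <arg role="S" form="np" start="0.10" end="0.52"/>
    <arg role="A" form="zero"/>
    <arg role="P" form="pro" start="0.80" end="0.95"/>
  </utterance>
  <utterance id="u2">
    <arg role="S" form="zero"/>
    <arg role="P" form="np" start="1.20" end="1.75"/>
  </utterance>
</corpus>
"""


def count_realizations(xml_text):
    """Count (role, form) pairs over every <arg> element in the corpus."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for arg in root.iter("arg"):
        counts[(arg.get("role"), arg.get("form"))] += 1
    return counts


def mean_duration(xml_text, form):
    """Mean duration in seconds of time-aligned arguments with the given form.

    Returns None when no argument of that form carries start/end times,
    as is the case for zero (unexpressed) arguments in this sample.
    """
    root = ET.fromstring(xml_text.strip())
    durations = [
        float(arg.get("end")) - float(arg.get("start"))
        for arg in root.iter("arg")
        if arg.get("form") == form
        and arg.get("start") is not None
        and arg.get("end") is not None
    ]
    return sum(durations) / len(durations) if durations else None


if __name__ == "__main__":
    for (role, form), n in sorted(count_realizations(SAMPLE).items()):
        print(role, form, n)
    print("mean np duration:", mean_duration(SAMPLE, "np"))
```

In a real workflow the start and end times would come from a forced aligner's output rather than being typed by hand, and the counts would feed a statistical model in Python or R.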

Published

2021-03-02