Expanding the JHU Bible Corpus for Machine Translation of the Indigenous Languages of North America

Authors

  • Garrett Nicolai, University of British Columbia
  • Edith Coates, University of British Columbia
  • Ming Zhang, University of British Columbia
  • Miikka Silfverberg, University of British Columbia

DOI:

https://doi.org/10.33011/computel.v1i.949

Abstract

We present an extension to the JHU Bible corpus, collecting and normalizing more than thirty Bible translations in thirty Indigenous languages of North America. These languages exhibit a wide variety of interesting syntactic and morphological phenomena that are understudied in the computational community. Neural translation experiments demonstrate significant gains obtained through cross-lingual, many-to-many translation, with improvements of up to 8.4 BLEU over monolingual models for extremely low-resource languages.

Published

2021-03-02