Expanding the JHU Bible Corpus for Machine Translation of the Indigenous Languages of North America

  • Garrett Nicolai, University of British Columbia
  • Edith Coates, University of British Columbia
  • Ming Zhang, University of British Columbia
  • Miikka Silfverberg, University of British Columbia

Abstract

We present an extension to the JHU Bible corpus, collecting and normalizing more than thirty Bible translations in thirty Indigenous languages of North America. These languages exhibit a wide variety of interesting syntactic and morphological phenomena that are understudied in the computational community. Neural translation experiments demonstrate significant gains from cross-lingual, many-to-many translation, with improvements of up to 8.4 BLEU over monolingual models for extremely low-resource languages.

Published
2021-03-02