Expanding the JHU Bible Corpus for Machine Translation of the Indigenous Languages of North America
DOI:
https://doi.org/10.33011/computel.v1i.949
Abstract
We present an extension to the JHU Bible corpus, collecting and normalizing more than thirty Bible translations in thirty Indigenous languages of North America. These exhibit a wide variety of interesting syntactic and morphological phenomena that are understudied in the computational community. Neural translation experiments demonstrate significant gains obtained through cross-lingual, many-to-many translation, with improvements of up to 8.4 BLEU over monolingual models for extremely low-resource languages.
Published
2021-03-02