Building a Common Voice Corpus for Laiholh (Hakha Chin)
In this paper, we discuss our efforts to build a corpus for Laiholh, also called Hakha Chin. Laiholh is spoken in Chin State in western Myanmar, in parts of India and Bangladesh, and in several Burmese refugee communities in the US. Indiana, for example, is home to about 25,000 Burmese refugees. Our team's ultimate goal is to contribute to the development of speech translation technology that will be of benefit both in general and to the local community in Indianapolis. Such translation tools would be of great use in local emergency rooms, schools, and businesses. In pursuing these (admittedly lofty) goals, we are building a growing community of speakers, field linguists, computational linguists, and computer scientists. As a team, we have worked to share our different skill sets and to mobilize the wider community around collecting data via Mozilla’s Common Voice platform. We present here a reflection on the project thus far: the kind of description we wish had existed when we were first building this collaboration and setting preliminary project goals. We hope that other communities and language activists who are considering developing speech technology may benefit from hearing about our motivations, concerns, experiences, and successes.