Volume 17: Language Research in the 21st Century
Graduate Working Papers

Language Identification and Language Specific Letter-to-Sound Rules

Stephen Lewis
University of Colorado Boulder
Katie McGrath
University of Colorado Boulder
Jeffrey Reuppel
Stanford University

How to Cite

Lewis, S., McGrath, K., & Reuppel, J. (2004). Language Identification and Language Specific Letter-to-Sound Rules. Colorado Research in Linguistics, 17. https://doi.org/10.25810/60mf-fn94

Abstract

This paper describes a system that improves automatic ARPABET transcription by addressing performance problems caused by Arabic and Russian words transliterated into English text. The system is called EAR (English, Arabic, Russian) and has two components: (1) an n-gram language identifier module, which classifies an incoming unknown word as Arabic, Russian, or English; and (2) language-specific letter-to-sound rules, which output a pronunciation for the word based on that classification. Our results show an overall system error reduction of upwards of 45% compared with a system trained only on English.
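The two-component design described above can be illustrated with a small sketch. This is not the authors' implementation: the scoring method (add-one-smoothed character trigram frequencies per language), the tiny training lists, the toy rule tables, and all names (`NgramLanguageIdentifier`, `letter_to_sound`, `transcribe`, `TRAINING`) are hypothetical, chosen only to show how a language label can route a word to language-specific letter-to-sound rules.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with '#' boundary markers."""
    padded = "#" * (n - 1) + word.lower() + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical identifier: scores a word by the add-one-smoothed frequencies
    of its character n-grams in each language's training list."""

    def __init__(self, training_words, n=3):
        self.n = n
        self.counts, self.totals = {}, {}
        seen = set()
        for lang, words in training_words.items():
            grams = Counter(g for w in words for g in char_ngrams(w, n))
            self.counts[lang] = grams
            self.totals[lang] = sum(grams.values())
            seen.update(grams)
        self.vocab_size = len(seen) + 1  # +1 reserves mass for unseen n-grams

    def log_prob(self, word, lang):
        grams, total = self.counts[lang], self.totals[lang]
        return sum(math.log((grams[g] + 1) / (total + self.vocab_size))
                   for g in char_ngrams(word, self.n))

    def classify(self, word):
        return max(self.counts, key=lambda lang: self.log_prob(word, lang))


# Hypothetical toy rules (grapheme -> ARPABET phones), for illustration only.
# ARPABET has no velar fricative, so "kh" is approximated here as HH.
BASE_RULES = {
    "a": ["AA"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IY"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["OW"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["UW"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}
LANGUAGE_RULES = {
    "english": {"sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ee": ["IY"],
                "oo": ["UW"], "a": ["AE"], "i": ["IH"]},
    "russian": {"kh": ["HH"], "zh": ["ZH"], "sh": ["SH"], "ch": ["CH"],
                "ts": ["T", "S"]},
    "arabic": {"kh": ["HH"], "gh": ["G"], "sh": ["SH"], "th": ["TH"],
               "dh": ["DH"], "aa": ["AA"]},
}


def letter_to_sound(word, lang):
    """Greedy longest match (two letters, then one) over one language's rules."""
    rules = {**BASE_RULES, **LANGUAGE_RULES[lang]}
    w, i, phones = word.lower(), 0, []
    while i < len(w):
        for size in (2, 1):
            chunk = w[i:i + size]
            if len(chunk) == size and chunk in rules:
                phones.extend(rules[chunk])
                i += size
                break
        else:
            i += 1  # no rule for this character (e.g. an apostrophe): skip it
    return phones


def transcribe(word, identifier):
    """Classify the word, then apply that language's letter-to-sound rules."""
    lang = identifier.classify(word)
    return lang, letter_to_sound(word, lang)


# Hypothetical miniature training lists (a real system needs far more data).
TRAINING = {
    "english": ["the", "there", "with", "which", "thing", "other", "should"],
    "russian": ["ivanov", "petrov", "sergeyevich", "khrushchev", "vladimir",
                "pavlovna", "romanov"],
    "arabic": ["muhammad", "abdullah", "ahmad", "khalid", "qasim", "rahman",
               "hassan"],
}

identifier = NgramLanguageIdentifier(TRAINING)
print(transcribe("khalid", identifier))  # ('arabic', ['HH', 'AA', 'L', 'IY', 'D'])
```

Under the toy English rules the same word comes out as `K HH AE L IH D`, which shows the kind of error that routing each word to its own language's rules is meant to avoid.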