Morphology-Based Language Modeling for Arabic Speech Recognition


Vergyri, D., Kirchhoff, K., Duh, K., & Stolcke, A. (2004). Morphology-based language modeling for Arabic speech recognition. SRI International, Menlo Park, United States.


Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic. Class-based and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we explore the use of factored language models in first-pass recognition, which is facilitated by two novel procedures: the data-driven optimization of a multi-stream language model structure, and the conversion of a factored language model to a standard word-based model. We evaluate these techniques on a large-vocabulary recognition task and demonstrate that they lead to perplexity and word error rate reductions.
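
As a rough illustration of the N-best list rescoring framework mentioned in the abstract, the sketch below re-ranks a list of recognition hypotheses by combining each hypothesis's acoustic score with a score from a second language model. It is a minimal sketch under stated assumptions, not the authors' system: the function names, the `lm_weight` and `word_penalty` parameters, the toy unigram model, and the example scores are all hypothetical, and the toy model stands in for the class-based or factored models the paper actually uses.

```python
import math
from typing import Callable, List, Tuple

# A hypothesis is (word sequence, acoustic log-score). Hypothetical layout.
Hypothesis = Tuple[List[str], float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 8.0,
    word_penalty: float = 0.0,
) -> List[Tuple[float, List[str]]]:
    """Return (combined score, words) pairs sorted best first.

    Combined score = acoustic + lm_weight * LM log-prob
                     + word_penalty * number of words.
    """
    scored = []
    for words, acoustic in nbest:
        total = acoustic + lm_weight * lm_logprob(words) + word_penalty * len(words)
        scored.append((total, words))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy unigram model over placeholder tokens (an assumption for illustration;
# a morphology-based model would be plugged in here instead).
TOY_UNIGRAM = {"a": 0.5, "b": 0.3, "c": 0.2}


def toy_lm_logprob(words: List[str]) -> float:
    # Unseen tokens get a small floor probability instead of zero.
    return sum(math.log(TOY_UNIGRAM.get(w, 1e-6)) for w in words)


# Two hypotheses: the first has the better acoustic score,
# the second is preferred by the toy language model.
nbest = [(["a", "c"], -10.0), (["a", "b"], -10.5)]
best_score, best_words = rescore_nbest(nbest, toy_lm_logprob, lm_weight=2.0)[0]
```

In this toy setup the language model score is enough to promote the second hypothesis over the acoustically better first one, which is the effect rescoring is meant to achieve. In practice the weights would be tuned on held-out data.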
