Automatic Diacritization of Arabic for Acoustic Modeling in Speech Recognition

Citation

Vergyri, D., & Kirchhoff, K. (2004). Automatic diacritization of Arabic for acoustic modeling in speech recognition. In Proceedings of the workshop on computational approaches to Arabic script-based languages (pp. 66-73).

Abstract

Automatic recognition of Arabic dialectal speech is a challenging task because Arabic dialects are essentially spoken varieties. Only few dialectal resources are available to date, and most available acoustic data collections are transcribed without diacritics. Such a transcription omits essential pronunciation information about a word, such as short vowels. In this paper we investigate different procedures that enable us to use such training data by automatically inserting the missing diacritics into the transcription. These procedures use acoustic information in combination with different levels of morphological and contextual constrains. We evaluate their performance against manually diacritized transcriptions. In addition, we demonstrate the effect of their accuracy on the recognition performance of acoustic models trained on automatically diacritized training data.


Read more from SRI

  • surgeons around a surgical robot

    The SRI research behind today’s surgical robotics

    Intuitive’s da Vinci 5 system represents a major leap in robotic-assisted medicine. It all started at SRI, which continues to advance teleoperation technologies.

  • a collage of digital graphs

    A banner year for quantum

    SRI-managed QED-C’s annual report on quantum trends captures an industry accelerating rapidly from technical promise toward major global impact.

  • ICE Cube containing SRI’s aerogel experiment, photographed prior to launch. Source: Aerospace Applications North America

    An SRI carbon capture experiment launches into space

    By synthesizing carbon-absorbing aerogels in microgravity, SRI research will give us a rare glimpse into how these materials could be radically improved.