Cross-dialectal Acoustic Data Sharing for Arabic Speech Recognition

Citation

Kirchhoff, K., & Vergyri, D. (2004, May). Cross-dialectal acoustic data sharing for Arabic speech recognition. In 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. I-765). IEEE.

Abstract

The automatic recognition of Arabic dialectal speech is a challenging task since Arabic dialects are essentially spoken varieties, for which only sparse resources (transcriptions and standardized acoustic data) are available to date. In this paper we describe the use of acoustic data from Modern Standard Arabic (MSA) to improve the recognition of Egyptian Conversational Arabic (ECA). The cross-dialectal use of data is complicated by the fact that MSA is written without short vowels and other diacritics and thus has incomplete phonetic information. This problem is addressed by automatically vowelizing MSA data before combining it with ECA data. We described the vowelization procedure as well as speech recognition experiments and show that our technique yields improvements over our baseline system.


Read more from SRI

  • surgeons around a surgical robot

    The SRI research behind today’s surgical robotics

    Intuitive’s da Vinci 5 system represents a major leap in robotic-assisted medicine. It all started at SRI, which continues to advance teleoperation technologies.

  • a collage of digital graphs

    A banner year for quantum

    SRI-managed QED-C’s annual report on quantum trends captures an industry accelerating rapidly from technical promise toward major global impact.

  • ICE Cube containing SRI’s aerogel experiment, photographed prior to launch. Source: Aerospace Applications North America

    An SRI carbon capture experiment launches into space

    By synthesizing carbon-absorbing aerogels in microgravity, SRI research will give us a rare glimpse into how these materials could be radically improved.