Robust speech recognition using articulatory gestures in a dynamic Bayesian network framework

Citation

V. Mitra, H. Nam and C. Espy-Wilson, “Robust speech recognition using articulatory gestures in a dynamic Bayesian network framework,” in Proc. 2011 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), pp. 131–136.

Abstract

Articulatory Phonology models speech as a spatiotemporal constellation of constricting events (e.g., raising the tongue tip or narrowing the lips), known as articulatory gestures. These gestures are associated with distinct organs (lips, tongue tip, tongue body, velum, and glottis) along the vocal tract. In this paper we present a Dynamic Bayesian Network (DBN) based speech recognition architecture that models the articulatory gestures as hidden variables and uses them for speech recognition. Using the proposed architecture we performed: (a) word recognition experiments on the noisy data of Aurora-2 and (b) phone recognition experiments on the University of Wisconsin X-ray microbeam database. Our results indicate that using gestural information improves recognition performance compared to a system that uses acoustic information alone.
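The core idea of treating gestures as hidden variables in a DBN can be illustrated with a toy forward-inference example. The sketch below is not the paper's model: the state space, transition and emission probabilities are invented for illustration, with a single hidden "gesture" variable (e.g., lip constriction open vs. closed) generating a discrete acoustic observation at each frame.

```python
def dbn_forward(init, trans, emit, observations):
    """Forward algorithm: P(observations), marginalizing over hidden gestures.

    init[g]     - prior probability of gesture state g at t = 0
    trans[g][h] - P(gesture h at time t | gesture g at time t-1)
    emit[g][o]  - P(observation o | gesture g)
    """
    states = range(len(init))
    # alpha[g] = P(o_1..o_t, gesture_t = g), initialized at the first frame
    alpha = [init[g] * emit[g][observations[0]] for g in states]
    for o in observations[1:]:
        # Propagate beliefs one time step, then fold in the new observation
        alpha = [
            sum(alpha[g] * trans[g][h] for g in states) * emit[h][o]
            for h in states
        ]
    return sum(alpha)

# Illustrative numbers only: state 0 = lips open, state 1 = lips closed.
init = [0.6, 0.4]
trans = [[0.8, 0.2], [0.3, 0.7]]   # gestures tend to persist across frames
emit = [[0.9, 0.1], [0.2, 0.8]]    # observation 0 cues "open", 1 cues "closed"

likelihood = dbn_forward(init, trans, emit, [0, 0, 1])
```

In the actual architecture the hidden layer is richer (one gesture variable per vocal-tract organ, coupled to word and phone variables), but inference still rests on this kind of sum-product recursion over the hidden states.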
