Improving Language Identification Robustness to Highly Channel-Degraded Speech through Multiple System Fusion

Citation

Lawson, A., McLaren, M., Lei, Y., Mitra, V., Scheffer, N., Ferrer, L., & Graciarena, M. (2013, August). Improving language identification robustness to highly channel-degraded speech through multiple system fusion. In INTERSPEECH (pp. 1507-1510).

Abstract

We describe a language identification system developed for robustness to noise conditions such as those encountered under the DARPA RATS program, which focuses on multi-channel audio collected in high-noise conditions. Work presented here includes novel approaches to scoring iVectors, the introduction of several new acoustic and prosodic features for language identification, and discriminative file selection approaches to score calibration. Further, we explore the use of Discrete Cosine Transforms (DCT) as a supplement to traditional context modeling with Shifted Delta Cepstrum (SDC), and the fusion of multiple iVector systems based on Gaussian backends, neural networks, and adaptive Gaussian backend modeling.

Index Terms: language identification, speech features, iVector scoring.
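
For readers unfamiliar with the Shifted Delta Cepstrum context modeling mentioned in the abstract, the following is a minimal sketch of the generic textbook SDC computation, not the configuration used in the paper. The function name `shifted_delta_cepstra`, the default parameters (d=1, P=3, k=7), and the choice to clamp indices at the edges of the utterance are all assumptions made for illustration; the abstract does not specify any of them.

```python
def shifted_delta_cepstra(frames, d=1, p=3, k=7):
    """Generic SDC sketch (hypothetical helper, not the paper's code).

    frames: list of per-frame cepstral vectors (lists of floats).
    d: delta spread, p: shift between blocks, k: number of stacked blocks.
    Returns one vector per frame with k * len(frames[0]) values.
    """
    n_frames = len(frames)
    if n_frames == 0:
        return []

    def clamp(t):
        # Assumption: out-of-range frame indices reuse the nearest edge frame.
        return max(0, min(n_frames - 1, t))

    out = []
    for t in range(n_frames):
        stacked = []
        for i in range(k):
            ahead = frames[clamp(t + i * p + d)]
            behind = frames[clamp(t + i * p - d)]
            # Block i is the delta c(t + i*p + d) - c(t + i*p - d).
            stacked.extend(a - b for a, b in zip(ahead, behind))
        out.append(stacked)
    return out


# Toy input: ten frames holding a single coefficient that rises by 1 per frame.
toy_frames = [[float(t)] for t in range(10)]
toy_sdc = shifted_delta_cepstra(toy_frames, d=1, p=3, k=2)
```

Each output vector concatenates k delta blocks taken at shifts of P frames, which is how SDC captures temporal context beyond a single frame. A DCT over a window of frames, as explored in the paper, is a different way to summarize that same context and is not shown here.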

