Journal Article May 1, 2017

Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition

Studies have shown that articulatory information helps model speech variability and, consequently, improves speech recognition performance. But learning speaker-invariant articulatory models is challenging, as speaker-specific signatures in both the articulatory...

Conference Proceeding March 1, 2017

Speech recognition in unseen and noisy channel conditions

Speech recognition in varying background conditions is a challenging problem. Acoustic condition mismatch between training and evaluation data can significantly reduce recognition performance. For mismatched conditions, data-adaptation techniques are typically...

Conference Proceeding March 1, 2017

Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks

Articulatory information can effectively model variability in speech and can improve speech recognition performance under varying acoustic conditions. Learning speaker-independent articulatory models has always been challenging, as speaker-specific information in...

Journal Article February 1, 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend

This paper gives an in-depth presentation of the multi-microphone speech recognition system we submitted to the 3rd CHiME speech separation and recognition challenge (CHiME-3) and its extension. The proposed system...

Conference Paper September 1, 2016

Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition

The introduction of deep neural networks has significantly improved automatic speech recognition performance. For real-world use, automatic speech recognition systems must cope with varying background conditions and unseen acoustic data....

Conference Paper September 1, 2016

Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech

Recognizing speech under high levels of channel and/or noise degradation is challenging. Current state-of-the-art automatic speech recognition systems are sensitive to changing acoustic conditions, which can cause significant performance degradation....

Conference Paper September 1, 2016

Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)

The rate at which endangered languages can be documented has been highly constrained by human factors. Although digital recording of natural speech in endangered languages may proceed at a fairly...

Conference Paper September 1, 2016

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation

In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created...

Conference Paper September 1, 2016

Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets

Often, prior knowledge of subword units is unavailable for low-resource languages. Instead, a global subword unit description, such as a universal phone set, is typically used in such scenarios. One...

Conference Paper June 1, 2016

Noise and reverberation effects on depression detection from speech

Speech-based depression detection has gained importance in recent years, but most research has used relatively quiet conditions or examined a single corpus per study. Little is thus known about the...

Conference Paper June 1, 2016

A Phonetically Aware System for Speech Activity Detection

Speech activity detection (SAD) is an essential component of most speech processing tasks and greatly influences the performance of the systems. Noise and channel distortions remain a challenge for SAD...

Conference Paper December 1, 2015

The MERL/SRI System for the 3rd CHiME Challenge Using Beamforming, Robust Feature Extraction and Advanced Speech Recognition

This paper introduces the MERL/SRI system designed for the 3rd CHiME speech separation and recognition challenge (CHiME-3). Our proposed system takes advantage of recurrent neural networks (RNNs) throughout the model...
