This work presents a modified CDNN architecture that we call the time-frequency convolutional network (TFCNN), in which two parallel convolution layers operate on the input feature space: one convolves across time and the other across frequency, each followed by its own pooling layer.
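The parallel time/frequency-convolution idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the kernel sizes, pooling factors, and the final concatenation step are illustrative assumptions, and real systems would use learned multi-channel filters rather than fixed kernels.

```python
import numpy as np

def conv1d_valid(x, k):
    # "valid" 1-D correlation of vector x with kernel k
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def max_pool(x, size):
    # non-overlapping max pooling; trailing remainder is dropped
    n = len(x) // size
    return np.array([x[i * size:(i + 1) * size].max() for i in range(n)])

def tfcnn_layer(feat, time_kernel, freq_kernel, time_pool, freq_pool):
    # feat: (freq_bins, time_frames) feature map, e.g. log-mel energies
    # time branch: convolve each frequency band across time, pool across time
    time_maps = np.stack([max_pool(conv1d_valid(row, time_kernel), time_pool)
                          for row in feat])
    # frequency branch: convolve each frame across frequency, pool across frequency
    freq_maps = np.stack([max_pool(conv1d_valid(col, freq_kernel), freq_pool)
                          for col in feat.T]).T
    # downstream layers would consume both branches; here we just
    # return the flattened concatenation of the two pooled maps
    return np.concatenate([time_maps.ravel(), freq_maps.ravel()])

feat = np.random.randn(40, 100)  # 40 mel bands x 100 frames (hypothetical)
out = tfcnn_layer(feat, np.ones(5), np.ones(3), time_pool=4, freq_pool=2)
```

With these illustrative sizes, the time branch yields 40 bands x 24 pooled frames and the frequency branch 19 pooled bands x 100 frames, so `out` has 960 + 1900 = 2860 elements.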
Improving robustness against reverberation for automatic speech recognition
In this work, we explore the role of robust acoustic features motivated by human speech perception studies in building ASR systems that are robust to reverberation effects.
Classification of Lexical Stress Using Spectral and Prosodic Features for Computer-assisted Language Learning Systems
We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software.
Deep convolutional nets and robust features for reverberation-robust speech recognition
In this work, we present robust acoustic features motivated by human speech perception for use in a convolutional deep neural network-based acoustic model for recognizing continuous speech under reverberant conditions.
Evaluating Robust Features on Deep Neural Networks for Speech Recognition in Noisy and Channel Mismatched Conditions
In this work, we present a study exploring both conventional DNNs and deep convolutional neural networks (CNNs) for noise- and channel-degraded speech recognition tasks using the Aurora4 dataset.
Recent Improvements in SRI’s Keyword Detection System for Noisy Audio
We present improvements to a keyword spotting (KWS) system that operates in highly adverse channel conditions with very low signal-to-noise ratio levels.
Lexical Stress Classification for Language Learning Using Spectral and Segmental Features
We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for each syllable in a word.
Medium-Duration Modulation Cepstral Feature for Robust Speech Recognition
In this paper, we present the Modulation of Medium Duration Speech Amplitude feature, which is a composite feature capturing subband speech modulations and a summary modulation.
Feature Fusion for High-Accuracy Keyword Spotting
This paper assesses the role of robust acoustic features in spoken term detection (also known as keyword spotting, KWS) under heavily channel-degraded and noise-corrupted conditions.