Author: Horacio Franco
-
Deep convolutional nets and robust features for reverberation-robust speech recognition
In this work, we present robust acoustic features motivated by human speech perception for use in a convolutional deep neural network-based acoustic model for recognizing continuous speech in reverberant conditions.
-
Recent Improvements in SRI’s Keyword Detection System for Noisy Audio
We present improvements to a keyword spotting (KWS) system that operates in highly adverse channel conditions with very low signal-to-noise ratio levels.
-
Evaluating Robust Features on Deep Neural Networks for Speech Recognition in Noisy and Channel Mismatched Conditions
In this work we present a study exploring both conventional DNNs and deep Convolutional Neural Networks (CNN) for noise- and channel-degraded speech recognition tasks using the Aurora4 dataset.
-
Adaptive and Discriminative Modeling for Improved Mispronunciation Detection
In the context of computer-aided language learning, automatic detection of specific phone mispronunciations by nonnative speakers can be used to provide detailed feedback about specific pronunciation problems.
-
Lexical Stress Classification for Language Learning Using Spectral and Segmental Features
We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for each syllable in a word.
-
Medium-Duration Modulation Cepstral Feature for Robust Speech Recognition
In this paper, we present the Modulation of Medium Duration Speech Amplitude feature, which is a composite feature capturing subband speech modulations and a summary modulation.
-
Feature Fusion for High-Accuracy Keyword Spotting
This paper assesses the role of robust acoustic features in spoken term detection (a.k.a. keyword spotting, KWS) under heavily channel-degraded and noise-corrupted conditions.
-
Damped oscillator cepstral coefficients for robust speech recognition
This paper presents a new signal-processing technique motivated by the physiology of the human auditory system.
-
Strategies for high accuracy keyword detection in noisy channels
We present design strategies for a keyword spotting (KWS) system that operates in highly degraded channel conditions with very low signal-to-noise ratio levels.
-
All for one: Feature combination for highly channel-degraded speech activity detection
This paper presents a feature combination approach to improve speech activity detection (SAD) on highly channel-degraded speech as part of the Defense Advanced Research Projects Agency’s (DARPA) Robust Automatic Transcription of Speech (RATS) program.
-
Modulation features for noise robust speaker identification
In this paper, we present a robust acoustic feature on top of robust modeling techniques to further improve speaker identification performance.
-
Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
In this work, we present an amplitude modulation feature derived from Teager’s nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features…