In this paper we propose softSAD: the direct integration of speech posteriors into a speaker recognition system instead of using speech activity detection (SAD).
The SRI AVEC-2014 Evaluation System
We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale.
Evaluating Robust Features on Deep Neural Networks for Speech Recognition in Noisy and Channel Mismatched Conditions
In this work, we present a study exploring both conventional DNNs and deep Convolutional Neural Networks (CNNs) for noise- and channel-degraded speech recognition tasks using the Aurora4 dataset.
Recent Improvements in SRI’s Keyword Detection System for Noisy Audio
We present improvements to a keyword spotting (KWS) system that operates in highly adverse channel conditions with very low signal-to-noise ratio levels.
Medium-Duration Modulation Cepstral Feature for Robust Speech Recognition
In this paper, we present the Modulation of Medium Duration Speech Amplitude feature, which is a composite feature capturing subband speech modulations and a summary modulation.
Feature Fusion for High-Accuracy Keyword Spotting
This paper assesses the role of robust acoustic features in spoken term detection (also known as keyword spotting, or KWS) under heavily degraded channel and noise-corrupted conditions.
Improving Language Identification Robustness to Highly Channel-Degraded Speech through Multiple System Fusion
We describe a language identification system developed for robustness to noise conditions such as those encountered under the DARPA RATS program, which is focused on multi-channel audio collected in high-noise conditions.
Damped oscillator cepstral coefficients for robust speech recognition
This paper presents a new signal-processing technique motivated by the physiology of the human auditory system.
Strategies for high accuracy keyword detection in noisy channels
We present design strategies for a keyword spotting (KWS) system that operates in highly degraded channel conditions with very low signal-to-noise ratio levels.