SRI Authors: Mitchell McLaren
Recent Developments in Voice Biometrics: Robustness and High Accuracy
Recently, researchers have tackled difficult voice biometrics problems that resonate with the defense and research communities. These problems include non-ideal recording conditions that are frequently found in operational scenarios, such as noise, reverberation, degraded channels, and compressed audio. In this article, we highlight SRI’s innovations that resulted from the IARPA Biometrics Exploitation Science & Technology (BEST) and the DARPA Robust Automatic Transcription of Speech (RATS) programs, as well as SRI’s approach for codec-degraded speech. We show how these advancements support the case for the biometrics community to adopt speaker recognition.
Modulation features for noise robust speaker identification
In this paper, we present a robust acoustic feature on top of robust modeling techniques to further improve speaker identification performance.
A Noise-Robust System for NIST 2012 Speaker Recognition Evaluation
The National Institute of Standards and Technology (NIST) 2012 speaker recognition evaluation posed several new challenges including noisy data, varying test-sample length and number of enrollment samples, and a new metric.
Adaptive Gaussian Backend for Robust Language Identification
This paper proposes the adaptive Gaussian backend (AGB), a novel approach to robust language identification (LID). In this approach, a given test sample is compared to language-specific training data in order to dynamically select data for a trial-specific language model. Discriminative AGB additionally weights the training data to maximize discrimination against the test segment. Evaluated on heavily degraded speech data, discriminative AGB provides relative improvements of up to 45% and 38% in equal error rate (EER) over the widely adopted Gaussian backend (GB) and neural network (NN) approaches to LID, respectively. Discriminative AGB also significantly outperforms those techniques at shorter test durations, while demonstrating robustness to limited training resources and to mismatch between training and testing speech duration. The efficacy of AGB is validated on clean speech data from the National Institute of Standards and Technology (NIST) Language Recognition Evaluation (LRE) 2009, on which it was found to provide improvements over the GB and NN approaches.
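The trial-specific selection idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `agb_scores`, the use of cosine similarity to pick the `k` nearest training vectors per language, and the ridge-regularized shared covariance are all assumptions made for illustration, and the discriminative weighting step is not shown.

```python
import numpy as np

def agb_scores(test_vec, train_vecs, train_labels, k=50, ridge=1e-3):
    """Score one test vector against each language with a trial-specific Gaussian.

    Hypothetical sketch: selection rule (cosine top-k) and parameters are assumptions.
    """
    train_vecs = np.asarray(train_vecs, dtype=float)
    train_labels = np.asarray(train_labels)
    test_vec = np.asarray(test_vec, dtype=float)
    languages = np.unique(train_labels)

    # Shared within-class covariance over all training data, as in a standard
    # Gaussian backend, with a small ridge term to keep it invertible.
    centered = np.vstack([
        train_vecs[train_labels == lang] - train_vecs[train_labels == lang].mean(axis=0)
        for lang in languages
    ])
    cov = centered.T @ centered / len(centered) + ridge * np.eye(train_vecs.shape[1])
    precision = np.linalg.inv(cov)

    def unit(x):
        # Length-normalize along the last axis (works for vectors and matrices).
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

    scores = {}
    for lang in languages:
        lang_vecs = train_vecs[train_labels == lang]
        # Adaptive step: keep only the k training vectors most similar to the test vector.
        sims = unit(lang_vecs) @ unit(test_vec)
        nearest = np.argsort(sims)[::-1][:k]
        mean = lang_vecs[nearest].mean(axis=0)
        # Gaussian log-likelihood up to a constant shared by all languages.
        diff = test_vec - mean
        scores[lang.item()] = float(-0.5 * diff @ precision @ diff)
    return scores
```

The highest-scoring key is the hypothesized language, for example `max(scores, key=scores.get)`. A conventional Gaussian backend corresponds to setting `k` at least as large as the number of training vectors per language, so that each mean is computed from all of that language's data.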
Improving Language Identification Robustness to Highly Channel-Degraded Speech through Multiple System Fusion
We describe a language identification system developed for robustness to noise conditions such as those encountered under the DARPA RATS program, which is focused on multi-channel audio collected in high-noise conditions. Work presented here includes novel approaches to scoring iVectors, the introduction of several new acoustic and prosodic features for language identification, and discriminative file selection approaches to score calibration. Further, we explore the use of Discrete Cosine Transforms (DCT) as a supplement to traditional context modeling with Shifted Delta Cepstrum (SDC) and fusion of multiple iVector systems based on Gaussian backends, neural networks, and adaptive Gaussian backend modeling.
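For readers unfamiliar with the SDC context modeling mentioned in this abstract, the following is a minimal sketch of the standard shifted delta cepstra computation. The function name, the default parameters (`d=1`, `P=3`, `k=7`, a configuration commonly written as 7-1-3-7 when applied to 7 cepstral coefficients), and the edge-frame padding are assumptions for illustration; the abstract does not state the configuration used in the paper.

```python
import numpy as np

def shifted_delta_cepstra(cepstra, d=1, P=3, k=7):
    """Stack k delta blocks per frame: block i is c(t + i*P + d) - c(t + i*P - d).

    cepstra: array of shape (T, N), T frames of N cepstral coefficients.
    Returns an array of shape (T, N * k).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    T, _ = cepstra.shape
    # Repeat edge frames so that every frame gets a full SDC vector (an assumption;
    # other implementations truncate or zero-pad instead).
    pad_after = (k - 1) * P + d
    padded = np.pad(cepstra, ((d, pad_after), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        # Original frame t sits at index t + d in the padded array.
        ahead = padded[2 * d + i * P : 2 * d + i * P + T]
        behind = padded[i * P : i * P + T]
        blocks.append(ahead - behind)
    return np.hstack(blocks)
```

With 7 input coefficients and the defaults above, each frame yields a 49-dimensional vector that summarizes how the cepstra change over a span of roughly 20 frames.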
Improving Speaker Identification Robustness to Highly Channel-Degraded Speech Through Multiple System Fusion
This article describes our submission to the speaker identification (SID) evaluation for the first phase of the DARPA Robust Automatic Transcription of Speech (RATS) program.