This article describes the SITW speaker recognition challenge and analyzes the evaluation results, including some of the top-performing systems submitted during the evaluation, and outlines future research directions.
The Speakers in the Wild (SITW) Speaker Recognition Database
The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology.
Exploring the role of phonetic bottleneck features for speaker and language recognition
Using bottleneck features extracted from a deep neural network (DNN) trained to predict senone posteriors has resulted in new, state-of-the-art technology for language and speaker identification.
Analyzing the effect of channel mismatch on the SRI language recognition evaluation 2015 system
We present the work done by our group for the 2015 language recognition evaluation (LRE) organized by the National Institute of Standards and Technology (NIST).
Improving robustness against reverberation for automatic speech recognition
In this work, we explore the role of robust acoustic features, motivated by human speech perception studies, in building ASR systems that are robust to reverberation effects.
Study of senone-based deep neural network approaches for spoken language recognition
This paper compares different approaches for using deep neural networks (DNNs) trained to predict senone posteriors for the task of spoken language recognition (SLR).
Speech-based assessment of PTSD in a military population using diverse feature classes
We analyzed recordings of the Clinician-Administered PTSD Scale (CAPS) interview from military personnel diagnosed as PTSD positive versus negative.
Mitigating the effects of non-stationary unseen noises on language recognition performance
We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance.
SoftSAD: Integrated frame-based speech confidence for speaker recognition
In this paper we propose softSAD: the direct integration of speech posteriors into a speaker recognition system instead of using speech activity detection (SAD).