Simplified VTS-Based I-Vector Extraction in Noise-Robust Speaker Recognition


Lei, Y., McLaren, M., Ferrer, L., & Scheffer, N. (2014, 4-9 May). Simplified VTS-based I-vector extraction in noise-robust speaker recognition. Paper presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14), Florence, Italy.


An i-vector extractor based on vector Taylor series (VTS) was recently proposed for noise-robust speaker recognition; it extracts synthesized clean i-vectors to be used in the standard system back-end. This approach brings significant improvements in accuracy for noisy speech conditions. However, it incurred such a large computational expense that using state-of-the-art model sizes or running large-scale evaluations was impractical. In this work, we propose an efficient simplification scheme, named sVTS, in order to show that the VTS approach gives improvements over state-of-the-art systems in large-scale applications. In contrast to VTS, sVTS generates normalized Baum-Welch statistics and uses the standard i-vector model, making it straightforward to employ in a state-of-the-art i-vector speaker recognition system. Results presented on both the PRISM and the large NIST SRE’12 corpora show that sVTS i-vectors provide significant improvements in the noisy conditions, and that the proposed simplification results in only a slight degradation with respect to the original VTS approach.

Keywords: Noise measurement, Speech, Noise, NIST, Computational modeling, Vectors.
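The abstract's key practical point is that sVTS hands Baum-Welch statistics to the standard i-vector model. As a rough illustration of that standard back-end only, the following is a minimal pure-Python sketch: zeroth-order (N_c) and centered first-order (F_c) statistics are accumulated against a diagonal-covariance UBM, and the i-vector is taken as the posterior mean w = (I + sum_c N_c T_c^T Sigma_c^-1 T_c)^-1 sum_c T_c^T Sigma_c^-1 F_c of the usual total-variability model. It does not implement the VTS or sVTS noise compensation; per the abstract, sVTS would supply its normalized statistics in place of the raw N and F computed here. The function names, the nested-list layout of the total variability matrix T, and the toy dimensions are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, var)
    )


def baum_welch_stats(frames: Matrix, weights: Vector, means: Matrix,
                     variances: Matrix) -> Tuple[Vector, Matrix]:
    """Zeroth-order (N) and centered first-order (F) statistics against a UBM."""
    C, D = len(weights), len(means[0])
    N = [0.0] * C
    F = [[0.0] * D for _ in range(C)]
    for x in frames:
        logp = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                for c in range(C)]
        top = max(logp)  # subtract the max for a numerically stable softmax
        probs = [math.exp(lp - top) for lp in logp]
        total = sum(probs)
        for c in range(C):
            gamma = probs[c] / total  # frame-level component posterior
            N[c] += gamma
            for d in range(D):
                F[c][d] += gamma * (x[d] - means[c][d])  # centered on the UBM mean
    return N, F


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def extract_ivector(N: Vector, F: Matrix, T: List[Matrix],
                    variances: Matrix) -> Vector:
    """Posterior mean of the latent factor w in the total-variability model.

    T[c][d][r] is a hypothetical layout: row d of the D x R block of T that
    belongs to UBM component c.
    """
    C, D, R = len(N), len(F[0]), len(T[0][0])
    # Precision starts from the identity (standard normal prior on w).
    L = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]
    b = [0.0] * R
    for c in range(C):
        for d in range(D):
            inv_var = 1.0 / variances[c][d]
            t = T[c][d]
            for i in range(R):
                b[i] += t[i] * inv_var * F[c][d]  # T_c^T Sigma_c^-1 F_c
                for j in range(R):
                    L[i][j] += N[c] * t[i] * inv_var * t[j]  # N_c T_c^T Sigma_c^-1 T_c
    return solve(L, b)


if __name__ == "__main__":
    N, F = baum_welch_stats([[1.0], [3.0]], [1.0], [[0.0]], [[1.0]])
    w = extract_ivector(N, F, [[[1.0]]], [[1.0]])
    print(N, F, w)
```

For example, with a single one-dimensional component (mean 0, variance 1), frames 1.0 and 3.0, and T = 1, the statistics are N = 2 and F = 4, so w = 4 / (1 + 2) = 4/3. Because the extractor only sees N and F, any front-end that produces compatible statistics can reuse it unchanged, which is the property the abstract attributes to sVTS.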
