Analyzing the effect of channel mismatch on the SRI language recognition evaluation 2015 system

Citation

M. McLaren, D. Castan and L. Ferrer, “Analyzing the effect of channel mismatch on the SRI language recognition evaluation 2015 system,” in Proc. Odyssey 2016, pp. 188-195.

Abstract

We present the work done by our group for the 2015 language recognition evaluation (LRE) organized by the National Institute of Standards and Technology (NIST). The focus of this evaluation was the development of language recognition systems for clusters of closely related languages using training data released by NIST. This training data contained a highly imbalanced sample from the languages of interest. The SRI team submitted several systems to LRE’15. Major components included (1) bottleneck features extracted from Deep Neural Networks (DNNs) trained to predict English senones, with multiple DNNs trained using a variety of acoustic features; (2) data-driven Discrete Cosine Transform (DCT) contextualization of features for traditional Universal Background Model (UBM) i-vector extraction and for input to a DNN for bottleneck feature extraction; (3) adaptive Gaussian backend scoring; (4) a newly developed multiresolution neural network backend; and (5) cluster-specific N-way fusion of scores. We compare results on our development dataset with those on the evaluation data and find significantly different conclusions about which techniques were useful for each dataset. This difference was due mostly to a large unexpected mismatch in acoustic conditions between the two datasets. We provide a post-evaluation analysis that reveals that the successful approaches for this evaluation included the use of bottleneck features, and a well-defined development dataset appropriate for mismatched conditions.

Index Terms: Language Recognition, Bottleneck Features, Deep Neural Networks, Mismatched Conditions.

