M. Graciarena, S. Kajarekar, A. Stolcke and E. Shriberg, “Noise Robust Speaker Identification for Spontaneous Arabic Speech,” 2007 IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP ’07, 2007, pp. IV-245-IV-248, doi: 10.1109/ICASSP.2007.366895.
Two important challenges for speaker recognition applications are noise robustness and portability to new languages. We present an approach that integrates multiple components and models for improved speaker identification in spontaneous Arabic speech in adverse acoustic conditions. We used two different acoustic speaker models, cepstral Gaussian mixture models (GMM) and maximum likelihood linear regression support vector machine (MLLR-SVM) models, together with a neural network combiner. The noise-robust components are Wiener filtering, speech-nonspeech segmentation, and frame selection. We present baselines and results on the Arabic portion of the NIST Mixer data, in clean conditions and with added noise at different signal-to-noise ratios. We used two realistic noise types: babble and city traffic. In both noisy scenarios, we found significant equal error rate (EER) reductions over the no-compensation condition. The various noise robustness methods gave complementary gains for both acoustic models. Finally, the combiner provided a reduction in EER over the individual systems in noisy conditions.
Index Terms: Speaker identification, Robustness, Arabic, cepstral GMM, MLLR-SVM
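The abstract relies on two evaluation ingredients that are easy to state but easy to get subtly wrong: adding noise to clean speech at a chosen signal-to-noise ratio, and measuring the equal error rate from target and impostor trial scores. The following is a minimal sketch of both, assuming NumPy and single-channel float signals. The function names `mix_at_snr` and `equal_error_rate` are hypothetical, and the abstract does not describe the authors' actual mixing procedure or scoring tools, so this should be read as an illustration of the general technique rather than the paper's implementation.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the overall SNR equals `snr_db` (dB).

    Assumption: SNR is computed over the whole utterance (no speech-activity
    weighting), which may differ from the procedure used in the paper.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so that it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain g so that speech_power / (g**2 * noise_power) = 10**(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER: the point where miss and false-alarm rates meet.

    A trial is accepted when its score is >= the threshold. Thresholds are
    taken from the observed scores, so the result is exact only up to the
    resolution of the score set.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    miss = np.array([np.mean(target_scores < t) for t in thresholds])
    false_alarm = np.array([np.mean(impostor_scores >= t) for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0
```

In an experiment of the kind described, one would call `mix_at_snr` once per utterance for each noise type and SNR level, score the noisy trials with each system, and compare `equal_error_rate` across the compensated and uncompensated conditions.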