Speaker Recognition with Session Variability Normalization Based on MLLR Adaptation Transforms


Stolcke, A., Kajarekar, S. S., Ferrer, L., & Shriberg, E. (2007). Speaker recognition with session variability normalization based on MLLR adaptation transforms. IEEE Transactions on Audio, Speech, and Language Processing, 15(7), 1987-1998.


We present a new modeling approach for speaker recognition that uses the maximum-likelihood linear regression (MLLR) adaptation transforms employed by a speech recognition system as features for support vector machine (SVM) speaker models. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification without data fragmentation. We discuss the basics of the MLLR-SVM approach, and show how it can be enhanced by combining transforms relative to multiple reference models, with excellent results on recent English NIST evaluation sets. We then show how the approach can be applied when no full word-level recognition system is available, which allows its use on non-English data for which no matching speech recognizer exists.
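The core idea of using adaptation transforms as SVM features can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: each MLLR transform is treated as a small affine matrix [A | b], the transforms are synthetic (identity plus a speaker-dependent shift and a little session noise) rather than estimated by a recognizer, and the tiny hinge-loss trainer is a stand-in for whatever SVM implementation the authors used. Function names such as `transforms_to_features` and `speaker_vector` are hypothetical.

```python
import random

def transforms_to_features(transforms):
    """Flatten affine transforms [A | b] (lists of rows) into one feature vector."""
    return [coef for W in transforms for row in W for coef in row]

def synthetic_transform(rng, dim, shift, noise=0.01):
    """Hypothetical stand-in for an estimated MLLR transform: identity plus a
    speaker-dependent shift and small session noise. Each row is [A_i | b_i]."""
    return [[(1.0 if i == j else 0.0) + shift + rng.gauss(0.0, noise)
             for j in range(dim + 1)]
            for i in range(dim)]

def speaker_vector(rng, shift, dim=3, n_reference_models=2):
    """Concatenate the transforms obtained relative to several reference models."""
    return transforms_to_features(
        [synthetic_transform(rng, dim, shift) for _ in range(n_reference_models)])

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Linear SVM fitted by subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Margin violated: shrink w and step toward the sample.
                w = [wi - lr * (lam * wi - label * xi) for wi, xi in zip(w, x)]
                b += lr * label
            else:
                # Margin satisfied: only the regularizer contributes.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def score(w, b, x):
    """Signed SVM output; higher means more target-like."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

rng = random.Random(0)
# Target speaker samples (label +1) against background impostors (label -1).
target_train = [speaker_vector(rng, 0.3) for _ in range(3)]
impostor_train = [speaker_vector(rng, rng.uniform(-0.3, 0.0)) for _ in range(10)]
X = target_train + impostor_train
y = [1] * len(target_train) + [-1] * len(impostor_train)

w, b = train_linear_svm(X, y)
target_score = score(w, b, speaker_vector(rng, 0.3))
impostor_score = score(w, b, speaker_vector(rng, -0.15))
```

In this toy setup each recording contributes a single fixed-length vector regardless of which words were spoken, which is the property the abstract highlights. A real system would obtain the transform coefficients from the recognizer's speaker adaptation step and would likely apply feature normalization before SVM training.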
