Handset-Dependent Background Models for Robust Text-Independent Speaker Recognition


L. P. Heck and M. Weintraub, “Handset-dependent background models for robust text-independent speaker recognition,” 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997, pp. 1071-1074 vol.2, doi: 10.1109/ICASSP.1997.596126.


This paper studies the effects of handset distortion on telephone-based speaker recognition performance and reports four observations: (1) the major factor in speaker recognition errors is whether the handset type (e.g., electret, carbon) differs between training and testing, not whether the telephone lines are mismatched; (2) the distribution of speaker recognition scores for true speakers is bimodal, with one mode dominated by matched-handset tests and the other by mismatched-handset tests; (3) cohort-based normalization methods derive much of their performance gain from implicitly selecting cohorts trained with the same handset type as the claimant; and (4) using a handset-dependent background model matched to the handset type of the claimant’s training data sharpens and separates the true and false speaker score distributions. Results on the 1996 NIST Speaker Recognition Evaluation corpus show that handset-matched background models reduce false acceptances (at a 10% miss rate) by more than 60% compared with previously reported (handset-independent) approaches.
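The idea in observation (4) can be illustrated with a small sketch of background-normalized scoring. This is not the authors' implementation: the abstract does not state the model family, so the sketch assumes diagonal-covariance Gaussian mixture models and an average log-likelihood ratio score. The function names (`gmm_avg_loglik`, `handset_matched_score`), the `(weights, means, variances)` model layout, and the handset labels used as dictionary keys are all hypothetical.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM (assumed model)."""
    frames = np.asarray(frames, dtype=float)        # shape (T, D)
    weights = np.asarray(weights, dtype=float)      # shape (K,)
    means = np.asarray(means, dtype=float)          # shape (K, D)
    variances = np.asarray(variances, dtype=float)  # shape (K, D)

    diff = frames[:, None, :] - means[None, :, :]   # shape (T, K, D)
    # Log density of each frame under each Gaussian component.
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                               # shape (T, K)
    log_weighted = log_comp + np.log(weights)

    # Numerically stable log-sum-exp over components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def handset_matched_score(frames, claimant_model, background_models, train_handset):
    """Log-likelihood ratio against the background model whose handset type
    matches the claimant's training data (hypothetical helper)."""
    background = background_models[train_handset]
    return gmm_avg_loglik(frames, *claimant_model) - gmm_avg_loglik(frames, *background)
```

Under these assumptions, the only change from handset-independent scoring is the lookup `background_models[train_handset]`: the normalizing model is chosen by the handset type of the claimant's training data rather than being a single pooled model. A verification decision would then compare the returned score with a threshold.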
