E. Shriberg and A. Stolcke, “Language-independent constrained cepstral features for speaker recognition,” in Proc. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), pp. 5296–5299.
Constrained cepstral systems, which select frames to match various linguistic “constraints” in enrollment and test, have shown significant improvements in speaker verification performance. Past work, however, relied on word recognition, making the approach language dependent (LD). We develop language-independent (LI) versions of constraints and compare results to parallel LD versions for English data on the NIST 2008 interview task. Results indicate that (1) LI versions show surprisingly little degradation relative to their associated LD versions, (2) some LI constraints outperform their LD counterparts, (3) useful constraint types include phonetic, syllable-position, prosodic, and speaking-rate regions, (4) benefits generally hold for different train/test lengths, and (5) constraints provide particular benefit in reducing false alarms. Overall, we conclude that constrained cepstral modeling can benefit speaker recognition without the need for language-dependent automatic speech recognition.
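The frame-selection idea can be illustrated with a minimal Python sketch. It assumes that per-frame constraint labels are already available from some language-independent front end. The function name `select_frames`, the label names, and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]  # one cepstral feature vector per frame


def select_frames(
    features: Sequence[Frame], labels: Sequence[str], constraint: str
) -> List[Frame]:
    """Keep only the frames whose label matches the given constraint."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    return [frame for frame, label in zip(features, labels) if label == constraint]


# Toy data (hypothetical values): 2-dimensional features with per-frame labels.
enroll_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
enroll_labels = ["onset", "nucleus", "nucleus", "coda"]
test_feats = [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]]
test_labels = ["nucleus", "coda", "nucleus"]

# The same constraint is applied to enrollment and test data,
# so any downstream speaker model sees matched frame types on both sides.
enroll_sel = select_frames(enroll_feats, enroll_labels, "nucleus")
test_sel = select_frames(test_feats, test_labels, "nucleus")
```

In a real system the selected frames would feed a speaker model for that constraint. That modeling step is omitted here because the abstract does not specify it.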