Language-independent constrained cepstral features for speaker recognition

Citation

E. Shriberg and A. Stolcke, “Language-independent constrained cepstral features for speaker recognition,” in Proc. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), pp. 5296–5299.

Abstract

Constrained cepstral systems, which select frames to match various linguistic "constraints" in enrollment and test, have shown significant improvements in speaker verification performance. Past work, however, relied on word recognition, making the approach language-dependent (LD). We develop language-independent (LI) versions of the constraints and compare results to parallel LD versions on English data from the NIST 2008 interview task. Results indicate that (1) LI versions show surprisingly little degradation relative to the associated LD versions, (2) some LI constraints outperform their LD counterparts, (3) useful constraint types include phonetic, syllable-position, prosodic, and speaking-rate regions, (4) benefits generally hold across different train/test lengths, and (5) constraints provide particular benefit in reducing false alarms. Overall, we conclude that constrained cepstral modeling can benefit speaker recognition without the need for language-dependent automatic speech recognition.

Index Terms—language-independent phone recognition, cepstral constraints, speaker verification.
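
The selection step the abstract describes lends itself to a short sketch: given per-frame cepstral features and per-frame labels from a language-independent phone recognizer, a constrained system keeps only the frames falling in one constraint region before speaker modeling. The Python below is a minimal illustration of that idea under stated assumptions; the function name, label set, and data are hypothetical, not the authors' implementation.

import numpy as np

def select_constrained_frames(features, labels, constraint):
    """Keep only frames whose label satisfies the constraint.

    features:   (num_frames, num_ceps) cepstral features (e.g., MFCCs)
    labels:     length-num_frames sequence of per-frame tags, e.g. broad
                phone classes from a language-independent phone recognizer
    constraint: set of labels defining one constraint region,
                e.g. {"vowel"} for a phonetic constraint
    """
    mask = np.array([lab in constraint for lab in labels])
    return features[mask]

# Hypothetical usage: restrict a cepstral system to vowel frames only.
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 13))                        # stand-in MFCC frames
tags = rng.choice(["vowel", "nasal", "fricative"], size=1000)  # stand-in frame labels
vowel_frames = select_constrained_frames(feats, tags, {"vowel"})

In a full system, the same selection would be applied to both enrollment and test data for each constraint, with one speaker model trained per constraint region.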

