September 1, 2014

Content Matching for Short Duration Speaker Recognition

Citation

Scheffer, N., & Lei, Y. (2014, September). Content matching for short duration speaker recognition. In Interspeech (pp. 1317-1321).

Abstract

This work addresses the problem of content mismatch in short-duration speaker verification. Experiments are run on both text-dependent and text-independent protocols, where more enrollment data is available in the latter. We recently proposed a framework based on a deep neural network that explicitly uses phonetic information, and showed improved performance on long-duration utterances. We show how this framework can also yield significant improvements for short durations. We then propose an innovative approach to content matching, i.e. transforming a text-independent trial into a text-dependent one by mining content from a speaker's enrollment data to match the test utterance. We show that content matching can be done effectively at the statistics level, which enables the use of standard verification backends. Experiments on the RSR2015 and NIST SRE 2010 data sets show relative improvements of 50% for cases where the test content was spoken during enrollment. While no significant improvements were observed for the general text-independent case, we believe that this work may pave the way for new research on speaker verification with very short utterances.
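The idea of content matching at the statistics level can be illustrated with a minimal sketch. It assumes that every frame comes with a posterior over phonetic classes (for example, DNN senone outputs), accumulates zeroth- and first-order Baum-Welch statistics from those posteriors, and then keeps the enrollment statistics only for the classes that the test utterance actually occupies. The function names, the hard occupancy threshold, and the zeroing of unmatched classes are hypothetical choices made for illustration; this is not the exact procedure from the paper. The matched statistics would then be handed to a standard backend (an i-vector extractor is assumed here).

```python
from typing import List, Tuple

# (zeroth-order N_c per class, first-order F_c per class)
Stats = Tuple[List[float], List[List[float]]]


def baum_welch_stats(posteriors: List[List[float]],
                     features: List[List[float]]) -> Stats:
    """Accumulate N_c = sum_t p_t(c) and F_c = sum_t p_t(c) * x_t.

    posteriors[t][c] is the (assumed) phonetic-class posterior of frame t,
    features[t] is the acoustic feature vector of frame t.
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("need one non-empty posterior row per feature frame")
    num_classes = len(posteriors[0])
    dim = len(features[0])
    n = [0.0] * num_classes
    f = [[0.0] * dim for _ in range(num_classes)]
    for post, x in zip(posteriors, features):
        for c, p in enumerate(post):
            n[c] += p
            for d in range(dim):
                f[c][d] += p * x[d]
    return n, f


def content_match(enroll: Stats, test_n: List[float],
                  threshold: float = 0.5) -> Stats:
    """Hypothetical hard selection: keep enrollment stats only for classes
    whose test occupancy N_c reaches `threshold`; zero out all other classes."""
    enroll_n, enroll_f = enroll
    matched_n: List[float] = []
    matched_f: List[List[float]] = []
    for c, occupancy in enumerate(test_n):
        if occupancy >= threshold:
            matched_n.append(enroll_n[c])
            matched_f.append(list(enroll_f[c]))
        else:
            matched_n.append(0.0)
            matched_f.append([0.0] * len(enroll_f[c]))
    return matched_n, matched_f


if __name__ == "__main__":
    # Toy enrollment: three frames, each fully assigned to one of three classes.
    enroll = baum_welch_stats(
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    )
    # Toy test utterance that mostly covers class 0 only.
    test_n, _ = baum_welch_stats([[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
                                 [[0.0, 1.0], [1.0, 0.0]])
    print(content_match(enroll, test_n))
```

In this toy run only class 0 has enough test occupancy, so the matched enrollment statistics keep class 0 and drop classes 1 and 2. A soft weighting by test occupancy would be an equally plausible design; the hard threshold is used here only because it makes the selection easy to see.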
