back icon
close icon

Capture phrases in quotes for more specific queries (e.g. "rocket ship" or "Fred Lynn")

Conference Proceeding  September 2018

Analysis of Complementary Information Sources in the Speaker Embeddings Framework

SRI Authors Mahesh Nandwana, Mitchell McLaren, Diego Castán, Julien van Hout, Aaron Lawson

Citation

COPY

M. Kumar Nandwana, M. McLaren, D. Castan, Julien van Hout and A. Lawson, “Analysis of complementary information sources in the speaker embeddings framework,” Interspeech 2018, Hyderabad, Telangana, India. Forthcoming September 2018.

Abstract

Deep neural network (DNN)-based speaker embeddings have resulted in new, state-of-the-art text-independent speaker recognition technology. However, very limited effort has been made to understand DNN speaker embeddings. In this study, our aim is analyzing the behavior of the speaker recognition systems based on speaker embeddings toward different front-end features, including the standard Mel frequency cepstral coefficients (MFCC), as well as power normalized cepstral coefficients (PNCC), and perceptual linear prediction (PLP). Using a speaker recognition system based on DNN speaker embeddings and probabilistic linear discriminant analysis (PLDA), we compared different approaches to leveraging complementary information using score-, embeddings-, and feature-level combination. We report our results for Speakers in the Wild (SITW) and NIST SRE 2016 datasets. We found that first and second embeddings layers are complementary in nature. By applying score and embedding-level fusion we demonstrate relative improvements in equal error rate of 17% on NIST SRE 2016 and 10% on SITW over the baseline system.

How can we help?

Once you hit send…

We’ll match your inquiry to the person who can best help you. Expect a response within 48 hours.

Our Privacy Policy