January 1, 2010

Prosodic speaker verification using subspace multinomial models with intersession compensation

Citation

M. Kockmann, L. Burget, O. Glembek, L. Ferrer and J.J. Cernocky, “Prosodic speaker verification using subspace multinomial models with intersession compensation,” in Proc. Interspeech 2010: International Conference on Spoken Language Processing, pp. 1061–1064.

Abstract

We propose a novel approach to modeling prosodic features. Inspired by the Joint Factor Analysis (JFA) model, our model is based on the same idea of introducing a subspace of model parameters. However, the underlying Gaussian mixture distribution of JFA is replaced by a multinomial distribution in order to model sequences of discrete units rather than continuous features. In this work, we use the subspace model as a feature extractor for support vector machines (SVMs), similar to the recently proposed JFA in total variability space. We show that the model can reduce high-dimensional count vectors to a low-dimensional representation while keeping system performance stable. With additional intersession compensation, we achieve a 30% relative improvement over the baseline system and reach an equal error rate of 8.8% on the NIST 2006 SRE dataset.

Keywords: speaker verification, prosody, JFA, multinomial model
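
The abstract does not give the model equations, so the following is only a minimal sketch of how a subspace multinomial model could work, under an assumed parameterization: the multinomial probabilities of the discrete units are a softmax of m + T w, where m is a shared offset, T is a subspace matrix, and w is a low-dimensional vector estimated per utterance from its count vector. The names smm_probs, log_likelihood and estimate_w, the toy values, and the plain gradient-ascent estimator are all illustrative assumptions, not the authors' implementation.

```python
import math

def smm_probs(m, T, w):
    """Multinomial probabilities as softmax(m + T w) (assumed parameterization)."""
    logits = [m_i + sum(t_ik * w_k for t_ik, w_k in zip(t_i, w))
              for m_i, t_i in zip(m, T)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(counts, m, T, w):
    """Multinomial log-likelihood of the counts, up to a constant term."""
    probs = smm_probs(m, T, w)
    return sum(c * math.log(p) for c, p in zip(counts, probs))

def estimate_w(counts, m, T, dim, steps=100, lr=0.5):
    """Estimate w by plain gradient ascent (a stand-in for a real optimizer).

    The gradient of the log-likelihood is sum_i (c_i - N * phi_i) * t_i,
    where N is the total count and phi_i the model probability of unit i.
    """
    w = [0.0] * dim
    n = sum(counts)
    for _ in range(steps):
        probs = smm_probs(m, T, w)
        grad = [sum((c - n * p) * t_i[k] for c, p, t_i in zip(counts, probs, T))
                for k in range(dim)]
        # Normalize by N so the step size does not depend on utterance length.
        w = [w_k + lr * g_k / n for w_k, g_k in zip(w, grad)]
    return w

# Toy setup: 4 discrete units, 2-dimensional subspace (hypothetical values).
m = [0.0, 0.0, 0.0, 0.0]
T = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
counts = [8, 3, 1, 4]

w = estimate_w(counts, m, T, dim=2)
print("low-dimensional feature:", w)
print("model probabilities:", smm_probs(m, T, w))
```

In this sketch the 4-dimensional count vector is summarized by the 2-dimensional vector w, which is the kind of low-dimensional representation that could then be passed to an SVM as a feature vector.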


