Search results for: “stolcke”

September 1, 2006

Speaker Clustered Regression-Class Trees for MLLR Adaptation

A speaker clustering algorithm is presented that is based on an eigenspace representation of Maximum Likelihood Linear Regression (MLLR) transformations and is used for training cluster-dependent regression-class trees for MLLR adaptation.

September 1, 2006

Within-Class Covariance Normalization for SVM-based Speaker Recognition

This paper extends the within-class covariance normalization (WCCN) technique for training generalized linear kernels.

September 1, 2006

Enriching Speech Recognition with Automatic Detection of Sentence Boundaries and Disfluencies

This paper describes a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, in addition to the predominant HMM approach, and find that discriminative models generally provide benefit over generative models.

September 1, 2006

Improved Speech Activity Detection Using Cross-Channel Features for Recognition of Multiparty Meetings

We describe the development of a speech activity detection system using an HMM-based segmenter for automatic speech recognition on individual headset microphones in multispeaker meetings. We examine cross-channel features (energy- and correlation-based) for incorporation into the segmenter, with the aim of addressing errors related to cross-channel phenomena such as crosstalk.

June 1, 2006

Improvements in MLLR-Transform-based Speaker Recognition

We previously proposed the use of MLLR transforms derived from a speech recognition system as speaker features in a speaker verification system. In this paper we report recent improvements to this approach.

May 1, 2006

Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition

In this paper, we examine the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs). We focus specifically on the problem of training what we refer to as generalized linear kernels, that is, kernels of the form k(x_1, x_2) = x_1^T R x_2, where R is a positive semidefinite matrix.
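
As a rough illustration of the kernel form quoted in this abstract, the sketch below evaluates k(x_1, x_2) = x_1^T R x_2 in plain Python. The function names are hypothetical, and building R as A^T A is only a convenient way to get a positive semidefinite matrix for the example; it is not the paper's method for choosing R.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


def psd_from_factor(A):
    """Build R = A^T A, which is positive semidefinite by construction.

    Assumption for illustration only: the abstract does not say how R is chosen.
    """
    columns = list(zip(*A))
    return [[dot(ci, cj) for cj in columns] for ci in columns]


def generalized_linear_kernel(x1, x2, R):
    """Evaluate k(x1, x2) = x1^T R x2."""
    return dot(x1, matvec(R, x2))


# Toy data (hypothetical values).
A = [[1.0, 2.0],
     [0.0, 1.0]]
R = psd_from_factor(A)   # [[1.0, 2.0], [2.0, 5.0]]
x1 = [1.0, 0.0]
x2 = [0.0, 1.0]

k12 = generalized_linear_kernel(x1, x2, R)
k21 = generalized_linear_kernel(x2, x1, R)
```

With R = A^T A, the kernel equals an ordinary inner product after mapping each input through A, which is one way to see why a positive semidefinite R yields a valid kernel.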

May 1, 2006

Combining Prosodic, Lexical and Cepstral Systems for Deceptive Speech Detection

We report on machine learning experiments to distinguish deceptive from nondeceptive speech in the Columbia-SRI-Colorado (CSC) corpus. Specifically, we propose a system combination approach using different models and features for deception detection.

May 1, 2006

Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings

This paper investigates a scheme for joint segmentation and classification of dialog acts (DAs) of the ICSI Meeting Corpus based on hidden-event language models and a maximum entropy classifier for the modeling of word boundary types.

May 1, 2006

Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons

In this paper we investigate how portable acoustic features estimated by multilayer perceptrons (MLPs) are across domains and languages. We show that even without retraining, English-trained MLP features can provide a significant boost to recognition accuracy in new domains within the same language, as well as in entirely different languages such as Mandarin and Arabic.

January 1, 2006

Recent Innovations in Speech-to-Text Transcription at SRI-ICSI-UW

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling.

November 1, 2005

Incorporating Tandem / HATs MLP Features into SRI’s Conversational Speech Recognition System

We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features.

November 1, 2005

Combining Feature Sets with Support Vector Machines: Application to Speaker Recognition

In this paper, we describe a general technique for optimizing the relative weights of feature sets in a support vector machine (SVM) and show how it can be applied to the field of speaker recognition. Our training procedure uses an objective function that maps the relative weights of the feature sets directly to a classification…