Search results for: “stolcke”
-
Speaker Clustered Regression-Class Trees for MLLR Adaptation
A speaker clustering algorithm is presented that is based on an eigenspace representation of Maximum Likelihood Linear Regression (MLLR) transformations and is used for training cluster-dependent regression-class trees for MLLR adaptation.
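Below is a minimal sketch of the eigenspace idea. It makes three assumptions: each speaker's MLLR transform is a single d x (d+1) matrix [A | b], the eigenspace comes from PCA over the flattened transforms, and speakers are then grouped with plain k-means. The abstract does not name the clustering algorithm, so k-means and the helper names `mllr_eigenspace` and `kmeans` are hypothetical choices for illustration. Training the cluster-dependent regression-class trees is not shown.

```python
import numpy as np

def mllr_eigenspace(transforms, k):
    """Project flattened MLLR transforms onto their top-k principal directions.

    transforms: array of shape (num_speakers, d, d + 1), one [A | b] per speaker
    (assumed layout; a real system may use several transforms per speaker).
    """
    X = transforms.reshape(len(transforms), -1)   # one supervector per speaker
    Xc = X - X.mean(axis=0)                       # center before PCA
    # Right singular vectors of the centered data are the covariance eigenvectors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # (num_speakers, k) coordinates

def kmeans(points, num_clusters, iters=20, seed=0):
    """Plain k-means (an assumed choice; the abstract does not name the clusterer)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), num_clusters, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            if np.any(labels == c):               # keep old center if cluster empties
                centers[c] = points[labels == c].mean(axis=0)
    return labels
```

Working in the low-dimensional eigenspace keeps the distance computation cheap and suppresses directions dominated by estimation noise in the individual transforms.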
-
Within-Class Covariance Normalization for SVM-based Speaker Recognition
This paper extends the within-class covariance normalization (WCCN) technique, previously described for training generalized linear kernels.
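The basic WCCN recipe that this paper builds on can be sketched as follows. Estimate the within-class (within-speaker) covariance W, averaged over classes, and use R = W^{-1} in a generalized linear kernel k(x_1, x_2) = x_1^T R x_2. This covers the plain technique only, not the extension the paper describes. The small ridge term `reg` and the function names are assumptions added so the example stays invertible and self-contained.

```python
import numpy as np

def wccn_matrix(features, labels, reg=1e-6):
    """Return R = W^{-1}, where W is the within-class covariance averaged over classes.

    features: (num_vectors, dim) array; labels: class (speaker) id per row.
    """
    classes = np.unique(labels)
    dim = features.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        X = features[labels == c]
        Xc = X - X.mean(axis=0)
        W += Xc.T @ Xc / len(X)          # per-class covariance (biased estimate)
    W /= len(classes)                    # average over classes
    W += reg * np.eye(dim)               # assumed ridge term so W is invertible
    return np.linalg.inv(W)

def wccn_kernel(x1, x2, R):
    """Generalized linear kernel k(x1, x2) = x1^T R x2."""
    return x1 @ R @ x2
```

Directions with large within-speaker variability get small weights in R, so they contribute less to the kernel.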
-
Enriching Speech Recognition with Automatic Detection of Sentence Boundaries and Disfluencies
This paper describes a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, in addition to the predominant HMM approach, and find that discriminative models generally provide benefit over generative models.
-
Improved Speech Activity Detection Using Cross-Channel Features for Recognition of Multiparty Meetings
We describe the development of a speech activity detection system that uses an HMM-based segmenter for automatic speech recognition on individual headset microphones in multispeaker meetings. We investigate cross-channel features (energy- and correlation-based) for incorporation into the segmenter, in order to address errors caused by cross-channel phenomena such as crosstalk.
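The abstract does not define the cross-channel features, so what follows is an illustrative sketch under assumptions. For one frame of a reference headset channel and the matching frame of another channel, it computes a log-energy difference and a normalized maximum cross-correlation over a small range of lags. Crosstalk tends to show up as a high cross-correlation together with a reference channel that is quieter than the channel of the actual speaker. The function names, the lag range, and the normalization are hypothetical.

```python
import numpy as np

def log_energy(frame, eps=1e-10):
    return np.log(np.sum(frame ** 2) + eps)

def log_energy_difference(ref, other):
    """Positive when the reference channel is louder than the other channel."""
    return log_energy(ref) - log_energy(other)

def normalized_max_xcorr(ref, other, max_lag):
    """Peak |cross-correlation| over lags |tau| <= max_lag, normalized by frame energies.

    Assumes ref and other are equal-length frames; the result lies in [0, 1].
    """
    norm = np.sqrt(np.sum(ref ** 2) * np.sum(other ** 2)) + 1e-10
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(ref[lag:], other[:len(other) - lag])
        else:
            c = np.dot(ref[:lag], other[-lag:])
        best = max(best, abs(c))
    return best / norm
```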
-
Improvements in MLLR-Transform-based Speaker Recognition
We previously proposed the use of MLLR transforms derived from a speech recognition system as speaker features in a speaker verification system. In this paper we report recent improvements to this approach.
-
Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition
In this paper, we examine the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs). We focus specifically on the problem of training what we refer to as generalized linear kernels, that is, kernels of the form k(x_1, x_2) = x_1^T R x_2, where R is a positive semidefinite matrix.
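Any positive semidefinite R can be factored as R = A^T A, so k(x_1, x_2) = x_1^T R x_2 = (A x_1)^T (A x_2). A generalized linear kernel is therefore an ordinary linear kernel applied after the linear feature map x -> A x, which is why its Gram matrix is always positive semidefinite. Here is a small numerical check, with hypothetical helper names:

```python
import numpy as np

def generalized_linear_kernel(X1, X2, R):
    """Gram matrix K[i, j] = X1[i]^T R X2[j] for row-vector data."""
    return X1 @ R @ X2.T

def is_psd(M, tol=1e-9):
    """Check positive semidefiniteness via eigenvalues of the symmetric part."""
    return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2) >= -tol))
```

With R = A^T A, the Gram matrix matches the plain linear kernel on the mapped features X A^T. Choosing R is therefore equivalent to choosing a linear feature transform for the SVM.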
-
Combining Prosodic, Lexical and Cepstral Systems for Deceptive Speech Detection
We report on machine learning experiments to distinguish deceptive from nondeceptive speech in the Columbia-SRI-Colorado (CSC) corpus. Specifically, we propose a system combination approach using different models and features for deception detection.
-
Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings
This paper investigates a scheme for joint segmentation and classification of dialog acts (DAs) in the ICSI Meeting Corpus, based on hidden-event language models and a maximum entropy classifier for modeling word boundary types.
-
Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons
In this paper we investigate how portable acoustic features estimated by multilayer perceptrons (MLPs) are across domains and languages. We show that even without retraining, English-trained MLP features can provide a significant boost to recognition accuracy in new domains within the same language, as well as in entirely different languages such as Mandarin and Arabic.
-
Recent Innovations in Speech-to-Text Transcription at SRI-ICSI-UW
We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling.
-
Incorporating Tandem / HATs MLP Features into SRI’s Conversational Speech Recognition System
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features.
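A sketch of the posterior-merging and feature-extraction steps, under stated assumptions. The abstract does not say how the two MLP outputs are merged, so this example uses frame-wise inverse-entropy weighting, which gives a confident (low-entropy) stream more weight. It then follows the common Tandem recipe of taking log posteriors and reducing them with PCA. The weighting scheme, the output dimension `k`, and the function names are assumptions, not the system's documented configuration.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Per-frame entropy of posterior rows."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def merge_posteriors(p_a, p_b, eps=1e-12):
    """Frame-wise inverse-entropy weighting of two posterior streams (assumed scheme).

    p_a, p_b: arrays of shape (frames, phones); each row sums to 1.
    """
    w_a = 1.0 / (entropy(p_a) + eps)
    w_b = 1.0 / (entropy(p_b) + eps)
    total = w_a + w_b
    # Convex combination per frame, so merged rows still sum to 1
    return (w_a / total)[:, None] * p_a + (w_b / total)[:, None] * p_b

def tandem_features(posteriors, k, eps=1e-12):
    """Log posteriors, mean-centered and reduced to k dimensions with PCA."""
    L = np.log(posteriors + eps)
    Lc = L - L.mean(axis=0)
    _, _, Vt = np.linalg.svd(Lc, full_matrices=False)
    return Lc @ Vt[:k].T
```

The log compresses the skewed posterior distribution, and PCA decorrelates it, which suits diagonal-covariance Gaussian acoustic models.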
-
Combining Feature Sets with Support Vector Machines: Application to Speaker Recognition
In this paper, we describe a general technique for optimizing the relative weights of feature sets in a support vector machine (SVM) and show how it can be applied to the field of speaker recognition. Our training procedure uses an objective function that maps the relative weights of the feature sets directly to a classification…
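One simple way to weight feature sets inside a linear SVM is to scale each feature block by the square root of its nonnegative weight before concatenating. This is sketched here as an assumption, since the abstract is truncated before it describes the objective function. The linear kernel on the concatenated vector then equals the weighted sum of the per-set linear kernels, so tuning the weights amounts to tuning a kernel combination. The helper names are hypothetical, and the weight optimization itself is not shown.

```python
import numpy as np

def weighted_concat(blocks, weights):
    """Concatenate feature sets, scaling block j by sqrt(weights[j]); weights must be >= 0."""
    return np.hstack([np.sqrt(w) * B for B, w in zip(blocks, weights)])

def combined_linear_kernel(blocks, weights):
    """Weighted sum of per-feature-set linear kernels, sum_j w_j * B_j B_j^T."""
    return sum(w * (B @ B.T) for B, w in zip(blocks, weights))
```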