Search results for: “stolcke”
- Incorporating Tandem / HATs MLP Features into SRI’s Conversational Speech Recognition System
  We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features.
- Two Experiments Comparing Reading with Listening for Human Processing of Conversational Telephone Speech
  We report on results of two experiments designed to compare subjects’ ability to extract information from audio recordings of conversational telephone speech (CTS) with their ability to extract information from text transcripts of these conversations, with and without the ability to hear the audio recordings.
- Does Active Learning Help Automatic Dialog Act Tagging in Meeting Data?
  We ask whether active learning with lexical cues can help for this task and this domain. To better address this question, we explore active learning for two different types of DA models: hidden Markov models (HMMs) and maximum entropy (maxent) models.
- Improved Discriminative Training Using Phone Lattices
  We present an efficient discriminative training procedure based on phone lattices, and study different approaches to speeding up lattice generation, statistics collection, and convergence.
- Leveraging Speaker-dependent Variation of Adaptation
  This work introduces an automatic procedure for determining the size of regression class trees for individual speakers, using an ensemble of speaker-level features to control the number of transformations, if any, that should be estimated by maximum likelihood linear regression.
- Comparing HMM, Maximum Entropy, and Conditional Random Fields for Disfluency Detection
  We compare a generative hidden Markov model (HMM)-based approach and two conditional models, a maximum entropy (maxent) model and a conditional random field (CRF), for detecting disfluencies in speech. The conditional modeling approaches provide a more principled way to model correlated features.
- Using MLP Features in SRI’s Conversational Speech Recognition System
  We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features.
- Development of a Conversational Telephone Speech Recognizer for Levantine Arabic
  In this paper, we describe the development of a large-vocabulary speech recognition system for Levantine Arabic, which was a new dialectal recognition task for our existing system. We discuss the dialect-specific modeling choices and investigate to what extent techniques previously tested on other languages are portable to the present task.
- Using Conditional Random Fields for Sentence Boundary Detection in Speech
  In this paper, we evaluate the use of a conditional random field (CRF) for this task and relate results with this model to our prior work. We evaluate across two corpora (conversational telephone speech and broadcast news speech) on both human transcriptions and speech recognition output.
- SRI’s 2004 NIST Speaker Recognition Evaluation System
  This paper describes our recent efforts in exploring longer-range features and their statistical modeling techniques for speaker recognition. In particular, we describe a system that uses discriminant features from cepstral coefficients, and systems that use discriminant models from word n-grams and syllable-based NERF n-grams.
- Structural Metadata Research in the EARS Program
  In this paper we provide a brief overview of research on structural metadata extraction in the DARPA EARS rich transcription program. Tasks include detection of sentence boundaries, filler words, and disfluencies.
- Improved Phonetic Speaker Recognition Using Lattice Decoding
  In this paper, we present results on the Switchboard-2 corpus, comparing 1-best phone decodings against lattice phone decodings for the purpose of phonetic speaker recognition.