Search results for: “stolcke”
-
The Meeting Project at ICSI
In collaboration with colleagues at UW, OGI, IBM, and SRI, we are developing technology to process spoken language from informal meetings.
-
Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
We describe a new framework for distilling information from word lattices to improve the accuracy of speech recognition and obtain a more perspicuous representation of a set of alternative hypotheses.
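The core idea of consensus decoding over a confusion network can be sketched in a few lines. The network below is a hypothetical toy example (the word posteriors are invented for illustration): each "slot" maps competing words, or a deletion symbol `-`, to posterior probabilities, and the consensus hypothesis takes the highest-posterior entry per slot.

```python
# Hypothetical confusion network: one dict per alignment slot,
# mapping candidate words (or "-" for a deletion) to posteriors.
network = [
    {"i": 0.9, "a": 0.1},
    {"veal": 0.4, "feel": 0.35, "-": 0.25},
    {"fine": 0.8, "find": 0.2},
]

def consensus(net):
    """Pick the highest-posterior word in each slot, skipping deletions."""
    best = []
    for slot in net:
        word = max(slot, key=slot.get)
        if word != "-":
            best.append(word)
    return best

print(consensus(network))  # -> ['i', 'veal', 'fine']
```

Note that in the second slot the winning word "veal" has posterior 0.4, less than the combined mass of the alternatives; per-slot decisions are what make the output a word-error (rather than sentence-error) minimizer.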
-
An Efficient Repair Procedure For Quick Transcriptions
The procedure we propose in this paper aims to "cleanse" such quick transcriptions so that they align better with the acoustic evidence and thus provide for better acoustic models for automatic speech recognition (ASR).
-
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence.
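One common way to combine per-utterance cue likelihoods with discourse coherence, as described above, is a hidden Markov model over the dialogue-act sequence, decoded with Viterbi. The sketch below uses invented log-probabilities and a reduced tag set purely for illustration; it is not the paper's actual model or numbers.

```python
# Toy first-order HMM over dialogue acts. All numbers are assumed,
# illustrative log-probabilities, not trained values.
ACTS = ["Statement", "Question", "Backchannel"]
trans = {  # log P(next act | current act)
    "Statement":   {"Statement": -0.5, "Question": -1.5, "Backchannel": -1.6},
    "Question":    {"Statement": -0.4, "Question": -2.0, "Backchannel": -1.3},
    "Backchannel": {"Statement": -0.3, "Question": -1.8, "Backchannel": -2.2},
}
# Per-utterance log-likelihoods of the observed lexical/prosodic cues.
obs = [
    {"Statement": -0.2, "Question": -2.0, "Backchannel": -3.0},
    {"Statement": -2.5, "Question": -0.3, "Backchannel": -2.0},
    {"Statement": -1.5, "Question": -2.5, "Backchannel": -0.1},
]

def viterbi(obs, acts, trans):
    """Most likely dialogue-act sequence under the HMM."""
    # Initialise each path with the first observation's likelihood.
    paths = {a: ([a], obs[0][a]) for a in acts}
    for o in obs[1:]:
        new = {}
        for a in acts:
            # Best predecessor path extended by act `a`.
            prev, score = max(
                ((p, s + trans[p[-1]][a]) for p, s in paths.values()),
                key=lambda x: x[1],
            )
            new[a] = (prev + [a], score + o[a])
        paths = new
    return max(paths.values(), key=lambda x: x[1])[0]

print(viterbi(obs, ACTS, trans))
# -> ['Statement', 'Question', 'Backchannel']
```

The transition model is what encodes discourse coherence (e.g. a Backchannel is likely after a Question), so an acoustically ambiguous utterance can still be tagged correctly from context.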
-
The SRI March 2000 Hub-5 Conversational Speech Transcription System
We describe SRI’s large vocabulary conversational speech recognition system as used in the March 2000 NIST Hub-5E evaluation.
-
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech.
-
Language Modelling for Multilingual Speech Translation
As with acoustic modelling, sparse training data is one of the main problems in language modelling tasks. We ideally want to have enough properly matched data to train models for all the necessary conditions.
-
Rate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition
In this paper, we evaluate our approach on a large-vocabulary conversational speech recognition (LVCSR) task over the telephone, with several minimal pair comparisons based on different baseline systems.
-
Finding Consensus Among Words: Lattice-based Word Error Minimization
We describe a new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER). Our approach thus overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm, which is sentence-based and can lead to sub-optimal recognition results.
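The mismatch between sentence-level MAP scoring and word-level error can be made concrete with a toy example. Here a lattice is reduced to a posterior-weighted n-best list (the hypotheses and weights are invented for illustration): the minimum-expected-WER candidate need not be the MAP hypothesis.

```python
# Toy "lattice" reduced to posterior-weighted hypotheses (assumed numbers).
hyps = [
    (0.35, ["x", "y"]),   # MAP hypothesis (highest posterior)
    (0.33, ["a", "y"]),
    (0.32, ["a", "z"]),
]

def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[-1][-1]

def min_expected_wer(hyps):
    """Pick the hypothesis minimizing posterior-weighted edit distance."""
    return min(hyps, key=lambda h: sum(p * edit_distance(h[1], w)
                                       for p, w in hyps))[1]

print(min_expected_wer(hyps))  # -> ['a', 'y'], not the MAP choice ['x', 'y']
```

Although `["x", "y"]` has the highest posterior, `["a", "y"]` is closer on average to the whole weighted hypothesis set, so it has lower expected word error.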
-
Modeling the Prosody of Hidden Events for Improved Word Recognition
We investigate a new approach for using speech prosody as a knowledge source for speech recognition. The idea is to penalize word hypotheses that are inconsistent with prosodic features such as duration and pitch.
-
Combining Words and Prosody for Information Extraction from Speech
In this work we demonstrate the use of prosodic cues, alone and in combination with words, for segmentation and name finding. In experiments, we find that prosodic cues alone allow sentence and topic segmentation that is at least as good as word-based methods alone, and that combining both types of cues gives significant wins.
-
Combining Words and Speech Prosody for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topic units. The approach combines hidden Markov models, statistical language models, and prosody-based decision trees.
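A minimal sketch of combining the two knowledge sources: at each candidate boundary, interpolate a prosody-based probability (e.g. from a decision tree) with a lexical one (e.g. from a language model) in the log domain, then threshold. The positions, probabilities, weight, and threshold below are all assumed for illustration.

```python
import math

# Hypothetical per-boundary scores: P(boundary | prosody) and
# P(boundary | words). All numbers are illustrative assumptions.
candidates = [
    {"pos": 12, "p_prosody": 0.85, "p_lexical": 0.6},
    {"pos": 27, "p_prosody": 0.30, "p_lexical": 0.7},
    {"pos": 41, "p_prosody": 0.90, "p_lexical": 0.9},
]

LAMBDA = 0.5            # interpolation weight between knowledge sources
THRESH = math.log(0.5)  # decision threshold in the log domain

def topic_boundaries(cands, lam=LAMBDA):
    """Interpolate the two models' log-probabilities and threshold."""
    out = []
    for c in cands:
        score = (lam * math.log(c["p_prosody"])
                 + (1 - lam) * math.log(c["p_lexical"]))
        if score > THRESH:
            out.append(c["pos"])
    return out

print(topic_boundaries(candidates))  # -> [12, 41]
```

Here the candidate at position 27 is rejected: its lexical evidence alone would pass, but the weak prosodic evidence pulls the combined score below threshold, which is the point of fusing the two cue types.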