Search results for: “stolcke”
-
The Meeting Project at ICSI
In collaboration with colleagues at UW, OGI, IBM, and SRI, we are developing technology to process spoken language from informal meetings.
-
Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
We describe a new framework for distilling information from word lattices to improve the accuracy of speech recognition and obtain a more perspicuous representation of a set of alternative hypotheses.
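The core idea of consensus decoding over a confusion network can be sketched in a few lines. The network below is a hypothetical toy example (the word posteriors are invented for illustration): each "slot" maps competing words, or a deletion symbol `-`, to posterior probabilities, and the consensus hypothesis takes the highest-posterior entry per slot.

```python
# Hypothetical confusion network: one dict per alignment slot,
# mapping candidate words (or "-" for a deletion) to posteriors.
network = [
    {"i": 0.9, "a": 0.1},
    {"veal": 0.4, "feel": 0.35, "-": 0.25},
    {"fine": 0.8, "find": 0.2},
]

def consensus(net):
    """Pick the highest-posterior word in each slot, skipping deletions."""
    best = []
    for slot in net:
        word = max(slot, key=slot.get)
        if word != "-":
            best.append(word)
    return best

print(consensus(network))  # -> ['i', 'veal', 'fine']
```

Note that in the second slot the winning word "veal" has posterior 0.4, less than the combined mass of the alternatives; per-slot decisions are what make the output a word-error (rather than sentence-error) minimizer.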
-
An Efficient Repair Procedure For Quick Transcriptions
The procedure we propose in this paper aims to "cleanse" such quick transcriptions so that they align better with the acoustic evidence and thus provide for better acoustic models for automatic speech recognition (ASR).
-
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence.
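One common way to combine per-utterance cue likelihoods with discourse coherence, as described above, is a hidden Markov model over the dialogue-act sequence, decoded with Viterbi. The sketch below uses invented log-probabilities and a reduced tag set purely for illustration; it is not the paper's actual model or numbers.

```python
# Toy first-order HMM over dialogue acts. All numbers are assumed,
# illustrative log-probabilities, not trained values.
ACTS = ["Statement", "Question", "Backchannel"]
trans = {  # log P(next act | current act)
    "Statement":   {"Statement": -0.5, "Question": -1.5, "Backchannel": -1.6},
    "Question":    {"Statement": -0.4, "Question": -2.0, "Backchannel": -1.3},
    "Backchannel": {"Statement": -0.3, "Question": -1.8, "Backchannel": -2.2},
}
# Per-utterance log-likelihoods of the observed lexical/prosodic cues.
obs = [
    {"Statement": -0.2, "Question": -2.0, "Backchannel": -3.0},
    {"Statement": -2.5, "Question": -0.3, "Backchannel": -2.0},
    {"Statement": -1.5, "Question": -2.5, "Backchannel": -0.1},
]

def viterbi(obs, acts, trans):
    """Most likely dialogue-act sequence under the HMM."""
    # Initialise each path with the first observation's likelihood.
    paths = {a: ([a], obs[0][a]) for a in acts}
    for o in obs[1:]:
        new = {}
        for a in acts:
            # Best predecessor path extended by act `a`.
            prev, score = max(
                ((p, s + trans[p[-1]][a]) for p, s in paths.values()),
                key=lambda x: x[1],
            )
            new[a] = (prev + [a], score + o[a])
        paths = new
    return max(paths.values(), key=lambda x: x[1])[0]

print(viterbi(obs, ACTS, trans))
# -> ['Statement', 'Question', 'Backchannel']
```

The transition model is what encodes discourse coherence (e.g. a Backchannel is likely after a Question), so an acoustically ambiguous utterance can still be tagged correctly from context.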
-
The SRI March 2000 Hub-5 Conversational Speech Transcription System
We describe SRI’s large vocabulary conversational speech recognition system as used in the March 2000 NIST Hub-5E evaluation.
-
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech.
-
Language Modelling for Multilingual Speech Translation
As with acoustic modelling, sparse training data is one of the main problems in language modelling tasks. We ideally want to have enough properly matched data to train models for all the necessary conditions.
-
Rate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition
In this paper, we evaluate our approach on a large-vocabulary conversational speech recognition (LVCSR) task over the telephone, with several minimal pair comparisons based on different baseline systems.
-
Finding Consensus Among Words: Lattice-based Word Error Minimization
We describe a new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER). Our approach thus overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm, which is sentence-based and can lead to sub-optimal recognition results.
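The mismatch between sentence-level MAP scoring and word-level error can be made concrete with a toy example. Here a lattice is reduced to a posterior-weighted n-best list (the hypotheses and weights are invented for illustration): the minimum-expected-WER candidate need not be the MAP hypothesis.

```python
# Toy "lattice" reduced to posterior-weighted hypotheses (assumed numbers).
hyps = [
    (0.35, ["x", "y"]),   # MAP hypothesis (highest posterior)
    (0.33, ["a", "y"]),
    (0.32, ["a", "z"]),
]

def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[-1][-1]

def min_expected_wer(hyps):
    """Pick the hypothesis minimizing posterior-weighted edit distance."""
    return min(hyps, key=lambda h: sum(p * edit_distance(h[1], w)
                                       for p, w in hyps))[1]

print(min_expected_wer(hyps))  # -> ['a', 'y'], not the MAP choice ['x', 'y']
```

Although `["x", "y"]` has the highest posterior, `["a", "y"]` is closer on average to the whole weighted hypothesis set, so it has lower expected word error.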
-
Modeling the Prosody of Hidden Events for Improved Word Recognition
We investigate a new approach for using speech prosody as a knowledge source for speech recognition. The idea is to penalize word hypotheses that are inconsistent with prosodic features such as duration and pitch.
-
Combining Words and Prosody for Information Extraction from Speech
In this work we demonstrate the use of prosodic cues, alone and in combination with words, for segmentation and name finding. In experiments, we find that prosodic cues alone allow sentence and topic segmentation that is at least as good as word-based methods alone, and that combining both types of cues gives significant wins.
-
Combining Words and Speech Prosody for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topic units. The approach combines hidden Markov models, statistical language models, and prosody-based decision trees.
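A minimal sketch of combining the two knowledge sources: at each candidate boundary, interpolate a prosody-based probability (e.g. from a decision tree) with a lexical one (e.g. from a language model) in the log domain, then threshold. The positions, probabilities, weight, and threshold below are all assumed for illustration.

```python
import math

# Hypothetical per-boundary scores: P(boundary | prosody) and
# P(boundary | words). All numbers are illustrative assumptions.
candidates = [
    {"pos": 12, "p_prosody": 0.85, "p_lexical": 0.6},
    {"pos": 27, "p_prosody": 0.30, "p_lexical": 0.7},
    {"pos": 41, "p_prosody": 0.90, "p_lexical": 0.9},
]

LAMBDA = 0.5            # interpolation weight between knowledge sources
THRESH = math.log(0.5)  # decision threshold in the log domain

def topic_boundaries(cands, lam=LAMBDA):
    """Interpolate the two models' log-probabilities and threshold."""
    out = []
    for c in cands:
        score = (lam * math.log(c["p_prosody"])
                 + (1 - lam) * math.log(c["p_lexical"]))
        if score > THRESH:
            out.append(c["pos"])
    return out

print(topic_boundaries(candidates))  # -> [12, 41]
```

Here the candidate at position 27 is rejected: its lexical evidence alone would pass, but the weak prosodic evidence pulls the combined score below threshold, which is the point of fusing the two cue types.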