Speech & natural language publications
-
A Multimodal Discourse Ontology for Meeting Understanding
In this paper, we present a multimodal discourse ontology that serves as a knowledge representation and annotation framework for the discourse understanding component of an artificial personal office assistant.
-
Incorporating Tandem/HATs MLP Features into SRI’s Conversational Speech Recognition System
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level…
-
Combining Feature Sets with Support Vector Machines: Application to Speaker Recognition
In this paper, we describe a general technique for optimizing the relative weights of feature sets in a support vector machine (SVM) and show how it can be applied to…
-
Multirate ASR Models for Phone-Class Dependent N-Best List Rescoring
In this work, we describe a technique to augment a recognizer that uses a compromise between time and frequency resolution with information from multiple-rate spectral models that emphasize either better time or better frequency resolution…
-
A* Based Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings
We investigate the use of the A* algorithm for joint segmentation and classification of dialog acts (DAs) in the ICSI Meeting Corpus. The proposed method is evaluated on both traditional…
-
Four Weightings and a Fusion: A Cepstral-SVM System for Speaker Recognition
A new speaker recognition system is described that uses Mel-frequency cepstral features. This system is a combination of four support vector machines (SVMs). All the SVM systems use polynomial features…
-
Development of a Conversational Telephone Speech Recognizer for Levantine Arabic
In this paper, we describe the development of a large-vocabulary speech recognition system for Levantine Arabic, which was a new dialectal recognition task for our existing system. We discuss the…
-
Leveraging Speaker-dependent Variation of Adaptation
This work introduces an automatic procedure for determining the size of regression class trees for individual speakers using an ensemble of speaker-level features to control the number of transformations, if…
-
Spoken Language Understanding
Spoken language understanding (SLU) systems contain an automatic speech recognition (ASR) component and must be robust to noise due to the spontaneous nature of spoken language and the errors introduced by ASR. SLU…
-
Two Experiments Comparing Reading with Listening for Human Processing of Conversational Telephone Speech
We report on results of two experiments designed to compare subjects’ ability to extract information from audio recordings of conversational telephone speech (CTS) with their ability to extract information from…
-
Improved Discriminative Training Using Phone Lattices
We present an efficient discriminative training procedure utilizing phone lattices. Different approaches to expediting lattice generation, statistics collection, and convergence were studied.
-
Meeting Structure Annotation: Data and Tools
We present a set of annotations of hierarchical topic segmentations and action item sub-dialogues collected over 65 meetings from the ICSI and ISL meeting corpora, designed to support automatic meeting…