Author: SRI International

September 1, 1997

A Study of Multilingual Speech Recognition

This paper describes our work in developing multilingual (Swedish and English) speech recognition systems in the ATIS domain. The acoustic component of the multilingual systems is realized through sharing Gaussian codebooks across Swedish and English allophones.
August 1, 1997

Multimodal Interfaces for Internet

In this paper, we present a Java-enabled application with a multimodal (pen and voice) interface over the web. Our implementation approach was to add Java to the set of languages accepted by the Open Agent Architecture (OAA), a framework for rapidly prototyping complex applications, and particularly suited to those with multimodal interfaces.
June 1, 1997

Using Differential Constraints to Reconstruct Complex Surfaces from Stereo

Stereo reconstruction algorithms often fail to properly deal with complex surfaces, because there is not enough image information. We propose to guide the reconstruction process using a priori information about the differential geometry of the object surfaces.
April 1, 1997

Model Transformation for Robust Speaker Recognition from Telephone Data

In the context of automatic speaker recognition, we propose a model transformation technique that renders speaker models more robust to acoustic mismatches and to data scarcity by appropriately increasing their variances.
April 1, 1997

Handset-Dependent Background Models for Robust Text-Independent Speaker Recognition

This paper studies the effects of handset distortion on telephone-based speaker recognition performance. Results on the 1996 NIST Speaker Recognition Evaluation corpus show that using handset-matched background models reduces false acceptances (at a 10% miss rate) by more than 60% over previously reported (handset-independent) approaches.
April 1, 1997

Neural-Network Based Measures of Confidence for Word Recognition

This paper proposes a probabilistic framework to define and evaluate confidence measures for word recognition. We describe a novel method to combine different knowledge sources and estimate the confidence in a word hypothesis, via a neural network.
March 1, 1997

HTTP://WWW.SPEECH.SRI.COM/DEMOS/ATIS.HTML

This paper presents a speech-enabled WWW demonstration based on the Air Travel Information System (ATIS) domain. SRI’s speech recognition technology and natural language understanding are fully integrated in a Java application using the DECIPHER(TM) speech recognition system and the Open Agent Architecture(TM).
February 1, 1997

Acoustic Modeling for the SRI Hub4 Partitioned Evaluation Continuous Speech Recognition System

We describe the development of the SRI system evaluated in the 1996 DARPA continuous speech recognition (CSR) Hub4 partitioned evaluation (PE). The task for the Hub4 evaluation was to recognize speech from broadcast television and radio shows.
February 1, 1997

Hub4 Language Modeling Using Domain Interpolation and Data Clustering

In SRI’s language modeling experiments for the Hub4 domain, three basic approaches were pursued: interpolating multiple models estimated from Hub4 and non-Hub4 training data, adapting the language model (LM) to the focus conditions, and adapting the LM to different topic types.
January 1, 1997

Active and Supportive Computer-Mediated Resources for Student-to-Student Conversation

We provide quantitative data suggesting that seventh-grade students who used PIE learned some of the basic principles of probability. Two case studies are presented that illustrate how communication supported by computer-mediated representations contributed to this success.
January 1, 1997

Secondary-Postsecondary Linkages: The Missing Link in School-to-Work Initiatives
January 1, 1997

Multimodal User Interfaces in the Open Agent Architecture

The design and development of the Open Agent Architecture (OAA) system has focused on providing access to agent-based applications through intelligent, cooperative, distributed, and multimodal agent-based user interfaces. The current multimodal interface supports a mix of spoken language, handwriting and gesture, and is adaptable to the user’s preferences, resources and environment.