Author: SRI International

September 1, 2005

Pushing the Envelope — Aside

Despite successes, there are still significant limitations to speech recognition performance. For this reason, authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article.
September 1, 2005

Spoken Language Understanding

SLU systems contain an automatic speech recognition (ASR) component and must be robust to noise due to the spontaneous nature of spoken language and the errors introduced by ASR. SLU systems must perform text segmentation and understanding at the same time.
September 1, 2005

Does Active Learning Help Automatic Dialog Act Tagging in Meeting Data?

We ask if active learning with lexical cues can help for this task and this domain. To better address this question, we explore active learning for two different types of DA models — hidden Markov models (HMMs) and maximum entropy (maxent).
September 1, 2005

Comparing HMM, Maximum Entropy, and Conditional Random Fields for Disfluency Detection

We compare a generative hidden Markov model (HMM)-based approach and two conditional models — a maximum entropy (Maxent) model and a conditional random field (CRF) — for detecting disfluencies in speech. The conditional modeling approaches provide a more principled way to model correlated features.
September 1, 2005

Improved Discriminative Training Using Phone Lattices

We present an efficient discriminative training procedure utilizing phone lattices. Different approaches to expediting lattice generation, statistics collection, and convergence were studied.
September 1, 2005

Two Experiments Comparing Reading with Listening for Human Processing of Conversational Telephone Speech

We report on results of two experiments designed to compare subjects’ ability to extract information from audio recordings of conversational telephone speech (CTS) with their ability to extract information from text transcripts of these conversations, with and without the ability to hear the audio recordings.
September 1, 2005

Class-dependent Score Combination for Speaker Recognition

In this work, we present a class-based score combination technique that relies on clustering both the target models and the test utterances in a vector space defined by a set of speaker-specific transformation parameters estimated during transcription of the talker’s speech.
September 1, 2005

MLLR Transforms as Features in Speaker Recognition

We explore the use of adaptation transforms employed in speech recognition systems as features for speaker recognition. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification.
August 1, 2005

A Robust Method for Tracking Scene Text in Video Imagery

We describe an approach that tracks planar regions of scene text that can undergo arbitrary 3-D rigid motion and scale changes. Our approach computes homographies on blocks of contiguous frames simultaneously using a combination of factorization and robust statistical methods.
July 1, 2005

Evidence-Centered Assessment Design: Layers, Structures, and Terminology (PADI Technical Report 9)

This presentation provides an overview of ECD, highlighting the ideas of layers in the process, structures and representations within layers, and terms and concepts that can be used to guide the design of assessments of practically all types. Examples are drawn from the Principled Assessment Designs for Inquiry (PADI) project.
July 1, 2005

Task Templates Based on Misconception Research (PADI Technical Report 6)

This paper reports one such effort, motivated by assessments that elicit students’ qualitative explanations of situations designed to provoke misconceptions and partial understandings. We describe four task-specific templates we created: three based on Hestenes, Wells, and Swackhamer’s Force Concept Inventory and one based on Novick and Nussbaum’s Test about Particles in a Gas.
July 1, 2005

Identifying and Segmenting Human-Motion for Mobile Robot Navigation Using Alignment Errors

This paper presents a new algorithm for identifying and segmenting human motion in images from moving cameras. The algorithm is based on the alignment error between pairs of moving-object images. Pairs of object images that generate relatively small alignment errors are used to estimate the fundamental frequency of the object motion.