Author: SRI International
-
Multirate ASR Models for Phone-Class Dependent N-Best List Rescoring
In this work, we describe a technique to augment a recognizer that uses this compromise with information from multiple-rate spectral models that emphasize either better time or better frequency resolution in order to improve performance.
-
VERL: An ontology framework for representing and annotating video events
This article describes the findings of a recent workshop series that has produced an ontology framework for representing video events, called Video Event Representation Language (VERL), and a companion annotation framework, called Video Event Markup Language (VEML). One of the key concepts in this work is the modeling of events as composable, whereby complex events are…
-
Singapore Tablet PC Program Study: Executive Summary and Final Report, Volume 1, Technical Findings
-
Singapore Tablet PC Program Study: Executive Summary and Final Report, Volume 2, Technical Appendices
-
Task Management under Change and Uncertainty: Constraint Solving Experience with the CALO Project
We outline the challenges and opportunities presented by constraint solving in the presence of change and uncertainty, embodied in CALO’s personalized time management and task reasoning and execution systems.
-
Mapping The Distribution Of Expertise And Resources In A School: Investigating The Potential Of Using Social Network Analysis In Evaluation
This paper describes results of a study investigating the potential of using social network analysis to evaluate the capacity of a school to undertake a schoolwide educational reform.
-
MLLR Transforms as Features in Speaker Recognition
We explore the use of adaptation transforms employed in speech recognition systems as features for speaker recognition. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification.
-
Generation of fast interpreters for Huffman compressed bytecode
Our approach uses canonical Huffman codes to generate compact opcodes with custom-sized operand fields and with a virtual machine that directly executes this compact code. In effect, this automatically creates both an instruction set for a customized virtual machine and an implementation of that machine.
-
Leveraging Speaker-dependent Variation of Adaptation
This work introduces an automatic procedure for determining the size of regression class trees for individual speakers using an ensemble of speaker-level features to control the number of transformations, if any, that should be estimated by maximum likelihood linear regression.
-
Using MLP Features in SRI’s Conversational Speech Recognition System
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features.
-
Does Active Learning Help Automatic Dialog Act Tagging in Meeting Data?
We ask whether active learning with lexical cues can help for this task and this domain. To better address this question, we explore active learning for two different types of DA models: hidden Markov models (HMMs) and maximum entropy (maxent) models.
-
Pushing the Envelope — Aside
Despite successes, there are still significant limitations to speech recognition performance. For this reason, authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article.