Author: Andreas Kathol
-
Analysis and prediction of heart rate using speech features from natural speech
We predict HR from speech using the SRI BioFrustration Corpus. In contrast to previous studies, we use continuous spontaneous speech as input.
-
Toward human-assisted lexical unit discovery without text resources
This work addresses lexical unit discovery for languages without (usable) written resources.
-
Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)
In the present study, we focus exclusively on progress in developing speech recognition for the language of interest, Yoloxóchitl Mixtec (YM), an Oto-Manguean language spoken by fewer than 5000 speakers on the Pacific coast of Guerrero, Mexico.
-
The SRI CLEO Speaker-State Corpus
We introduce the SRI CLEO (Conversational Language about Everyday Objects) Speaker-State Corpus of speech, video, and biosignals.
-
Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system
This study examines two questions: how do undesirable system responses affect people physiologically, and to what extent can we predict physiological changes from the speech signal alone?
-
The SRI biofrustration corpus: Audio, video and physiological signals for continuous user modeling
We describe the SRI BioFrustration Corpus, an in-progress corpus of time-aligned audio, video, and autonomic nervous system signals recorded while users interact with a dialog system to make returns of faulty consumer items.
-
The SRI AVEC-2014 Evaluation System
We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale.
-
Robust Features and System Fusion for Reverberation-robust Speech Recognition
In this work, we present robust acoustic features motivated by the knowledge gained from human speech perception and production, and demonstrate that these features provide reasonable robustness to reverberation effects compared to traditional mel-filterbank-based features.
-
Strategies for high accuracy keyword detection in noisy channels
We present design strategies for a keyword spotting (KWS) system that operates in highly degraded channel conditions with very low signal-to-noise ratio levels.
-
“Can You Give Me Another Word for Hyperbaric?”: Improving Speech Translation Using Targeted Clarification Questions
We present a novel approach for improving communication success between users of speech-to-speech translation systems by automatically detecting errors in the output of automatic speech recognition (ASR) and statistical machine translation (SMT) systems.
-
Acoustic data sharing for Afghan and Persian languages
In this work, we compare several known approaches for multilingual acoustic modeling for three languages, Dari, Farsi and Pashto, which are of recent geo-political interest.
-
Recent advances in SRI’s IraqComm Iraqi Arabic-English speech-to-speech translation system
We summarize recent progress on SRI’s IraqComm™ Iraqi Arabic-English two-way speech-to-speech translation system.