Andreas Kathol

Research Linguist, Speech Technology and Research Laboratory

Publications

Speech & natural language publications March 1, 2017

Analysis and prediction of heart rate using speech features from natural speech

Jennifer Smith, Andreas Tsiartas, Andreas Kathol, Massimiliano de Zambotti

We predict HR from speech using the SRI BioFrustration Corpus. In contrast to previous studies, we use continuous spontaneous speech as input.
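A prediction step of this kind can be sketched as a simple regression from a frame-level speech feature to heart rate. Everything below — the log-energy feature, the linear model, and the toy data — is an illustrative assumption, not the actual features or models used with the BioFrustration Corpus.

```python
# Hypothetical sketch: regress heart rate on a frame-level speech feature.
# Feature choice, model, and data are illustrative only.
import math

def log_energy(frame):
    """Log energy of one audio frame (a list of samples)."""
    return math.log(sum(x * x for x in frame) + 1e-9)

def fit_line(xs, ys):
    """Ordinary least squares for y ~ w*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx

# Synthetic training data: heart rate rises linearly with log-energy.
energies = [1.0, 2.0, 3.0, 4.0, 5.0]
heart_rates = [60 + 5 * e for e in energies]  # toy ground truth
w, b = fit_line(energies, heart_rates)
print(round(w, 3), round(b, 3))  # → 5.0 60.0
```

In practice one would use many acoustic features and a stronger regressor, but the shape of the problem — frame features in, a continuous physiological signal out — is the same.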

Speech & natural language publications March 1, 2017 Conference Proceeding

Toward human-assisted lexical unit discovery without text resources

Andreas Kathol, Dimitra Vergyri, Harry Bratt

This work addresses lexical unit discovery for languages without (usable) written resources. Previous work has addressed this problem using entirely unsupervised methodologies.  Our approach in contrast investigates the use of linguistic and speaker knowledge which are often available even if text resources are not.  We create a framework that benefits from such resources, not assuming orthographic representations and avoiding generation of word-level transcriptions.  We adapt a universal phone recognizer to the target language and use it to convert audio into a searchable phone string for lexical unit discovery via fuzzy sub-string matching.  Linguistic knowledge is used to constrain phone recognition output and to constrain lexical unit discovery on the phone recognizer output.
Target language speakers assist a linguist in creating phonetic transcriptions for the adaptation of acoustic and language models by respeaking a small portion of the target language audio more clearly.  We also explore robust features and feature transforms learned through deep auto-encoders for better phone recognition performance.
The proposed approach achieves lexical unit discovery performance comparable to state-of-the-art zero-resource methods.  Since the system is built on phonetic recognition, discovered units are immediately interpretable.  They can be used to automatically populate a pronunciation lexicon and enable iterative improvement through additional feedback from target language speakers.
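The fuzzy sub-string matching step described above can be sketched roughly as follows. The phone inventory, query, distance threshold, and window sizes are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of fuzzy sub-string matching over a recognizer's
# phone-string output, as used for lexical unit discovery.

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n]

def fuzzy_matches(phones, query, max_dist=1):
    """Return (start, window) pairs whose edit distance to query <= max_dist."""
    hits = []
    k = len(query)
    # Allow windows slightly shorter or longer than the query.
    for w in (k - 1, k, k + 1):
        for i in range(len(phones) - w + 1):
            window = phones[i:i + w]
            if edit_distance(window, query) <= max_dist:
                hits.append((i, window))
    return hits

# Illustrative recognizer output and query, as phone sequences.
phones = ["s", "a", "l", "a", "m", "a", "t", "s", "o", "l", "a", "m", "a", "t"]
query = ["s", "a", "l", "a", "m"]
print(fuzzy_matches(phones, query))
```

The tolerance (`max_dist`) absorbs phone recognition errors, which is why fuzzy rather than exact matching is needed when searching recognizer output.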

Speech & natural language publications September 1, 2016 Conference Paper

Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)

Andreas Kathol

The rate at which endangered languages can be documented has been highly constrained by human factors. Although digital recording of natural speech in endangered languages may proceed at a fairly robust pace, transcription of this material is not only time consuming but severely limited by the lack of native-speaker personnel proficient in the orthography of their mother tongue. Our NSF-funded project in the Documenting Endangered Languages (DEL) program proposes to tackle this problem from two sides: first via a tool that helps native speakers become proficient in the orthographic conventions of their language, and second by using automatic speech recognition (ASR) output that assists in the transcription effort for newly recorded audio data. In the present study, we focus exclusively on progress in developing speech recognition for the language of interest, Yoloxóchitl Mixtec (YM), an Oto-Manguean language spoken by fewer than 5000 speakers on the Pacific coast of Guerrero, Mexico. In particular, we present results from an initial set of experiments and discuss future directions through which better and more robust acoustic models for endangered languages with limited resources can be created.

Speech & natural language publications September 1, 2016 Conference Paper

The SRI CLEO Speaker-State Corpus

Andreas Kathol, Massimiliano de Zambotti

We introduce the SRI CLEO (Conversational Language about Everyday Objects) Speaker-State Corpus of speech, video, and biosignals.

Speech & natural language publications September 1, 2015

Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system

Andreas Kathol, Andreas Tsiartas, Massimiliano de Zambotti

This study examines two questions: how do undesirable system responses affect people physiologically, and to what extent can we predict physiological changes from the speech signal alone? 

Speech & natural language publications March 1, 2015 Conference Paper

The SRI BioFrustration Corpus: Audio, video and physiological signals for continuous user modeling

Andreas Kathol

We describe the SRI BioFrustration Corpus, an in-progress corpus of time-aligned audio, video, and autonomic nervous system signals recorded while users interact with a dialog system to make returns of faulty consumer items.

Speech & natural language publications November 1, 2014 Conference Paper

The SRI AVEC-2014 Evaluation System

Martin Graciarena, Dimitra Vergyri, Colleen Richey, Andreas Kathol

We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale. 

Speech & natural language publications May 1, 2014 Conference Paper

Robust Features and System Fusion for Reverberation-robust Speech Recognition

Andreas Kathol

Reverberation in speech degrades the performance of speech recognition systems, leading to higher word error rates. Human listeners can often ignore reverberation, indicating that the auditory system somehow compensates for reverberation degradations. In this work, we present robust acoustic features motivated by the knowledge gained from human speech perception and production, and we demonstrate that these features provide reasonable robustness to reverberation effects compared to traditional mel-filterbank-based features. Using a single-feature system trained with the data distributed through the REVERB 2014 challenge on automatic speech recognition, we show a modest 12% and 0.2% relative reduction in word error rate (WER) compared to the mel-scale-feature-based […]

Speech & natural language publications August 1, 2013 Conference Paper

Strategies for high accuracy keyword detection in noisy channels

Andreas Kathol, Dimitra Vergyri, Horacio Franco, Martin Graciarena

We present design strategies for a keyword spotting (KWS) system that operates in highly degraded channel conditions with very low signal-to-noise ratio levels.

