
Andreas Tsiartas

SRI Author

Research Engineer, Speech Technology and Research Laboratory

Information & computer science publications · April 17, 2020 · Article

Speech‐based markers for posttraumatic stress disorder in US veterans

Andreas Tsiartas, Colleen Richey, Jennifer Smith, Bruce Knoth, Dimitra Vergyri · April 17, 2020

Background: The diagnosis of posttraumatic stress disorder (PTSD) is usually based on clinical interviews or self‐report measures. Both approaches are subject to under‐ and over‐reporting of symptoms. An objective test is lacking. We have developed a classifier of PTSD based on objective speech‐marker features that discriminate PTSD cases from controls.
Methods: Speech samples were obtained from warzone‐exposed veterans, 52 cases with PTSD and 77 controls, assessed with the Clinician‐Administered PTSD Scale. Individuals with major depressive disorder (MDD) were excluded. Audio recordings of clinical interviews were used to obtain 40,526 speech features which were input to a random forest (RF) algorithm.
Results: The selected RF used 18 speech features and the receiver operating characteristic curve had an area under the curve (AUC) of 0.954. At a probability of PTSD cut point of 0.423, Youden’s index was 0.787, and overall correct classification rate was 89.1%. The probability of PTSD was higher for markers that indicated slower, more monotonous speech, less change in tonality, and less activation. Depression symptoms, alcohol use disorder, and TBI did not meet statistical tests to be considered confounders.
Conclusions: This study demonstrates that a speech‐based algorithm can objectively differentiate PTSD cases from controls. The RF classifier had a high AUC. Further validation in an independent sample and appraisal of the classifier to identify those with MDD only compared with those with PTSD comorbid with MDD is required.
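
The sketch below is a minimal illustration of the kind of pipeline the abstract describes, not the authors' code: a random forest over speech-marker features, selection down to a small feature subset, evaluation by ROC AUC, and a decision threshold chosen by Youden's index. The feature matrix `X` (speech markers) and binary labels `y` (PTSD case vs. control) are assumed inputs.

```python
# Hypothetical sketch: random forest over speech-marker features with
# feature selection, ROC AUC, and a Youden's-index cut point.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

def train_ptsd_classifier(X, y, n_features=18):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Reduce tens of thousands of speech features to a small informative subset.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=500, random_state=0),
        max_features=n_features,
        threshold=-np.inf,
    ).fit(X_tr, y_tr)

    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(selector.transform(X_tr), y_tr)

    probs = rf.predict_proba(selector.transform(X_te))[:, 1]
    auc = roc_auc_score(y_te, probs)

    # Youden's index J = sensitivity + specificity - 1, maximized over thresholds.
    fpr, tpr, thresholds = roc_curve(y_te, probs)
    j = tpr - fpr
    best = np.argmax(j)
    return rf, selector, auc, thresholds[best], j[best]
```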

Education & learning publications · June 1, 2019 · Whitepaper

Mapping Individual to Group Level Collaboration Indicators Using Speech Data

Jennifer Smith, Nonye M. Alozie, Andreas Tsiartas, Colleen Richey, Harry Bratt · June 1, 2019

Automatic detection of collaboration quality from the students’ speech could support teachers in monitoring group dynamics, diagnosing issues, and developing pedagogical intervention plans. To address the challenge of mapping characteristics of individuals’ speech to information about the group, we coded behavioral and learning-related indicators of collaboration at the individual level. In this work, we investigate the feasibility of predicting the quality of collaboration among a group of students working together to solve a math problem from human-labelled collaboration indicators. We use a corpus of 6th, 7th, and 8th grade students working in groups of three to solve math problems collaboratively. Researchers labelled both the group-level collaboration quality during each problem and the student-level collaboration indicators. Results using random forests reveal that the individual indicators of collaboration aid in the prediction of group collaboration quality.
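
As a rough illustration of the mapping described above, under an assumed data schema (the column names are hypothetical, not the study's actual coding scheme), hand-labeled individual-level indicators can be aggregated to group-level counts and fed to a random forest that predicts the group's collaboration quality:

```python
# Illustrative sketch: aggregate individual-level collaboration indicators
# to the group level, then predict group collaboration quality.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def predict_group_quality(indicators: pd.DataFrame):
    """indicators: one row per student per problem, with columns
    'group_id', 'problem_id', 'quality' (the group-level label), and one
    binary column per individual-level indicator (assumed schema)."""
    indicator_cols = [c for c in indicators.columns
                      if c not in ("group_id", "problem_id", "quality")]

    # Group-level features: how many students in the group showed each indicator.
    grouped = indicators.groupby(["group_id", "problem_id"])
    X = grouped[indicator_cols].sum()
    y = grouped["quality"].first()

    rf = RandomForestClassifier(n_estimators=300, random_state=0)
    return cross_val_score(rf, X, y, cv=5, scoring="f1_macro")
```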

School and district reform publications · April 1, 2018 · Conference Proceeding

Crowdsourcing Emotional Speech

Jennifer Smith, Andreas Tsiartas · April 1, 2018

We describe the methodology for the collection and annotation of a large corpus of emotional speech data through crowdsourcing. The corpus offers 187 hours of data from 2,965 subjects. Data include non-emotional recordings from each subject as well as recordings for five emotions: angry, happy-low-arousal, happy-high-arousal, neutral, and sad. The data consist of spontaneous speech elicited from subjects via a web-based tool. Subjects used their own personal recording equipment, resulting in a data set that contains variation in room acoustics, microphone, etc. This offers the advantage of matching the type of variation one would expect when deploying speech technology in the wild in a web-based environment. The annotation scheme covers the quality of emotion expressed through the tone of voice and what was said, along with common audio-quality issues. We discuss lessons learned in the process of creating this corpus.

Speech & natural language publications · August 1, 2017 · Conference Paper

Inferring Stance from Prosody

Diego Castán, Andreas Tsiartas · August 1, 2017

Speech conveys many things beyond content, including aspects of stance and attitude that have not been much studied. Considering 14 aspects of stance as they occur in radio news stories, we investigated the extent to which they could be inferred from prosody. With time-spread prosodic features and aggregation of local estimates, many aspects of stance proved at least somewhat predictable, with results significantly better than chance across English, Mandarin, and Turkish for aspects including good, typical, local, background, new information, and relevant to a large group.

Speech & natural language publications · March 1, 2017 · Conference Proceeding

Analysis and prediction of heart rate using speech features from natural speech

Jennifer Smith, Andreas Tsiartas, Andreas Kathol, Massimiliano de Zambotti · March 1, 2017

Interactive voice technologies can leverage biosignals, such as heart rate (HR), to infer the psychophysiological state of the user. Voice-based detection of HR is attractive because it does not require additional sensors. We predict HR from speech using the SRI BioFrustration Corpus. In contrast to previous studies we use continuous spontaneous speech as input. Results using random forests show modest but significant effects on HR prediction. We further explore the effects on HR of speaking itself, and contrast the effects when interactions induce neutral versus frustrated responses from users. Results reveal that regardless of the user’s emotional state, HR tends to increase while the user is engaged in speaking to a dialog system relative to a silent region right before speech, and that this effect is greater when the subject is expressing frustration. We also find that the user’s HR does not recover to pre-speaking levels as quickly after frustrated speech as it does after neutral speech. Implications and future directions are discussed.
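
A minimal sketch, under assumed data shapes, of the kind of regression reported above: utterance-level speech features predicting heart rate with a random forest, using speaker-grouped cross-validation so the same speaker never appears in both training and test folds. This is illustrative only, not the SRI implementation or the BioFrustration Corpus pipeline.

```python
# Illustrative regression of heart rate on speech features.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

def hr_from_speech(X, hr, speaker_ids):
    """X: utterance-level acoustic features; hr: mean heart rate over the
    utterance; speaker_ids: keeps each speaker's data within one fold."""
    model = RandomForestRegressor(n_estimators=400, random_state=0)
    cv = GroupKFold(n_splits=5)
    # R^2 per fold; modest-but-significant effects would appear as
    # consistently positive scores.
    return cross_val_score(model, X, hr, groups=speaker_ids, cv=cv, scoring="r2")
```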

Speech & natural language publications · June 1, 2016 · Conference Paper

Noise and reverberation effects on depression detection from speech

Andreas Tsiartas · June 1, 2016

Speech-based depression detection has gained importance in recent years, but most research has used relatively quiet conditions or examined a single corpus per study. Little is thus known about the robustness of speech cues in the wild. This study compares the effect of noise and reverberation on depression prediction using 1) standard mel-frequency cepstral coefficients (MFCCs), and 2) features designed for noise robustness, damped oscillator cepstral coefficients (DOCCs). Data come from the 2014 Audio-Visual Emotion Recognition Challenge (AVEC). Results using additive noise and reverberation reveal a consistent pattern of findings for multiple evaluation metrics under both matched and mismatched conditions. First and most notably: standard MFCC features suffer dramatically under test/train mismatch for both noise and reverberation; DOCC features are far more robust. Second, including higher-order cepstral coefficients is generally beneficial. Third, artificial neural networks tend to outperform support vector regression. Fourth, spontaneous speech appears to offer better robustness than read speech. Finally, a cross-corpus (and cross-language) experiment reveals better noise and reverberation robustness for DOCCs than for MFCCs. Implications and future directions for real-world robust depression detection are discussed.
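
As a hedged illustration of the experimental setup (not the study's code), the sketch below mixes noise into speech at a target SNR and extracts MFCC features; DOCC features are not available in standard toolkits, so only the MFCC side is shown. Paths, sample rates, and parameter values are assumptions.

```python
# Illustrative additive-noise mixing at a target SNR plus MFCC extraction.
import numpy as np
import librosa

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR (in dB)."""
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

def mfcc_features(wav_path, sr=16000, n_mfcc=20):
    y, _ = librosa.load(wav_path, sr=sr)
    # Larger n_mfcc keeps higher-order cepstral coefficients, which the
    # study found generally beneficial.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
```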

Education & learning publications · June 1, 2016 · Conference Paper

Spoken Interaction Modeling for Automatic Assessment of Collaborative Learning

Jennifer Smith, Harry Bratt, Colleen Richey, Andreas Tsiartas, Nonye M. Alozie · June 1, 2016

Collaborative learning is a key skill for student success, but simultaneous monitoring of multiple small groups is untenable for teachers. This study investigates whether automatic audio-based monitoring of interactions can predict collaboration quality. Data consist of hand-labeled 30-second segments from audio recordings of students as they collaborated on solving math problems. Two types of features were explored: speech activity features, which were computed at the group level; and prosodic features (pitch, energy, durational, and voice quality patterns), which were computed at the speaker level. For both feature types, normalized and unnormalized versions were investigated; the latter facilitate real-time processing applications. Results using boosting classifiers, evaluated by F-measure and accuracy, reveal that (1) both speech activity and prosody features predict quality far beyond the majority-class chance baseline; (2) speech activity features are the better predictors overall, but class performance using prosody shows potential synergies; and (3) it may not be necessary to session-normalize features by speaker. These novel results have impact for educational settings, where the approach could support teachers in the monitoring of group dynamics, diagnosis of issues, and development of pedagogical intervention plans.
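
A sketch of the segment-level setup under assumed feature schemas: boosting classifiers trained on speech-activity features, prosodic features, and their combination, scored by macro F-measure. Feature extraction itself is omitted and all inputs are placeholders, not the study's actual data.

```python
# Illustrative comparison of feature types with a boosting classifier.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict

def evaluate_segment_classifier(X_activity, X_prosody, y):
    """X_activity: group-level speech-activity features per 30-second segment;
    X_prosody: speaker-level prosodic features aggregated per segment;
    y: hand-labeled collaboration quality per segment."""
    results = {}
    feature_sets = {
        "activity": X_activity,
        "prosody": X_prosody,
        "both": np.hstack([X_activity, X_prosody]),
    }
    for name, X in feature_sets.items():
        clf = AdaBoostClassifier(n_estimators=200, random_state=0)
        preds = cross_val_predict(clf, X, y, cv=5)
        results[name] = f1_score(y, preds, average="macro")
    return results
```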

Speech & natural language publications · September 1, 2015 · Conference Paper

Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system

Andreas Tsiartas, Andreas Kathol, Massimiliano de Zambotti · September 1, 2015

Most research on detecting a speaker’s cognitive state when interacting with a dialog system has been based on self-reports, or on hand-coded subjective judgments based on audio or audio-visual observations. This study examines two questions: (1) how do undesirable system responses affect people physiologically, and (2) to what extent can we predict physiological changes from the speech signal alone? To address these questions, we use a new corpus of simultaneous speech and high-quality physiological recordings in the product returns domain (the SRI BioFrustration Corpus). “Triggers” were used to frustrate users at specific times during the interaction to produce emotional responses at similar times during the experiment across participants. For each of eight return tasks per participant, we compared speaker-normalized pre-trigger (cooperative system behavior) regions to post-trigger (uncooperative system behavior) regions. Results using random forest classifiers show that changes in spectral and temporal features of speech can predict heart rate changes with an accuracy of ~70%. Implications for future research and applications are discussed.
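
The sketch below illustrates one plausible reading of the paired pre-/post-trigger design, with hypothetical variable names and shapes: speaker-normalized speech features from the two regions, a binary label for whether heart rate increased, and a random forest evaluated with speaker-grouped cross-validation.

```python
# Hypothetical sketch: classify heart rate change from pre-/post-trigger
# speech feature differences.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

def classify_hr_change(pre_feats, post_feats, hr_increase, speaker_ids):
    """pre_feats/post_feats: per-task speech features (already z-normalized
    per speaker); hr_increase: 1 if heart rate rose after the trigger;
    speaker_ids: keep each participant's eight tasks within one fold."""
    X = post_feats - pre_feats  # change in spectral/temporal features
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    cv = GroupKFold(n_splits=5)
    return cross_val_score(clf, X, hr_increase, groups=speaker_ids, cv=cv,
                           scoring="accuracy")
```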
