Colleen Richey

Research Linguist, Speech Technology and Research Laboratory

Publications

Information & computer science publications April 17, 2020

Speech-based markers for post-traumatic stress disorder in US veterans

Andreas Tsiartas, Colleen Richey, Jennifer Smith, Bruce Knoth, Dimitra Vergyri

This study demonstrates that a speech-based algorithm can objectively differentiate PTSD cases from controls.

Speech & natural language publications June 1, 2019 Whitepaper

Mapping Individual to Group Level Collaboration Indicators Using Speech Data

Jennifer Smith, Nonye M. Alozie, Andreas Tsiartas, Colleen Richey, Harry Bratt

Automatic detection of collaboration quality from the students’ speech could support teachers in monitoring group dynamics, diagnosing issues, and developing pedagogical intervention plans.

Speech & natural language publications September 1, 2018 Conference Proceeding

Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings

Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson, Martin Graciarena

This article focuses on speaker recognition using speech acquired with a single distant or far-field microphone in an indoor environment. The study differs from the majority of speaker recognition research, which focuses either on speech acquired over short distances, such as with a telephone handset or mobile device, or on far-field microphone arrays, for which beamforming can enhance distant speech signals. We use two large-scale corpora collected by retransmitting speech data in reverberant environments with multiple microphones placed at different distances. We first characterize three different speaker recognition systems, ranging from a traditional universal background model (UBM) i-vector system to a state-of-the-art deep neural network (DNN) speaker embedding system with a probabilistic linear discriminant analysis (PLDA) back-end. We then assess the impact of microphone distance and placement, background noise, and loudspeaker orientation on the performance of speaker recognition systems for distant speech data. We observe that the recently introduced DNN speaker-embedding systems are far more robust than i-vector-based systems, providing a significant relative improvement of up to 54% over the baseline UBM i-vector system and 45.5% over prior DNN-based speaker recognition technology.
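
As a concrete illustration of the embedding-based back-end described above, the sketch below scores a hypothetical enrollment/test embedding pair. The paper's systems use DNN-extracted embeddings with a PLDA back-end; here the embeddings are mocked with random vectors and cosine similarity stands in for PLDA, so every name and value is illustrative only.

```python
# Minimal sketch of speaker verification scoring with fixed-dimensional
# speaker embeddings. Cosine scoring is shown as a simpler, commonly used
# stand-in for the PLDA back-end; embedding extraction (the DNN) is
# assumed and mocked with random vectors.
import numpy as np

rng = np.random.default_rng(0)

def length_normalize(x: np.ndarray) -> np.ndarray:
    """Project an embedding onto the unit sphere (a standard pre-scoring step)."""
    return x / np.linalg.norm(x)

# Hypothetical 512-dimensional embeddings from an enrollment utterance
# (close-talk) and a test utterance (distant microphone).
enroll = length_normalize(rng.normal(size=512))
test = length_normalize(rng.normal(size=512))

# Cosine similarity: a higher score means more likely the same speaker.
score = float(enroll @ test)

# A verification decision compares the score to a threshold tuned on
# development data (e.g., to a target false-alarm rate).
threshold = 0.3  # illustrative value only
print(f"score={score:.3f}, same-speaker={score > threshold}")
```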

Speech & natural language publications June 1, 2018 Conference Proceeding

Voices Obscured in Complex Environmental Settings (VOiCES) corpus

Colleen Richey, Horacio Franco, Aaron Lawson, Allen Stauffer

This paper introduces the Voices Obscured in Complex Environmental Settings (VOiCES) corpus, a dataset freely available under a Creative Commons BY 4.0 license. The dataset will promote speech and signal processing research on speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech recorded at close range. A typical approach to better represent realistic scenarios is to convolve clean speech with noise and a simulated room response for model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate all foreground speech and background noise combinations. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of audio per microphone. This work is a multi-organizational effort led by SRI International and Lab41, with the intent of pushing forward the state of the art in distant-microphone approaches to signal processing and speech recognition.
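
For context on the simulation baseline the corpus is meant to improve upon, here is a minimal sketch of the convolution approach mentioned above: clean speech convolved with a room impulse response, plus noise scaled to a target SNR. The signals and impulse response below are synthetic stand-ins, not VOiCES data.

```python
# Sketch of simulating far-field, noisy speech from a clean signal by
# convolving with a room impulse response (RIR) and adding scaled noise.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
sr = 16000
speech = rng.normal(size=sr * 3)  # stand-in for 3 s of clean speech
# Toy exponentially decaying RIR; real RIRs are measured or room-simulated.
rir = np.exp(-np.linspace(0, 8, sr // 2)) * rng.normal(size=sr // 2)
noise = rng.normal(size=speech.size + rir.size - 1)

reverberant = fftconvolve(speech, rir)  # simulate room acoustics

# Scale noise to a target signal-to-noise ratio (here 10 dB).
target_snr_db = 10.0
sig_pow = np.mean(reverberant**2)
noise_pow = np.mean(noise**2)
noise *= np.sqrt(sig_pow / (noise_pow * 10 ** (target_snr_db / 10)))

noisy_far_field = reverberant + noise
```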

Education & learning publications September 1, 2016 Article

The SRI speech-based collaborative learning corpus

Colleen Richey, Nonye M. Alozie, Harry Bratt

We introduce the SRI speech-based collaborative learning corpus, a novel collection designed for investigating and measuring how students collaborate in small groups. This multi-speaker corpus contains high-quality audio recordings of middle school students working in groups of three to solve mathematical problems. Each student was recorded via a head-mounted noise-cancelling microphone; each group was also recorded via a stereo microphone placed nearby. A total of 80 sessions were collected with the participation of 134 students, with an average session duration of 20 minutes. All students spoke English; for some, English was a second language. Sessions have been annotated with time stamps indicating which mathematical problem the students were solving and which student was speaking. Sessions have also been hand-annotated with common indicators of collaboration for each speaker (e.g., inviting others to contribute, planning) and with the overall collaboration quality for each problem. The corpus will be useful to education researchers interested in collaborative learning and to speech researchers interested in children's speech, speech analytics, and speaker diarization. The corpus, both audio and annotations, will be made available to researchers.
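
One plausible way to represent the annotations described above is as time-stamped segment records. The schema below is purely illustrative and not the corpus's actual annotation format.

```python
# Hypothetical record structure for a time-stamped, speaker-attributed
# annotation segment with collaboration indicators.
from dataclasses import dataclass, field

@dataclass
class Segment:
    start: float                 # seconds from session start
    end: float
    problem_id: str              # which math problem was being solved
    speaker_id: str              # which of the three students spoke
    indicators: list[str] = field(default_factory=list)  # e.g., "planning"

seg = Segment(12.4, 15.9, "problem-3", "student-B", ["inviting others"])
```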

Education & learning publications September 1, 2016 Tech Report

Privacy-preserving speech analytics for automatic assessment of student collaboration

Nonye M. Alozie, Jennifer Smith, Harry Bratt, Colleen Richey, Andreas Tsiartas

This work investigates whether nonlexical information from speech can automatically predict the quality of small-group collaborations. Audio was collected from students as they collaborated in groups of three to solve math problems. Experts in education hand-annotated 30-second time windows for collaboration quality. Speech activity features, computed at the group level, and spectral, temporal, and prosodic features, extracted at the speaker level, were explored. Feature fusion was also performed after transforming the latter features from the speaker level to the group level. Machine learning experiments using Support Vector Machines and Random Forests show that feature fusion yields the best classification performance: the corresponding unweighted average F1 measure on a 4-class prediction task ranges between 40% and 50%, much higher than chance (12%). Speech activity features alone are also strong predictors of collaboration quality, achieving an F1 measure between 35% and 43%. Spectral, temporal, and prosodic features alone achieve the lowest classification performance, though still above chance, and contribute considerably when fused with speech activity features, as the fusion results validate. These findings suggest that the approach is promising for monitoring group dynamics and attractive in the many collaborative settings where privacy is desired.
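
The sketch below mirrors this fusion-and-classify setup on synthetic data: speaker-level features are pooled to the group level, concatenated with group-level speech activity features, and classified with an SVM and a Random Forest, scored by the unweighted (macro) average F1. Dimensions, the pooling choice, and all data are invented for illustration.

```python
# Minimal sketch of group-level feature fusion followed by SVM and
# Random Forest classification, evaluated with macro-averaged F1.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_windows = 400                                  # 30-second time windows
activity = rng.normal(size=(n_windows, 10))      # group-level speech activity
prosodic = rng.normal(size=(n_windows, 3, 24))   # 3 speakers x 24 features

# Transform speaker-level features to the group level (mean pooling here;
# the abstract does not pin this choice down) and fuse by concatenation.
fused = np.hstack([activity, prosodic.mean(axis=1)])
labels = rng.integers(0, 4, size=n_windows)      # 4 collaboration-quality classes

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)
for clf in (SVC(), RandomForestClassifier(random_state=0)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__,
          f1_score(y_te, clf.predict(X_te), average="macro"))
```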

Education & learning publications June 1, 2016 Conference Paper

Spoken Interaction Modeling for Automatic Assessment of Collaborative Learning

Jennifer Smith, Harry Bratt, Colleen Richey, Andreas Tsiartas, Nonye M. Alozie

Collaborative learning is a key skill for student success, but simultaneously monitoring multiple small groups is untenable for teachers. This study investigates whether automatic audio-based monitoring of interactions can predict collaboration quality. Data consist of hand-labeled 30-second segments from audio recordings of students as they collaborated on solving math problems. Two types of features were explored: speech activity features, computed at the group level, and prosodic features (pitch, energy, durational, and voice quality patterns), computed at the speaker level. For both feature types, normalized and unnormalized versions were investigated; the latter facilitate real-time processing applications. Results using boosting classifiers, evaluated by F-measure and accuracy, reveal that (1) both speech activity and prosody features predict quality far beyond the chance level of a majority-class approach; (2) speech activity features are the better predictors overall, but per-class performance using prosody shows potential synergies; and (3) it may not be necessary to session-normalize features by speaker. These results have impact for educational settings, where the approach could support teachers in monitoring group dynamics, diagnosing issues, and developing pedagogical intervention plans.
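
To make the normalization question in point (3) concrete, the sketch below trains the same boosting classifier on raw and on session-normalized synthetic prosodic features. The feature layout, session assignments, and labels are all hypothetical.

```python
# Comparing a boosting classifier on unnormalized vs. per-session
# z-normalized prosodic features, on synthetic data.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_segments, n_feats = 600, 16
X = rng.normal(size=(n_segments, n_feats))       # prosodic features
sessions = rng.integers(0, 20, size=n_segments)  # session/speaker id
y = rng.integers(0, 2, size=n_segments)          # collaboration label

def session_normalize(X, sessions):
    """Z-normalize each feature within each session."""
    Xn = X.copy()
    for s in np.unique(sessions):
        m = sessions == s
        Xn[m] = (X[m] - X[m].mean(axis=0)) / (X[m].std(axis=0) + 1e-8)
    return Xn

for name, feats in [("unnormalized", X),
                    ("session-normalized", session_normalize(X, sessions))]:
    X_tr, X_te, y_tr, y_te = train_test_split(feats, y, random_state=0)
    clf = AdaBoostClassifier(random_state=0).fit(X_tr, y_tr)
    print(name, f1_score(y_te, clf.predict(X_te)))
```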

Biomedical sciences publications May 1, 2015

Classification of Lexical Stress Using Spectral and Prosodic Features for Computer-Assisted Language Learning Systems

Harry Bratt, Colleen Richey, Horacio Franco, Victor Abrash, Kristin Precoda

We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software.

Speech & natural language publications November 1, 2014 Conference Paper

The SRI AVEC-2014 Evaluation System

Martin Graciarena, Dimitra Vergyri, Colleen Richey, Andreas Kathol

We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale. 
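
As a minimal illustration of this kind of feature screening, the sketch below computes the Pearson correlation between one synthetic audio feature and synthetic Beck Depression Inventory scores; both arrays are invented stand-ins for the study's data.

```python
# Screening a single audio feature for correlation with depression scores.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
feature = rng.normal(size=50)              # e.g., mean pitch per speaker
bdi_scores = rng.integers(0, 63, size=50)  # Beck Depression Inventory range

r, p = pearsonr(feature, bdi_scores)
print(f"Pearson r={r:.2f}, p={p:.3f}")
```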
