• Skip to primary navigation
  • Skip to main content
SRI InternationalSRI mobile logo

SRI International

SRI International - American Nonprofit Research Institute

  • About
    • Blog
    • Press room
  • Expertise
    • Advanced imaging systems
    • Artificial intelligence
    • Biomedical R&D services
    • Biomedical sciences
    • Computer vision
    • Cyber & formal methods
    • Education and learning
    • Innovation strategy and policy
    • National security
    • Ocean & space
    • Quantum
    • QED-C
    • Robotics, sensors & devices
    • Speech & natural language
    • Video test & measurement
  • Ventures
  • NSIC
  • Careers
  • Contact
  • 日本支社
Show Search
Hide Search
Speech & natural language publications June 1, 2018 Conference Proceeding

Voices Obscured in Complex Environmental Settings (VOiCES) corpus

Colleen Richey, Horacio Franco, Aaron Lawson, Allen Stauffer June 1, 2018

SRI Authors: Colleen Richey, Horacio Franco, Aaron Lawson, Allen Stauffer

Citation

Copy to clipboard


C. Richey, M. A. Barrios, Z. Armstrong, C. Bartels, H. Franco, M. Graciarena, A. Lawson, M. K. Nandwana, A. Stauffer, J. van Hout, P. Gamble, J. Hetherly, C. Stephenson and K. Ni, “Voices obscured in complex environmental settings (VOiCES) corpus,” Interspeech 2018, Hyderabad, Telangana, India. Forthcoming September 2018.

Abstract

This paper introduces the Voices Obscured in Complex Environmental Settings (VOiCES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical approach to better represent realistic scenarios, is to convolve clean speech with noise and simulated room response for model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate for all foreground speech-background noise combinations. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of audio per microphone. This work is a multi-organizational effort led by SRI International and Lab41 with the intent to push forward state-of-the-art distant microphone approaches in signal processing and speech recognition.

↓ Download

Share this

Facebooktwitterlinkedinmail

Publication, Speech & natural language publications Conference Proceeding

How can we help?

Once you hit send…

We’ll match your inquiry to the person who can best help you.

Expect a response within 48 hours.

Career call to action image

Make your own mark.

Search jobs
Our work

Case studies

Publications

Timeline of innovation

Areas of expertise

Blog

Institute

Leadership

Press room

Media inquiries

Compliance

Privacy policy

Careers

Job listings

Contact

SRI Ventures

Our locations

Headquarters

333 Ravenswood Ave
Menlo Park, CA 94025 USA

+1 (650) 859-2000

Subscribe to our newsletter

日本支社

SRI International

  • Contact us
  • Privacy Policy
  • Cookies
  • DMCA
  • Copyright © 2022 SRI International