Speech & natural language publications September 1, 2014

Recent Improvements in SRI’s Keyword Detection System for Noisy Audio

Dimitra Vergyri, Horacio Franco, Martin Graciarena

Citation

van Hout, J., Mitra, V., Lei, Y., Vergyri, D., Graciarena, M., Mandal, A., & Franco, H. (2014). Recent improvements in SRI’s keyword detection system for noisy audio. In INTERSPEECH (pp. 1727-1731).

Abstract

We present improvements to a keyword spotting (KWS) system that operates in highly adverse channel conditions with very low signal-to-noise ratios. We employ a system combination approach, combining the outputs of multiple large-vocabulary continuous speech recognition (LVCSR) systems. These systems are complementary thanks to different design decisions across all levels of information: three speech activity detection systems; a wide range of front-end signal processing features (standard cepstral and filter-bank features, noise-robust features, and multi-layer perceptron features); three statistical acoustic model types (Gaussian mixture models, deep neural networks, and convolutional neural networks); and two keyword search strategies (word-based and phone-based). We explore the scenario where the keywords are known in advance by adding them to the language model and assigning higher weights to n-grams that contain keywords. The scores of the individual systems are fused by a logistic-regression-based classifier to produce the final system combination output. We present the performance of our system in the Phase III evaluations of DARPA's Robust Automatic Transcription of Speech (RATS) program on Levantine Arabic and Farsi conversational speech corpora.
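
As an illustration only, the score fusion step mentioned in the abstract can be sketched as a small logistic-regression classifier that maps the per-system scores of a candidate keyword detection to one fused posterior. This is a minimal sketch under stated assumptions, not the system described in the paper: the function names (`train_fusion`, `fuse`), the plain batch gradient descent training, and the toy scores and labels are all hypothetical, and it assumes every subsystem produces a score for every candidate detection.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit fusion weights and a bias by batch gradient descent.

    scores: list of score vectors, one score per subsystem (hypothetical data).
    labels: 1 for a correct keyword detection, 0 for a false alarm.
    """
    n_sys = len(scores[0])
    n = len(scores)
    w = [0.0] * n_sys
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_sys
        grad_b = 0.0
        for x, y in zip(scores, labels):
            # Gradient of the log-loss with respect to the logit.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_sys):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(x, w, b):
    # Fused posterior that the candidate detection is a true hit.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy scores from three hypothetical subsystems (invented for illustration).
train_scores = [
    [0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.6],  # true hits
    [0.2, 0.3, 0.1], [0.1, 0.4, 0.2], [0.3, 0.1, 0.3],  # false alarms
]
train_labels = [1, 1, 1, 0, 0, 0]

w, b = train_fusion(train_scores, train_labels)
high = fuse([0.85, 0.7, 0.8], w, b)   # all subsystems confident
low = fuse([0.15, 0.2, 0.25], w, b)   # all subsystems doubtful
print(f"fused score (confident systems): {high:.3f}")
print(f"fused score (doubtful systems):  {low:.3f}")
```

A learned linear combination lets the fusion weight each subsystem by how informative its scores are on held-out data, rather than averaging them uniformly. A practical system would also need a policy for detections that only some subsystems report, which this sketch does not address.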
