Speech & natural language publications May 1, 2013 Conference Paper

Rich system combination for keyword spotting in noisy and acoustically heterogeneous audio streams

SRI International May 1, 2013

Citation

M. Akbacak, L. Burget, W. Wang, and J. van Hout, “Rich system combination for keyword spotting in noisy and acoustically heterogeneous audio streams,” in Proc. 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 8267–8271.

Abstract

We address the problem of retrieving spoken information from noisy and heterogeneous audio archives using a rich system combination with a diverse set of noise-robust modules and audio characterization. Audio search applications so far have focused on constrained domains or genres and not-so-noisy and heterogeneous acoustic or channel conditions. In this paper, our focus is to improve the accuracy of a keyword spotting system in highly degraded and diverse channel conditions by employing multiple recognition systems in parallel with different robust frontends and modeling choices, as well as different representations during audio indexing and search (words vs. subword units). Then, after aligning keyword hits from different systems, we employ system combination at the score level using a logistic-regression-based classifier. When available, side information (such as signal-to-noise ratio or the output of an acoustic condition identification module) is used to guide system combination that is trained on separate held-out data. Lattice-based indexing and search are used in all keyword spotting systems. We present improvements in probability-miss at a fixed probability-false-alarm by employing our proposed rich system combination approach on DARPA Robust Audio Transcription (RATS) Phase-I evaluation data that contains highly degraded channel recordings (SNR as low as 0 dB) and different channel characteristics.
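The score-level combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature layout (one score per system plus a scaled SNR value), the toy held-out data, the function names `train_fusion` and `fuse`, and the plain gradient-descent training are all assumptions made for illustration. It only shows how a logistic-regression classifier could map aligned per-system keyword scores and side information to a single fused confidence.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(features, labels, lr=0.5, epochs=2000, l2=1e-3):
    """Fit logistic-regression weights by batch gradient descent.

    features: one row per aligned keyword hit (hypothetical layout:
              [word_system_score, subword_system_score, scaled_snr]).
    labels:   1 for a true hit, 0 for a false alarm.
    """
    n = len(features)
    n_feat = len(features[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average gradient plus a small L2 penalty on the weights.
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(w, b, x):
    # Fused confidence for one aligned hit.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy held-out data, invented for illustration only.
# Columns: word-system score, subword-system score, SNR in dB divided by 30.
train_x = [
    [0.9, 0.8, 0.7], [0.7, 0.9, 0.3], [0.8, 0.6, 0.1], [0.6, 0.7, 0.5],
    [0.2, 0.3, 0.6], [0.4, 0.1, 0.2], [0.3, 0.2, 0.0], [0.1, 0.4, 0.4],
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_fusion(train_x, train_y)
p_hit = fuse(w, b, [0.85, 0.75, 0.2])  # both systems score the hit highly
p_fa = fuse(w, b, [0.15, 0.25, 0.2])   # both systems score the hit poorly
```

Thresholding the fused confidence then sets the operating point, trading probability of miss against probability of false alarm. How hits are aligned across systems and how a system with no hit for a given detection is represented are left out of this sketch.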
