Speech & natural language publications September 1, 2016

Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets

Dimitra Vergyri, Horacio Franco

Citation


V. Mitra, D. Vergyri, H. Franco, “Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets,” in Proc. INTERSPEECH 2016, pp. 1300-1304, September 2016.

Abstract

Prior knowledge of subword units is often unavailable for low-resource languages, so a global subword unit description, such as a universal phone set, is typically used instead. A major bottleneck for existing speech-processing systems is their reliance on transcriptions. The growing volume of data becoming available every day only worsens the problem, because properly transcribing all of it, and hence making it useful for training speech-processing models, is impossible. This work investigates learning acoustic units in an unsupervised manner from real-world speech data by using a cascade of an autoencoder and a Kohonen net. For this purpose, a deep autoencoder with a bottleneck layer at its center was trained on data from multiple languages. Once the autoencoder was trained, its bottleneck-layer output was used to train a Kohonen net, so that state-level IDs could be assigned to the bottleneck outputs. To ascertain how consistent these state-level IDs are with the acoustic units, phone-alignment information was used for part of the data to assess whether a functional relationship exists between the phone IDs and the Kohonen state IDs and, if so, whether that relationship generalizes to data that are not transcribed.
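The cascade described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes synthetic 10-dimensional features standing in for speech frames, a single-hidden-layer autoencoder instead of a deep one, a 4x4 Kohonen map, and a simple purity score as a stand-in for the consistency check between phone IDs and state IDs. All function names and parameter values are hypothetical.

```python
import numpy as np

def train_autoencoder(X, bottleneck=2, epochs=300, lr=0.02, seed=0):
    """Toy autoencoder (tanh bottleneck, linear output) trained by gradient descent.
    Assumption: one hidden layer only; the paper uses a deep network."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, bottleneck)); b1 = np.zeros(bottleneck)
    W2 = rng.normal(0, 0.1, (bottleneck, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # bottleneck activations
        err = (H @ W2 + b2 - X) / n        # gradient of mean squared error w.r.t. output
        dH = (err @ W2.T) * (1.0 - H**2)   # backpropagate through tanh
        W2 -= lr * (H.T @ err); b2 -= lr * err.sum(0)
        W1 -= lr * (X.T @ dH);  b1 -= lr * dH.sum(0)
    return lambda Z: np.tanh(Z @ W1 + b1)  # encoder: features -> bottleneck output

def train_som(Z, grid=(4, 4), epochs=10, lr0=0.5, sigma0=1.5, seed=0):
    """Kohonen net (self-organizing map) with a Gaussian neighborhood on a 2-D grid."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = Z[rng.choice(len(Z), rows * cols, replace=False)].copy()
    total, step = epochs * len(Z), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Z)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.1     # shrinking neighborhood radius
            bmu = np.argmin(((weights - Z[i]) ** 2).sum(1))   # best-matching unit
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(1) / (2 * sigma**2))
            weights += lr * h[:, None] * (Z[i] - weights)
            step += 1
    return weights

def assign_states(Z, weights):
    """State ID of each frame = index of its nearest map unit."""
    return ((Z[:, None, :] - weights[None, :, :]) ** 2).sum(2).argmin(1)

# Synthetic stand-in for speech features: three "phones" as Gaussian clusters.
rng = np.random.default_rng(1)
phones = rng.integers(0, 3, size=300)
centers = rng.normal(0, 2, (3, 10))
X = centers[phones] + rng.normal(0, 0.5, (300, 10))
X = (X - X.mean(0)) / X.std(0)

encode = train_autoencoder(X)
Z = encode(X)                          # bottleneck-layer outputs
som_weights = train_som(Z)
states = assign_states(Z, som_weights)

# Purity: fraction of frames whose state's majority phone matches their own phone.
purity = sum(np.bincount(phones[states == s]).max()
             for s in np.unique(states)) / len(phones)
print(f"states used: {len(np.unique(states))}, purity: {purity:.2f}")
```

A purity close to 1 would suggest that each Kohonen state maps mostly to one phone, which is one simple way to probe for the kind of functional relationship the abstract mentions. To check generalization, the same state-to-phone mapping would be evaluated on held-out data.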
