Story April 26, 2020

75 Years of Innovation: Speech Recognition

Natural and automated speech recognition for wide-scale commercial application.

The 75 Years of Innovation series highlights the groundbreaking innovations spanning from SRI’s founding in 1946 to today. Each week, SRI will release an innovation, leading up to its 75th anniversary in November 2021.


A Word in Your I/O Port: How Computers Were Made to DECIPHER Natural Human Speech

“What the Nuance speech system is actually doing is taking each sound apart and analyzing it to figure out exactly what the caller said…” Bob Morgen, SRI International, 1996

Computerworld interview on Charles Schwab & Co.’s use of speech recognition in its VoiceBroker system.

Many creatures on planet Earth communicate, but human beings have made speech central to daily interactions. Unlike the communication of most other animals, human speech is built from complex sentences governed by grammatical rules that establish context, and this context places events in time and space. Add natural language ‘nuances’ such as regional accents, and the complexity increases. Whilst these foibles add great beauty and depth to human language, they also make it difficult to use natural language when communicating with computers.

For many years, people have been trying to translate human language into computer language and bring the two worlds together, with speech being the ultimate form of Human-Computer Interaction (HCI). Whilst there have been many attempts at connecting computers and human beings through the spoken word, here at SRI we created a system known as DECIPHER, the basis of which is now used in commercial settings to make communicating with computers by voice a more natural and seamless experience.

This is the part that SRI played in allowing computers and humans to talk to one another.

The Technology Behind Natural Language Speech Recognition

Accents, in the context of speech recognition, can be frustrating for the user and challenging for the developer. Whilst there have been numerous attempts at employing speech recognition in Human-Computer Interaction, most have failed the accent test. SRI took the concepts behind natural language and speech and developed the DECIPHER project to meet this challenge head-on.

In 1989, the “DARPA speech and natural language workshop” was used as a platform to describe the technology behind DECIPHER. The SRI team that worked on the DECIPHER project explored ways to integrate speech and linguistic knowledge using the Hidden Markov Model (HMM) framework.

A Markov model describes a sequence of random variables in which the probability of the next state depends only on the current state. In practice, the states of interest may be hidden. In text tagging, for example, the hidden states are the ‘part-of-speech’ tags: we see the words, but not the tags. A Hidden Markov Model (HMM) links the observed sequence to the hidden one, so that an algorithm can infer the most probable hidden sequence from what is observed. In speech recognition, the observations are the sounds in the audio signal and the hidden sequence is the words being spoken. In the 1980s, the HMM approach used by DECIPHER revolutionized speech recognition, allowing computers to estimate, with a high degree of accuracy, the probability that a sequence of sounds was, in fact, a given word.
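To make the hidden-state idea concrete, here is a minimal sketch of the Viterbi algorithm, a standard way to find the most probable hidden sequence in an HMM. It is an illustration only, not DECIPHER's implementation. The two tags, the two words, and every probability below are invented for the example, and the function assumes a non-empty observation list.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, probability) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the state that preceded s on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], r)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


# Hypothetical toy model: hidden part-of-speech tags, observed words.
STATES = ["NOUN", "VERB"]
START_P = {"NOUN": 0.8, "VERB": 0.2}
TRANS_P = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
EMIT_P = {
    "NOUN": {"fish": 0.7, "sleep": 0.3},
    "VERB": {"fish": 0.4, "sleep": 0.6},
}

if __name__ == "__main__":
    path, prob = viterbi(["fish", "sleep"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, prob)
```

With these made-up numbers, the words "fish sleep" are tagged NOUN then VERB. A speech recognizer applies the same idea with acoustic measurements as the observations instead of words.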

The work at SRI International on HMM frameworks for natural language speech recognition moved the technology towards commercially viable use. This research has since been the foundation for a number of developments.

The Place of Natural Language Speech Recognition in the History of Technology

The Speech Technology and Research (STAR) Laboratory at SRI International began the journey that eventually resulted in a spin-off company, Corona Corporation (later renamed Nuance Communications). Nuance focused on commercializing advanced speech recognition technologies.

In 1995, the SRI Language Modeling Toolkit (SRILM) was developed. It provides tools to build and apply statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation, and machine translation.
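As a rough illustration of what a statistical language model is, the sketch below estimates bigram probabilities (how likely a word is given the previous word) from a tiny corpus. It is not SRILM code and does not use SRILM's interface. The function names and the three training sentences are invented for the example, and the estimate is a plain unsmoothed count ratio, which real toolkits refine with smoothing.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and word pairs, padding each sentence with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams


def bigram_prob(contexts, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen context."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


# Hypothetical training data, loosely in the spirit of a stock-trading phone line.
CORPUS = ["buy ten shares", "buy one share", "sell ten shares"]
CONTEXTS, BIGRAMS = train_bigram(CORPUS)

if __name__ == "__main__":
    print(bigram_prob(CONTEXTS, BIGRAMS, "buy", "ten"))
    print(bigram_prob(CONTEXTS, BIGRAMS, "ten", "shares"))
```

A recognizer can use such probabilities to prefer word sequences that are likely in the language when the audio alone is ambiguous.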

SRI’s natural language speech recognition software was the first to be deployed commercially by a major corporation. In 1996, Charles Schwab & Co., Inc. used Nuance’s speech recognition technology to allow customers to receive stock quotes over the telephone. One of the key features of the ‘Schwab Discount Brokerage system’ was the ability to recognize English words even when spoken by customers with accents.

In 1997, Nuance Communications developed the first large-scale commercial dialog system for United Parcel Service (UPS). UPS used the voice recognition platform to handle very large numbers of inquiries about package status.

In 2006, Nuance used a challenge called “The Amazing Race: Mobile Text Messaging” to pit its speech recognition technology against the world’s fastest texter. The texter took over 42 seconds, whilst Nuance Mobile Dictation took 16 seconds.

Some of the most recent speech recognition technologies to be born from SRI International are:

● EduSpeak is used in foreign language teaching and in corporate training and simulation. It can compare a language learner’s pronunciation with that of a native speaker. The system uses a speaker-independent speech recognition engine designed for developers of interactive, multimedia learning products who want to integrate voice input into their products.

● DynaSpeak is a small-footprint, high-accuracy, speaker-independent speech recognition engine that scales from embedded to large-scale system use in industrial, consumer, and military products and systems.

As human beings, we love to talk. SRI and its spin-off Nuance have built the technology that allows our natural, complex speech, even with accents, to be used to communicate with computers.

Now, the world of computing is no longer silent but filled with chatter.

Resources

STAR Lab: http://www.speech.sri.com/

Nuance Communications: https://www.nuance.com/

SRI’s DECIPHER System, Cohen, M., et al., Speech Research Program, SRI Int., February 1989: https://www.researchgate.net/publication/234810357

Markov Chains, Stanford University text: https://web.stanford.edu/~jurafsky/slp3/A.pdf

Computerworld 14 Oct 1996, Schwab Dials Up Stock Quote System

Business Wire, Nuance Communications to Host Mobile Text Messaging Challenge, 2006: https://www.businesswire.com/news/home/20061016005419/en/Nuance-Communications-Host-Mobile-Text-Messaging-Challenge

  • Copyright © 2023 SRI International