Core technologies and applications
SRI’s Speech Technology and Research (STAR) Laboratory brings together a multidisciplinary mix of engineers, computer scientists and linguists. Together our experts build systems for a wide range of applications including signal processing; data indexing and mining; and computer-aided learning.
Speech recognition
Noise robustness
Speech production and perception-based features
Keyword spotting
Prosodic modeling and disfluencies
Speech & audio analytics
Voice biometrics
Language/accent identification
Speaker and speaker-state characterization
Audio event detection
Speaker diarization
Machine translation
Speech-to-speech translation
Cross-lingual information retrieval
Machine-mediated cross-lingual communication
Natural language understanding
Human-computer interaction
Dialog systems and virtual personal assistants (VPAs)
Error detection and recovery
Semantic and syntactic parsing
Information extraction
Multi-lingual information extraction
Topic and event identification
Summarization
Question answering
Our work
Nuance Partners with SCIENTIA Puerto Rico
SRI spin-out Nuance Communications to expand access to its Dragon Medical One for the island’s physicians and nurses
AI-based speech sentiment analysis technology
Enabling companies to automatically understand the intonation of the human voice.
Aaron Lawson talks about the STAR Lab at SRI
Aaron Lawson is Assistant Lab Director at SRI’s Speech Technology and Research (STAR) Lab. Join us to learn about how STAR…
Speech and natural language leadership
Featured researchers
Platforms

Open Language Interface for Voice Exploitation (OLIVE)
Novel speech processing technology leverages AI algorithms to enable speech activity detection in high levels of noise and distortion.
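OLIVE’s detection models themselves are not public; as a rough, minimal sketch of the underlying task only, a naive energy-threshold speech activity detector can be written in a few lines of Python (the frame size, hop, and threshold below are illustrative assumptions, and a system like OLIVE relies on learned models rather than a fixed threshold to stay robust to noise and distortion).

    import numpy as np

    def speech_activity(samples, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
        """Flag frames whose short-time energy exceeds a fixed dB threshold.

        Toy stand-in for speech activity detection; not OLIVE's method.
        """
        frame_len = int(sample_rate * frame_ms / 1000)
        hop_len = int(sample_rate * hop_ms / 1000)
        flags = []
        for start in range(0, len(samples) - frame_len + 1, hop_len):
            frame = samples[start:start + frame_len]
            energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
            flags.append(energy_db > threshold_db)
        return np.array(flags)  # one speech/non-speech decision per hop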

SenSay
Real-time platform estimates speaker state, such as emotion, sentiment, cognition, health, mental health, and communication quality, in a range of end applications.
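SenSay’s models are proprietary, but the general recipe behind most speaker-state estimation, summarizing an utterance with acoustic statistics and training an ordinary classifier on labeled recordings, can be sketched as follows (librosa, scikit-learn, the feature set, the file names, and the "neutral"/"stressed" labels are all illustrative assumptions, not SenSay components).

    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression

    def utterance_features(path):
        """Summarize one recording as a fixed-length vector of acoustic statistics."""
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral shape
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # rough pitch track
        rms = librosa.feature.rms(y=y)[0]                    # loudness contour
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                               [f0.mean(), f0.std(), rms.mean(), rms.std()]])

    # Fit a plain classifier on whatever labeled recordings are available
    # (the paths and labels here are placeholders).
    X = np.stack([utterance_features(p) for p in ["calm.wav", "upset.wav"]])
    clf = LogisticRegression(max_iter=1000).fit(X, ["neutral", "stressed"])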

DynaSpeak® speech recognition engine
Small-footprint, high-accuracy engine incorporates patented techniques that increase recognition performance using speaker adaptation, microphone adaptation, end-of-speech detection, distributed speech recognition, and noise robustness.

EduSpeak® speech recognition toolkit
Toolkit specifically designed for language-learning applications and other educational and training software. It works with both adult and child voices and excels at recognizing both native and non-native speakers.

SRI Language Modeling (SRILM)
Toolkit helps build and apply statistical language models for speech recognition, statistical tagging and segmentation, and machine translation. Can be downloaded and used free of charge.
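As a toy illustration of what "building and applying a statistical language model" means (SRILM itself does this through command-line tools such as ngram-count and ngram, with far better smoothing than the add-one scheme used here), a bigram model and its perplexity can be sketched in plain Python:

    import math
    from collections import Counter

    def train_bigram(sentences):
        """Count unigrams and bigrams over space-tokenized training sentences."""
        unigrams, bigrams = Counter(), Counter()
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            unigrams.update(tokens)
            bigrams.update(zip(tokens, tokens[1:]))
        return unigrams, bigrams

    def perplexity(sentence, unigrams, bigrams):
        """Per-token perplexity under an add-one-smoothed bigram model."""
        vocab = len(unigrams)
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            log_prob += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
        return math.exp(-log_prob / (len(tokens) - 1))

    unigrams, bigrams = train_bigram(["the cat sat", "the dog sat"])
    print(perplexity("the cat sat", unigrams, bigrams))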
Publications
Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option
In this work, we extend the TBC method, proposing a new similarity metric for selecting training data that results in significant gains over the one proposed in the original work.
Resilient Data Augmentation Approaches to Multimodal Verification in the News Domain
Building on multimodal embedding techniques, we show that data augmentation via two distinct approaches improves results: entity linking and cross-domain local similarity scaling.
Natural Language Access: When Reasoning Makes Sense
We argue that to use natural language effectively, we must have both a deep understanding of the subject domain and a general-purpose reasoning capability.
Wideband Spectral Monitoring Using Deep Learning
We present a system to perform spectral monitoring of a wide band of 666.5 MHz, located within a range of 6 GHz of Radio Frequency (RF) bandwidth, using state-of-the-art deep learning approaches.
Dual orexin and MCH neuron-ablated mice display severe sleep attacks and cataplexy
These results indicate a functional interaction between orexin and MCH neurons in vivo that suggests the synergistic involvement of these neuronal populations in the sleep/wakefulness cycle.
Mapping Individual to Group Level Collaboration Indicators Using Speech Data
To address the challenge of mapping characteristics of individuals’ speech to information about the group, we coded behavioral and learning-related indicators of collaboration at the individual level.
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
This article focuses on speaker recognition using speech acquired with a single distant or far-field microphone in an indoor environment.
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
In this study, our aim is to analyze the behavior of speaker recognition systems based on speaker embeddings with different front-end features, including the standard MFCC as well as PNCC and PLP.
Structure-based lead optimization to improve antiviral potency and ADMET properties of phenyl-1H-pyrrole-carboxamide entry inhibitors targeted to HIV-1 gp120
We are continuing our concerted effort to optimize our first lead entry antagonist, NBD-11021, which targets the Phe43 cavity of the HIV-1 envelope glycoprotein gp120, to improve antiviral potency and ADMET properties.