The human voice is a powerful tool. SRI's speech and language technologies not only allow us to interact more naturally with computing applications but also provide a wealth of actionable information about our intentions, health and emotional state.
Aaron Lawson is Assistant Lab Director at SRI's Speech Technology and Research (STAR) lab. The STAR lab brings together a multidisciplinary mix of engineers, computer scientists and linguists. Together, its experts build systems for a wide range of applications, including signal processing; data indexing and mining; and computer-aided learning. Join us to learn about how STAR […]
Core technologies and applications
Speech production and perception-based features
Prosodic modeling and disfluencies
Speech & audio analytics
Speaker and speaker-state characterization
Audio event detection
Cross-lingual information retrieval
Machine-mediated cross-lingual communication
Natural language understanding
Dialog systems and virtual personal assistants (VPAs)
Error detection and recovery
Semantic and syntactic parsing
Multi-lingual information extraction
Topic and event identification
Novel speech processing technology leverages AI algorithms to enable speech activity detection in high levels of noise and distortion.
With the advent of generative adversarial networks and misinformation in social media, there has been increased interest in multimodal verification. Image-text verification typically involves determining whether a caption and an image correspond with each other. Building on multimodal embedding techniques, we show that data augmentation via two distinct approaches improves results: entity linking and cross-domain local similarity scaling. We refer to the approaches as resilient because we show state-of-the-art results against manipulations specifically designed to thwart the exact multimodal embeddings we are using as the basis for all of our features.
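As a rough illustration of the basic check described above (deciding whether a caption and an image correspond), here is a minimal sketch that scores a pair by the cosine similarity of their embeddings and applies a threshold. It assumes both embeddings were already produced by some shared multimodal encoder (not shown); cosine similarity as the score, the function names `cosine_similarity` and `verify_caption`, the toy vectors, and the 0.3 threshold are all hypothetical choices for illustration. It does not implement the entity-linking or cross-domain local similarity scaling approaches from the abstract.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def verify_caption(
    image_embedding: Sequence[float],
    caption_embedding: Sequence[float],
    threshold: float = 0.3,  # hypothetical cut-off; would be tuned on held-out data
) -> Tuple[bool, float]:
    """Return (is_consistent, score) for an image/caption pair.

    Assumes both vectors come from the same (hypothetical) multimodal
    encoder, so matching pairs land close together in the shared space.
    """
    score = cosine_similarity(image_embedding, caption_embedding)
    return score >= threshold, score


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real encoder outputs.
    image = [1.0, 0.0, 1.0]
    matching_caption = [0.9, 0.1, 1.1]
    mismatched_caption = [-1.0, 1.0, 0.0]
    print(verify_caption(image, matching_caption))    # consistent, score about 0.99
    print(verify_caption(image, mismatched_caption))  # inconsistent, score about -0.5
```

A single global threshold is the simplest possible decision rule; because it relies entirely on the embedding, it is also the kind of check that manipulations targeting that embedding are designed to defeat.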
Natural language is one of the more appealing ways by which people can interact with computers, but up to now its application has been severely constrained. We argue that to use natural language effectively, we must have both a deep understanding of the subject domain and a general-purpose reasoning capability. We illustrate the issues with SAP-QUEST, a proof-of-concept system for natural language question answering over a set of data sources for business enterprise applications, but the argument can be applied more generally to dialogue-style interfaces over a variety of subject domains.
We present a system that uses state-of-the-art deep learning approaches to perform spectral monitoring of a 666.5 MHz wide band located within a 6 GHz range of radio frequency (RF) spectrum.
“I love to work here because SRI International is a place where you find people smarter and nicer than you. They keep me pushing in the right direction and challenging me every day, but with respect and kindness. What else can you ask for?”
Advanced Computer Scientist, Information & Computer Science Division