Speech & natural language publications

Publication October 1, 2021 Conference Paper

Resilient Data Augmentation Approaches to Multimodal Verification in the News Domain

Martin Graciarena October 1, 2021

With the advent of generative adversarial networks and misinformation in social media, there has been increased interest in multimodal verification. Image-text verification typically involves determining whether a caption and an image correspond with each other. Building on multimodal embedding techniques, we show that data augmentation via two distinct approaches improves results: entity linking and cross-domain local similarity scaling. We refer to the approaches as resilient because we show state-of-the-art results against manipulations specifically designed to thwart the exact multimodal embeddings we are using as the basis for all of our features.
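
To make the image-text verification setup concrete, the following is a minimal sketch, not the paper's method, of scoring a caption against an image by cosine similarity of their multimodal embeddings; the embedding vectors and the decision threshold here are synthetic placeholders.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_pair(image_emb: np.ndarray, caption_emb: np.ndarray,
                threshold: float = 0.5) -> bool:
    """Declare the caption consistent with the image if the embeddings
    are similar enough. The threshold is illustrative, not from the paper."""
    return cosine_similarity(image_emb, caption_emb) >= threshold

# Toy example with random stand-ins for multimodal embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
caption_emb = image_emb + 0.1 * rng.normal(size=512)   # a matching caption
mismatched = rng.normal(size=512)                       # an unrelated caption

print(verify_pair(image_emb, caption_emb))   # True: embeddings nearly aligned
print(verify_pair(image_emb, mismatched))    # almost certainly False
```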

Speech & natural language publications July 27, 2021

Natural Language Access: When Reasoning Makes Sense

Richard J. Waldinger July 27, 2021

Natural language is one of the more appealing ways by which people can interact with computers, but up to now its application has been severely constrained. We argue that to use natural language effectively, we must have both a deep understanding of the subject domain and a general-purpose reasoning capability. We illustrate the issues with SAP-QUEST, a proof-of-concept system for natural language question answering over a set of data sources for business enterprise applications, but the argument can be applied more generally to dialogue-style interfaces over a variety of subject domains.

Speech & natural language publications July 22, 2020 Conference Paper

Wideband Spectral Monitoring Using Deep Learning

Horacio Franco, Martin Graciarena July 22, 2020

We present a system to perform spectral monitoring of a wide band of 666.5 MHz, located within a range of 6 GHz of Radio Frequency (RF) bandwidth, using state-of-the-art deep learning approaches. The system detects, labels, and localizes signals of interest (SOIs) in time and frequency against a background of wideband RF activity. We apply a hierarchical approach. At the lower level, we use a sweeping window to analyze a wideband spectrogram, which is input to a deep convolutional network that estimates local probabilities for the presence of SOIs at each position of the window. In a subsequent, higher-level processing step, these local frame probability estimates are integrated over larger two-dimensional regions that are hypothesized by a second neural network, a region proposal network adapted from object localization in image processing. The integrated segmental probability scores are used to detect SOIs in the hypothesized spectro-temporal regions.
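
The hierarchical idea can be illustrated with a small sketch, assuming a stand-in local scorer in place of the paper's deep convolutional network and a manually specified region instead of one proposed by the region proposal network: a window sweeps the spectrogram, produces local SOI probabilities, and those probabilities are integrated over a hypothesized spectro-temporal region.

```python
import numpy as np

def local_soi_probability(window: np.ndarray) -> float:
    """Stand-in for the deep convolutional network: the window's mean
    energy mapped through a sigmoid."""
    energy = window.mean()
    return 1.0 / (1.0 + np.exp(-(energy - 1.0)))

def sweep_spectrogram(spec: np.ndarray, win_t: int = 32, hop: int = 16):
    """Slide a window along the time axis and collect local probabilities."""
    probs = []
    for start in range(0, spec.shape[1] - win_t + 1, hop):
        probs.append(local_soi_probability(spec[:, start:start + win_t]))
    return np.array(probs)

def region_score(spec: np.ndarray, f_lo: int, f_hi: int,
                 t_lo: int, t_hi: int) -> float:
    """Integrate local probabilities over a hypothesized spectro-temporal
    region (normally proposed by a region proposal network)."""
    return float(sweep_spectrogram(spec[f_lo:f_hi, t_lo:t_hi]).mean())

# Toy wideband spectrogram (freq bins x time frames) with one strong signal.
rng = np.random.default_rng(0)
spec = rng.exponential(scale=0.5, size=(256, 1000))
spec[100:120, 300:500] += 3.0   # an injected signal of interest

print(region_score(spec, 100, 120, 300, 500))  # high score inside the SOI
print(region_score(spec, 0, 20, 0, 200))       # low score over background
```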

Speech & natural language publications April 21, 2020 Article

Dual orexin and MCH neuron-ablated mice display severe sleep attacks and cataplexy

SRI International April 21, 2020

Orexin/hypocretin-producing and melanin-concentrating hormone-producing (MCH) neurons are co-extensive in the hypothalamus and project throughout the brain to regulate sleep/wakefulness.

Education & learning publications June 1, 2019 Whitepaper

Mapping Individual to Group Level Collaboration Indicators Using Speech Data

Jennifer Smith, Nonye M. Alozie, Andreas Tsiartas, Colleen Richey, Harry Bratt June 1, 2019

Automatic detection of collaboration quality from the students’ speech could support teachers in monitoring group dynamics, diagnosing issues, and developing pedagogical intervention plans. To address the challenge of mapping characteristics of individuals’ speech to information about the group, we coded behavioral and learning-related indicators of collaboration at the individual level. In this work, we investigate the feasibility of predicting, from human-labelled collaboration indicators, the quality of collaboration among a group of students working together to solve a math problem. We use a corpus of 6th, 7th, and 8th grade students working in groups of three to solve math problems collaboratively. Researchers labelled both the group-level collaboration quality during each problem and the student-level collaboration indicators. Results using random forests reveal that the individual indicators of collaboration aid in the prediction of group collaboration quality.
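
As a rough illustration of the prediction step, the sketch below trains a scikit-learn random forest to map per-group aggregates of individual-level collaboration indicators to a group-level quality label; the indicator names and the synthetic data are assumptions for illustration, not the study's coding scheme or corpus.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative features: per-group counts of individual-level indicators
# (e.g., on-task talk, off-task talk, questions asked) aggregated over the
# three students; labels are group-level collaboration quality ratings.
rng = np.random.default_rng(0)
n_groups = 200
X = rng.poisson(lam=[8, 3, 5], size=(n_groups, 3)).astype(float)
# Synthetic rule: more on-task talk and questions -> better collaboration.
quality = np.digitize(X[:, 0] + X[:, 2] - X[:, 1], bins=[8, 12])  # 0/1/2

X_train, X_test, y_train, y_test = train_test_split(
    X, quality, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
print("feature importances:", clf.feature_importances_)
```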

Education & learning publications March 5, 2019 Article

A state system framework for high-quality early intervention and early childhood special education

Grace Kelley, Kathleen M. Hebbeler March 5, 2019

The Early Childhood Technical Assistance Center used a rigorous 2-year collaborative process to develop, test, and revise a conceptual framework for high-quality state early intervention (EI) and early childhood special education (ECSE) systems. The framework identifies six critical components of a state system and what constitutes quality in each component. This new conceptual framework addresses the critical need to articulate what constitutes quality in state EI and ECSE systems. The framework and companion self-assessment are designed for state leaders to use in their efforts to evaluate and improve state systems to implement more effective services for infants and young children with disabilities and their families. This article describes the contents of the framework and the processes used to ensure that the framework incorporated current research, was relevant to all states, and was useful for systems improvement.

Speech & natural language publications January 1, 2019

Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option

Mitchell McLaren, Diego Castán, Aaron Lawson January 1, 2019

The output scores of most speaker recognition systems are not directly interpretable as stand-alone values. For this reason, a calibration step is usually performed on the scores to convert them into proper likelihood ratios (LRs), which have a clear probabilistic interpretation. The standard calibration approach transforms the system scores using a linear function trained on data selected to closely match the evaluation conditions. This selection, though, is not feasible when the evaluation conditions are unknown. In previous work, we proposed a calibration approach for this scenario called trial-based calibration (TBC). TBC trains a separate calibration model for each test trial, using data that is dynamically selected from a candidate training set to match the conditions of the trial. In this work, we extend the TBC method, proposing (1) a new similarity metric for selecting training data that results in significant gains over the one proposed in the original work, (2) a new option that enables the system to reject a trial when not enough matched data is available for training the calibration model, and (3) the use of regularization to improve the robustness of the calibration models trained for each trial. We test the proposed algorithms on a development set composed of several conditions and on the FBI multi-condition speaker recognition dataset, and we demonstrate that the proposed approach reduces calibration loss to values close to zero for most conditions when matched calibration data is available for selection, and that it can reject most trials for which relevant calibration data is unavailable.
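
The following is a minimal sketch of per-trial linear calibration with a reject option in the spirit of TBC; the condition-matching rule, the minimum-data threshold, and the regularization strength are illustrative assumptions, not the similarity metric or settings from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_trial(score, trial_cond, cal_scores, cal_labels, cal_conds,
                    min_matched=50, C=1.0):
    """Return a calibrated LLR-style value for one trial, or None (reject)
    if too little condition-matched data is available. Matching here is
    exact equality of a condition tag, a stand-in for a learned metric."""
    mask = cal_conds == trial_cond
    if mask.sum() < min_matched:
        return None  # reject: no reliable calibration model can be trained
    # Linear logistic calibration (L2-regularized) on the matched subset.
    lr = LogisticRegression(C=C)
    lr.fit(cal_scores[mask].reshape(-1, 1), cal_labels[mask])
    # Log-odds of the target class; with balanced effective priors this
    # behaves like a calibrated log-likelihood ratio.
    return float(lr.decision_function([[score]])[0])

# Toy candidate calibration pool: raw scores, target/non-target labels,
# and a condition tag per trial ("tel" or "mic").
rng = np.random.default_rng(0)
n = 400
cal_labels = rng.integers(0, 2, size=n)
cal_conds = rng.choice(["tel", "mic"], size=n)
cal_scores = rng.normal(loc=2.0 * cal_labels, scale=1.0)

print(calibrate_trial(1.5, "tel", cal_scores, cal_labels, cal_conds))
print(calibrate_trial(1.5, "uav", cal_scores, cal_labels, cal_conds))  # None
```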

Speech & natural language publications September 1, 2018 Conference Proceeding

Analysis of Complementary Information Sources in the Speaker Embeddings Framework

Mitchell McLaren, Diego Castán, Aaron Lawson September 1, 2018

Deep neural network (DNN)-based speaker embeddings have resulted in new, state-of-the-art text-independent speaker recognition technology. However, very limited effort has been made to understand DNN speaker embeddings. In this study, our aim is to analyze the behavior of speaker recognition systems based on speaker embeddings with respect to different front-end features, including standard Mel frequency cepstral coefficients (MFCC), power normalized cepstral coefficients (PNCC), and perceptual linear prediction (PLP). Using a speaker recognition system based on DNN speaker embeddings and probabilistic linear discriminant analysis (PLDA), we compared different approaches to leveraging complementary information using score-, embedding-, and feature-level combination. We report our results on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets. We found that the first and second embedding layers are complementary in nature. By applying score- and embedding-level fusion, we demonstrate relative improvements in equal error rate of 17% on NIST SRE 2016 and 10% on SITW over the baseline system.
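
The two fusion strategies can be sketched as follows, with synthetic scores and embeddings standing in for the MFCC-, PNCC-, and PLP-based systems; the normalization, weights, and dimensions are assumptions, and the paper's PLDA back-end is not reproduced here.

```python
import numpy as np

def score_fusion(scores_a, scores_b, w=0.5):
    """Score-level fusion: weighted sum of mean/variance-normalized
    scores from two systems (e.g., MFCC- and PNCC-based recognizers)."""
    za = (scores_a - scores_a.mean()) / scores_a.std()
    zb = (scores_b - scores_b.mean()) / scores_b.std()
    return w * za + (1.0 - w) * zb

def embedding_fusion(emb_a, emb_b):
    """Embedding-level fusion: concatenate per-feature embeddings so a
    single back-end (e.g., PLDA) can model the combined space."""
    return np.concatenate([emb_a, emb_b], axis=-1)

rng = np.random.default_rng(0)
scores_mfcc = rng.normal(size=1000)
scores_pncc = 0.7 * scores_mfcc + 0.3 * rng.normal(size=1000)  # correlated
fused = score_fusion(scores_mfcc, scores_pncc, w=0.6)

emb_mfcc = rng.normal(size=(10, 512))   # 10 utterances, 512-dim embeddings
emb_plp = rng.normal(size=(10, 512))
print(embedding_fusion(emb_mfcc, emb_plp).shape)  # (10, 1024)
```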

Speech & natural language publications September 1, 2018 Conference Proceeding

Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings

Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson, Martin Graciarena September 1, 2018

This article focuses on speaker recognition using speech acquired with a single distant or far-field microphone in an indoor environment. This study differs from the majority of speaker recognition research, which focuses either on speech acquired over short distances, such as with a telephone handset or mobile device, or on far-field microphone arrays, for which beamforming can enhance distant speech signals. We use two large-scale corpora collected by retransmitting speech data in reverberant environments with multiple microphones placed at different distances. We first characterize three different speaker recognition systems, ranging from a traditional universal background model (UBM) i-vector system to a state-of-the-art deep neural network (DNN) speaker embedding system with a probabilistic linear discriminant analysis (PLDA) back-end. We then assess the impact of microphone distance and placement, background noise, and loudspeaker orientation on the performance of speaker recognition systems for distant speech data. We observe that the recently introduced DNN speaker embedding systems are far more robust than i-vector-based systems, providing a significant relative improvement of up to 54% over the baseline UBM i-vector system and 45.5% over prior DNN-based speaker recognition technology.
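
Since the comparison above is reported as relative improvements in equal error rate (EER), the small sketch below shows how an EER is typically computed from target and impostor scores and how a relative improvement translates into an absolute figure; the score distributions are synthetic placeholders, not the corpora used in the article.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the operating point where the false-positive rate equals the
    false-negative rate (1 - true-positive rate)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return 0.5 * (fpr[idx] + fnr[idx])

# Synthetic target (same-speaker) and impostor (different-speaker) scores.
rng = np.random.default_rng(0)
target = rng.normal(loc=2.0, scale=1.0, size=2000)
impostor = rng.normal(loc=0.0, scale=1.0, size=20000)
labels = np.concatenate([np.ones_like(target), np.zeros_like(impostor)])
scores = np.concatenate([target, impostor])

eer_baseline = equal_error_rate(labels, scores)
print(f"baseline EER: {100 * eer_baseline:.2f}%")
# A 54% relative improvement corresponds to (1 - 0.54) * baseline.
print(f"54% relative improvement -> {100 * 0.46 * eer_baseline:.2f}%")
```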
