Martin Graciarena

Technical Manager, Speech Technology and Research Laboratory

Publications

Publication · October 1, 2021

Resilient Data Augmentation Approaches to Multimodal Verification in the News Domain

Martin Graciarena

Building on multimodal embedding techniques, we show that two distinct data augmentation approaches, entity linking and cross-domain local similarity scaling, improve results.

Speech & natural language publications · July 22, 2020 · Conference Paper

Wideband Spectral Monitoring Using Deep Learning

Horacio Franco, Martin Graciarena

We present a system that uses state-of-the-art deep learning approaches to perform spectral monitoring of a 666.5 MHz wide band located within a 6 GHz range of radio frequency (RF) bandwidth.

Speech & natural language publications · September 1, 2018 · Conference Proceeding

Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings

Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson, Martin Graciarena

This article focuses on speaker recognition using speech acquired with a single distant or far-field microphone in an indoor environment. This study differs from the majority of speaker recognition research, which focuses either on speech acquired over short distances, such as with a telephone handset or mobile device, or on far-field microphone arrays, for which beamforming can enhance distant speech signals. We use two large-scale corpora collected by retransmitting speech data in reverberant environments with multiple microphones placed at different distances. We first characterize three speaker recognition systems, ranging from a traditional universal background model (UBM) i-vector system to a state-of-the-art deep neural network (DNN) speaker embedding system with a probabilistic linear discriminant analysis (PLDA) back-end. We then assess the impact of microphone distance and placement, background noise, and loudspeaker orientation on the performance of speaker recognition systems for distant speech data. We observe that the recently introduced DNN speaker embedding systems are far more robust than i-vector systems, providing significant relative improvements of up to 54% over the baseline UBM i-vector system and 45.5% over prior DNN-based speaker recognition technology.
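
The embedding-plus-back-end pipeline described in this abstract can be sketched compactly. The following is a minimal illustration, not SRI's actual system: a TDNN-style encoder with statistics pooling maps variable-length feature sequences to fixed-length speaker embeddings, and cosine scoring stands in for the PLDA back-end. All layer sizes and names are illustrative assumptions.

```python
# Minimal sketch of a DNN speaker-embedding verification pipeline.
# Hypothetical sizes; cosine scoring stands in for the PLDA back-end.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TDNNEmbedder(nn.Module):
    """TDNN-style encoder: frame-level convolutions, statistics pooling,
    then a linear layer whose output is the speaker embedding."""
    def __init__(self, feat_dim=40, embed_dim=256):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
        )
        # statistics pooling doubles the channel dimension (mean + std)
        self.embedding = nn.Linear(2 * 512, embed_dim)

    def forward(self, feats):                        # (batch, frames, feat_dim)
        x = self.frame_layers(feats.transpose(1, 2)) # -> (batch, 512, frames')
        stats = torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)
        return self.embedding(stats)                 # fixed-length embedding

def verify(embedder, enroll_feats, test_feats, threshold=0.5):
    """Score a trial by cosine similarity between the two embeddings."""
    with torch.no_grad():
        e = F.normalize(embedder(enroll_feats), dim=1)
        t = F.normalize(embedder(test_feats), dim=1)
    score = (e * t).sum(dim=1)                       # cosine similarity
    return score, score > threshold
```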

Speech & natural language publications · March 1, 2017 · Conference Proceeding

Speech recognition in unseen and noisy channel conditions

Horacio Franco, Martin Graciarena, Dimitra Vergyri

Speech recognition in varying background conditions is a challenging problem. Acoustic condition mismatch between training and evaluation data can significantly reduce recognition performance. For mismatched conditions, data-adaptation techniques are typically useful, as they expose the acoustic model to the new data condition(s). Supervised adaptation techniques usually provide substantial performance improvement, but such gain is contingent on having labeled or transcribed data, which is often unavailable. The alternative is unsupervised adaptation, where feature-transform methods and model-adaptation techniques are typically explored. This work investigates robust features, the feature-space maximum likelihood linear regression (fMLLR) transform, and deep convolutional nets to address the problem of unseen channel and noise conditions. In addition, the work investigates bottleneck (BN) features extracted from deep autoencoder (DAE) networks trained on acoustic features extracted from the speech signal. We demonstrate that such representations not only produce robust systems but can also be used to perform data selection for unsupervised model adaptation. Our results indicate that the techniques presented in this paper significantly improve the performance of speech recognition systems in unseen channel and noise conditions.
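
The bottleneck-feature idea in this abstract lends itself to a short sketch. Below is a minimal, hypothetical illustration (layer sizes and training details are assumptions, not the paper's configuration): a deep autoencoder is trained to reconstruct acoustic features, and the activations of its narrow middle layer are then used as BN features.

```python
# Minimal sketch of bottleneck (BN) features from a deep autoencoder (DAE).
# Sizes are illustrative; the paper's architecture may differ.
import torch
import torch.nn as nn

class BottleneckDAE(nn.Module):
    def __init__(self, feat_dim=40, bn_dim=42):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.Tanh(),
            nn.Linear(1024, 1024), nn.Tanh(),
            nn.Linear(1024, bn_dim),           # narrow bottleneck layer
        )
        self.decoder = nn.Sequential(
            nn.Tanh(),
            nn.Linear(bn_dim, 1024), nn.Tanh(),
            nn.Linear(1024, feat_dim),         # reconstruct the input features
        )

    def forward(self, feats):
        bn = self.encoder(feats)
        return self.decoder(bn), bn

# Training minimizes reconstruction error; afterwards only the encoder is
# kept, and its bottleneck activations replace (or augment) the acoustic
# features fed to the recognizer.
model = BottleneckDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats = torch.randn(32, 40)                    # stand-in batch of features
opt.zero_grad()
recon, bn_feats = model(feats)
loss = nn.functional.mse_loss(recon, feats)
loss.backward()
opt.step()
```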

Speech & natural language publications · September 1, 2016 · Conference Paper

Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems

Martin Graciarena

This paper focuses on the problem of selecting the best possible subset of available audio data given a budgeted time for annotation.
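
The selection problem this teaser names has the flavor of budget-constrained subset selection. As a purely generic illustration (a greedy heuristic, not the method from the paper), one might rank utterances by estimated usefulness per second and fill the time budget:

```python
# Generic sketch of budgeted data selection for annotation. This greedy
# usefulness-per-second heuristic is illustrative only, not the paper's
# selection method; `score` is a caller-supplied stub.
def select_for_annotation(utterances, budget_seconds, score):
    """Pick utterances in order of score per second of audio until the
    annotation time budget is spent."""
    ranked = sorted(utterances, key=lambda u: score(u) / u["duration"],
                    reverse=True)
    selected, spent = [], 0.0
    for utt in ranked:
        if spent + utt["duration"] <= budget_seconds:
            selected.append(utt)
            spent += utt["duration"]
    return selected

# Example usage with a uniform stub score (hypothetical data):
utts = [{"id": "u1", "duration": 30.0}, {"id": "u2", "duration": 12.5}]
chosen = select_for_annotation(utts, budget_seconds=20.0, score=lambda u: 1.0)
```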

Speech & natural language publications · September 1, 2016 · Conference Paper

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation

Martin Graciarena

In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We report results on three development databases that we created from the provided data.

Speech & natural language publications · June 1, 2016 · Conference Paper

A Phonetically Aware System for Speech Activity Detection

Martin Graciarena

In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program.

Speech & natural language publications · December 1, 2015 · Conference Paper

Improving robustness against reverberation for automatic speech recognition

Mitchell McLaren, Martin Graciarena, Horacio Franco, Dimitra Vergyri

In this work, we explore the role of robust acoustic features, motivated by human speech perception studies, in building ASR systems that are robust to reverberation effects.

Speech & natural language publications · September 1, 2015 · Conference Paper

Mitigating the effects of non-stationary unseen noises on language recognition performance

Aaron Lawson, Martin Graciarena, Mitchell McLaren

We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance. 

