Author: Martin Graciarena

October 1, 2021

Resilient Data Augmentation Approaches to Multimodal Verification in the News Domain

Building on multimodal embedding techniques, we show that data augmentation via two distinct approaches improves results: entity linking and cross-domain local similarity scaling.
July 22, 2020

Wideband Spectral Monitoring Using Deep Learning

We present a system to perform spectral monitoring of a wide band of 666.5 MHz, located within a range of 6 GHz of Radio Frequency (RF) bandwidth, using state-of-the-art deep learning approaches.
September 1, 2018

Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings

This article focuses on speaker recognition using speech acquired with a single distant or far-field microphone in an indoor environment.
March 1, 2017

Speech recognition in unseen and noisy channel conditions

This work investigates robust features, feature-space maximum likelihood linear regression (fMLLR) transform, and deep convolutional nets to address the problem of unseen channel and noise conditions in speech recognition.
September 1, 2016

Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems

This paper focuses on the problem of selecting the best-possible subset of available audio data given a budgeted time for annotation.
September 1, 2016

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation

In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created from the provided data.
June 1, 2016

A Phonetically Aware System for Speech Activity Detection

In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program.
December 1, 2015

Improving robustness against reverberation for automatic speech recognition

In this work, we explore the role of robust acoustic features motivated by human speech perception studies in building ASR systems robust to reverberation effects.
September 1, 2015

Mitigating the effects of non-stationary unseen noises on language recognition performance

We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance.
April 1, 2015

softSAD: Integrated frame-based speech confidence for speaker recognition

In this paper we propose softSAD: the direct integration of speech posteriors into a speaker recognition system instead of using speech activity detection (SAD).
November 1, 2014

The SRI AVEC-2014 Evaluation System

We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale.
September 1, 2014

Evaluating Robust Features on Deep Neural Networks for Speech Recognition in Noisy and Channel Mismatched Conditions

In this work, we present a study exploring both conventional DNNs and deep convolutional neural networks (CNNs) for noise- and channel-degraded speech recognition tasks using the Aurora4 dataset.