Building on multimodal embedding techniques, we show that data augmentation via two distinct approaches improves results: entity linking and cross-domain local similarity scaling.
We present a system to perform spectral monitoring of a wide band of 666.5 MHz, located within a range of 6 GHz of Radio Frequency (RF) bandwidth, using state-of-the-art deep learning approaches.
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
This article focuses on speaker recognition using speech acquired using a single distant or far-field microphone in an indoors environment.
This work investigates robust features, feature-space maximum likelihood linear regression (fMLLR) transform, and deep convolutional nets to address the problem of unseen channel and noise conditions in speech recognition.
This paper focuses on the problem of selecting the best-possible subset of available audio data given a budgeted time for annotation.
In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created from the provided data.
In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program.
In this work, we explore the role of robust acoustic features motivated by human speech perception studies, for building ASR systems robust to reverberation effects.
We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance.