Mitchell McLaren

Senior Computer Scientist, Speech Technology and Research Laboratory

Publications

Speech & natural language publications April 1, 2015

Advances in deep neural network approaches to speaker recognition

Mitchell McLaren

In this work, we report the same achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature modeling.

Speech & natural language publications April 1, 2015

Improved speaker recognition using DCT coefficients as features

Mitchell McLaren

We recently proposed the use of coefficients extracted from the 2D discrete cosine transform (DCT) of log Mel filter bank energies to improve speaker recognition over the traditional Mel frequency cepstral coefficients (MFCC) with appended deltas and double deltas (MFCC/deltas). 
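The feature extraction described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the window length, filterbank size, and number of retained coefficients are assumptions, not the paper's actual configuration.

```python
import numpy as np
from scipy.fftpack import dct

def dct2_features(log_mel, context=10, n_coeffs=60):
    """Slide a time window over log Mel filterbank energies and take a 2D DCT.

    log_mel:  (n_frames, n_filters) array of log filterbank energies.
    Returns one feature vector per window position, built from the
    lowest-order 2D-DCT coefficients (an illustrative selection).
    """
    n_frames, n_filters = log_mel.shape
    feats = []
    for start in range(n_frames - context + 1):
        patch = log_mel[start:start + context]          # (context, n_filters)
        # 2D DCT = DCT along the time axis, then along the frequency axis
        coeffs = dct(dct(patch, axis=0, norm='ortho'), axis=1, norm='ortho')
        feats.append(coeffs.flatten()[:n_coeffs])       # keep low-order terms
    return np.array(feats)
```

Unlike MFCC/deltas, which capture temporal context by appending difference features, the 2D DCT models time and frequency structure jointly within each window.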

Speech & natural language publications September 1, 2014 Conference Paper

Application of Convolutional Neural Networks to Speaker Recognition in Noisy Conditions

Mitchell McLaren

This paper applies a convolutional neural network (CNN) trained for automatic speech recognition (ASR) to the task of speaker identification (SID).

Speech & natural language publications September 1, 2014 Conference Paper

Spoken Language Recognition Based on Senone Posteriors

Mitchell McLaren

This paper explores in depth a recently proposed approach to spoken language recognition based on the estimated posteriors for a set of senones representing the phonetic space of one or more languages.  A neural network (NN) is trained to estimate the posterior probabilities for the senones at a frame level.  A feature vector is then derived for every sample using these posteriors.  The effect of the language used in training the NN and the number of senones are studied. Speech-activity detection (SAD) and dimensionality reduction approaches are also explored and Gaussian and NN backends are compared.  Results are presented on heavily degraded speech data.  The proposed system is shown to give over 40% relative gain compared to a state-of-the-art language recognition system at sample durations from 3 to 120 seconds.
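The per-sample feature derivation above can be illustrated by pooling frame-level senone posteriors into a single vector. Averaging log posteriors over frames is one plausible pooling choice and is an assumption here, not necessarily the paper's exact recipe.

```python
import numpy as np

def sample_feature_from_posteriors(frame_posteriors, eps=1e-10):
    """Collapse frame-level senone posteriors into one vector per sample.

    frame_posteriors: (n_frames, n_senones) array; each row is the NN's
    posterior distribution over senones for one frame and sums to 1.
    Pooling choice (assumed, for illustration): mean of log posteriors.
    """
    return np.log(frame_posteriors + eps).mean(axis=0)
```

The resulting fixed-length vector can then be scored with a Gaussian or NN backend, as the abstract describes, regardless of the sample's duration.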

Speech & natural language publications September 1, 2014 Conference Paper

A Deep Neural Network Speaker Verification System Targeting Microphone Speech

Mitchell McLaren

We recently proposed the use of deep neural networks (DNNs) in place of Gaussian mixture models (GMMs) in the i-vector extraction process for speaker recognition.

Speech & natural language publications June 1, 2014 Conference Paper

Trial-Based Calibration for Speaker Recognition in Unseen Conditions

Aaron Lawson, Mitchell McLaren

This work presents Trial-Based Calibration (TBC), a novel, automated calibration technique robust to both unseen and widely varying conditions.

Speech & natural language publications June 1, 2014 Conference Paper

Application of Convolutional Neural Networks to Language Identification in Noisy Conditions

Aaron Lawson, Mitchell McLaren

This paper proposes two novel frontends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR).

Information & computer science publications May 1, 2014 Conference Paper

Simplified VTS-Based I-Vector Extraction in Noise-Robust Speaker Recognition

Mitchell McLaren

A vector Taylor series (VTS) based i-vector extractor was recently proposed for noise-robust speaker recognition by extracting synthesized clean i-vectors to be used in the standard system back-end. This approach brings significant improvements in accuracy for noisy speech conditions. However, it incurred so large a computational expense that using a state-of-the-art model size or running large-scale evaluations was impractical. In this work, we propose an efficient simplification scheme, named sVTS, to show that the VTS approach gives improvements in large-scale applications compared to state-of-the-art systems. In contrast to VTS, sVTS generates normalized Baum-Welch statistics and uses the standard i-vector model, making it straightforward to employ in a state-of-the-art i-vector speaker recognition system. Results presented on both the PRISM and the large NIST SRE'12 corpora show that sVTS i-vectors provide significant improvements in noisy conditions, and that our proposed simplification results in only a slight degradation with respect to the original VTS approach.

Speech & natural language publications May 1, 2014 Conference Paper

A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network

Mitchell McLaren

We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR). Specifically, the DNN replaces the standard Gaussian mixture model (GMM) to produce frame alignments. The use of an ASR-DNN system in the speaker recognition pipeline is attractive as it integrates the information from speech content directly into the statistics, allowing the standard backends to remain unchanged. The improvement from the proposed framework over a state-of-the-art system is 30% relative at the equal error rate when evaluated on the telephone conditions from the 2012 NIST speaker recognition evaluation (SRE). The proposed framework is a successful way to efficiently leverage transcribed data for speaker recognition, thus opening up a wide spectrum of research directions.
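The DNN's role above can be sketched via the sufficient (Baum-Welch) statistics it drives. In this minimal sketch the per-frame posteriors are treated as soft alignments, which is exactly the quantity a GMM would otherwise supply, so swapping the source of the posteriors leaves the rest of the i-vector pipeline untouched. Shapes and names are illustrative assumptions.

```python
import numpy as np

def baum_welch_stats(features, posteriors):
    """Zeroth- and first-order statistics for i-vector extraction.

    features:   (n_frames, dim) acoustic feature matrix.
    posteriors: (n_frames, n_classes) per-frame alignment posteriors.
    In a classic system these posteriors come from a GMM; in the
    DNN-driven framework they are the senone posteriors of an
    ASR-trained DNN, and this computation is otherwise unchanged.
    """
    N = posteriors.sum(axis=0)    # (n_classes,)      zeroth-order counts
    F = posteriors.T @ features   # (n_classes, dim)  first-order sums
    return N, F
```

Because only the alignment source changes, standard backends (PLDA scoring, length normalization, and so on) can consume the resulting i-vectors as before.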

