Speech & natural language publications September 1, 1997

A Lognormal Tied Mixture Model of Pitch for Prosody-Based Speaker Recognition

Citation


Mitchel, M. K. S. L. H., & Shriberg, W. E. A LOGNORMAL TIED MIXTURE MODEL OF PITCH FOR PROSODY-BASED SPEAKER RECOGNITION. parameters, 4, 6.

Abstract

Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks that are subject to doubling and/or halving errors. First, we argue from a simple correlation model, and demonstrate empirically with QQ plots, that clean pitch follows a lognormal distribution rather than the often-assumed normal distribution. Second, we present a probabilistic model of the pitch estimated by a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances, for a total of four free parameters. We use the resulting pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (a subset of Switchboard) and report results on the most difficult portion of the database: the “one-session” condition with males only for both the claimant and impostor speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.
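
The tied mixture described in the abstract can be illustrated with a small sketch. It rests on assumptions that the abstract does not confirm. In the log domain, the three lognormal components become Gaussians with means mu - log 2 (halving), mu (correct pitch), and mu + log 2 (doubling), sharing one standard deviation sigma. Under that reading the four free parameters are mu, sigma, and two independent mixture weights. The fitting procedure here is a plain EM loop, chosen for illustration and not necessarily the estimation method used in the paper. The function name `fit_tied_lognormal_mixture` and its arguments are hypothetical.

```python
import numpy as np

LOG2 = np.log(2.0)


def fit_tied_lognormal_mixture(pitch_hz, n_iter=100):
    """Fit a 3-component tied mixture to log-pitch with EM (illustrative sketch).

    Assumed model: log-pitch is Gaussian with mean mu + offset and shared
    variance, where offset is -log 2 (halving), 0 (correct) or +log 2 (doubling).
    Returns (weights, mu, sigma) with weights ordered halving, correct, doubling.
    """
    x = np.log(np.asarray(pitch_hz, dtype=float))
    offsets = np.array([-LOG2, 0.0, LOG2])
    weights = np.array([0.1, 0.8, 0.1])  # assumed starting point
    mu = np.median(x)                    # robust initial guess for the mean
    var = np.var(x)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        # The Gaussian normaliser is shared (tied variance), so it cancels.
        diff = x[:, None] - (mu + offsets[None, :])
        log_p = np.log(weights)[None, :] - 0.5 * diff ** 2 / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weights, then the single tied mean and tied variance.
        weights = np.maximum(resp.mean(axis=0), 1e-12)
        weights /= weights.sum()
        mu = np.sum(resp * (x[:, None] - offsets[None, :])) / x.size
        diff = x[:, None] - (mu + offsets[None, :])
        var = np.sum(resp * diff ** 2) / x.size

    return weights, mu, np.sqrt(var)
```

Given an array of voiced-frame pitch values in Hz, `weights, mu, sigma = fit_tied_lognormal_mixture(pitch_hz)` returns estimates of the log-pitch mean and standard deviation that are not pulled toward the doubled or halved frames, along with the estimated fraction of frames in each error class.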
