L. Ferrer, E. Shriberg, S. Kajarekar and K. Sonmez, “Parameterization of Prosodic Feature Distributions for SVM Modeling in Speaker Recognition,” 2007 IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP ’07, 2007, pp. IV-233-IV-236, doi: 10.1109/ICASSP.2007.366892.
Abstract
Multiple recent studies have shown that speaker recognition performance using frame-based cepstral features is improved by adding higher-level information, including prosodic and lexical features. This paper explores the important question of finding a good kernel for a system that models syllable-based prosodic features using support vector machines (SVMs). The system has been the best performing of our high-level systems in the last two NIST evaluations, and gives significant improvements when combined with cepstral-based systems. We introduce two new methods for transforming the syllable-level features into a single high-dimensional vector that can be well modeled by SVMs, resulting in significant gains in speaker recognition performance.