S. S. Kajarekar, L. Ferrer, A. Stolcke and E. Shriberg, “Voice-based speaker recognition combining acoustic and stylistic features,” in Advances in Biometrics: Sensors, Algorithms and Systems, Part 2. London, England: Springer London, 2008, pp. 183–201.
We present a survey of the state of the art in voice-based speaker identification research. We describe the general framework of a text-independent speaker verification system and, as an example, SRI’s voice-based speaker recognition system. This system ranked among the best-performing systems in the NIST text-independent speaker recognition evaluations of 2004 and 2005. It consists of six subsystems and a neural network combiner. The subsystems fall into two groups: acoustic (low-level) and stylistic (high-level). Acoustic subsystems extract short-term spectral features that implicitly capture the anatomy of the vocal apparatus, such as the shape of the vocal tract and its variations. These features are known to be sensitive to microphone and channel variations, and various techniques are used to compensate for these variations.
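To make the idea of combining six subsystems with a neural network concrete, the following is a minimal sketch of score-level fusion. It is not SRI's combiner: the single hidden layer, its size, the training settings, and the function names (`make_toy_trials`, `train_combiner`, `combine`) are all assumptions made for illustration, and the scores are synthetic toy data rather than evaluation results.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_toy_trials(n_trials=400, n_subsystems=6, seed=1):
    """Synthetic scores (assumption): target trials score higher on average."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, n_trials).astype(float)  # 1 = target, 0 = impostor
    scores = rng.normal(0.0, 1.0, (n_trials, n_subsystems)) + labels[:, None]
    return scores, labels


def train_combiner(scores, labels, hidden_units=4, lr=0.1, epochs=1000, seed=0):
    """Fit a one-hidden-layer network by full-batch gradient descent
    on the cross-entropy loss. Architecture and settings are hypothetical."""
    rng = np.random.default_rng(seed)
    n_trials, n_subsystems = scores.shape
    params = {
        "W1": rng.normal(0.0, 0.5, (n_subsystems, hidden_units)),
        "b1": np.zeros(hidden_units),
        "w2": rng.normal(0.0, 0.5, hidden_units),
        "b2": 0.0,
    }
    for _ in range(epochs):
        hidden = np.tanh(scores @ params["W1"] + params["b1"])
        prob = sigmoid(hidden @ params["w2"] + params["b2"])
        # Gradient of the mean cross-entropy with respect to the output logit.
        g_out = (prob - labels) / n_trials
        # Backpropagate through the output weights and the tanh nonlinearity.
        g_hidden = np.outer(g_out, params["w2"]) * (1.0 - hidden ** 2)
        params["w2"] -= lr * (hidden.T @ g_out)
        params["b2"] -= lr * g_out.sum()
        params["W1"] -= lr * (scores.T @ g_hidden)
        params["b1"] -= lr * g_hidden.sum(axis=0)
    return params


def combine(scores, params):
    """Map each trial's subsystem scores to one fused score in (0, 1)."""
    hidden = np.tanh(scores @ params["W1"] + params["b1"])
    return sigmoid(hidden @ params["w2"] + params["b2"])


if __name__ == "__main__":
    scores, labels = make_toy_trials()
    params = train_combiner(scores, labels)
    fused = combine(scores, params)
    accuracy = np.mean((fused > 0.5) == (labels == 1))
    print(f"training-set accuracy of the fused score: {accuracy:.2f}")
```

In this sketch each trial is a row of six scores, one per subsystem, and the combiner outputs a single fused score that can be thresholded to accept or reject the claimed speaker. A real system would train the combiner on held-out development data and evaluate it on separate trials.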