S. S. Kajarekar, “Phone-based cepstral polynomial SVM system for speaker recognition,” in Proc. 9th Annual Conference of the International Speech Communication Association 2008 (INTERSPEECH 2008), pp. 845–848.
We have been using a phone-based cepstral system with polynomial features in NIST evaluations for the past two years. This system uses three broad phone classes, three states per class, and third-order polynomial features obtained from MFCC features. In this paper, we present a complete analysis of the system. We start from a simpler system that uses neither phones nor states and show that adding phones gives a significant improvement. We show that adding state information does not provide an improvement on its own but provides a significant improvement when used with phone classes. We complete the system by applying nuisance attribute projection (NAP) and score normalization. We show that splitting features after a joint NAP over all phone classes results in a significant improvement. Overall, we obtain a performance improvement of about 25% with polynomial features based on phones and states, and obtain a system with performance comparable to that of a state-of-the-art SVM system.
Index Terms: Speaker recognition, feature extraction, pattern recognition.
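The feature pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the paper, and it rests on several assumptions: the function names (`poly_expand`, `unit_mean_vectors`, `nap_project`) are hypothetical, each frame is assumed to carry a single label for its phone-class/state unit (3 classes x 3 states = 9 units), the expanded frames are assumed to be averaged within each unit, and the nuisance basis `U` is assumed to be given with orthonormal columns. How the paper estimates `U` is not stated in the abstract.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_expand(frames, order=3):
    """Expand each frame (row) into all monomials of degree 0..order."""
    n_frames, dim = frames.shape
    cols = [np.ones(n_frames)]  # degree-0 term
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(dim), degree):
            # Product of the selected coefficients, e.g. (0, 0, 1) -> x0 * x0 * x1
            cols.append(np.prod(frames[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)


def unit_mean_vectors(frames, labels, n_units=9, order=3):
    """Average expanded frames per unit and concatenate the unit means.

    Assumption: one integer label per frame in 0..n_units-1, standing for a
    (phone class, state) pair. Units with no frames contribute zeros.
    """
    expanded = poly_expand(frames, order)
    means = []
    for unit in range(n_units):
        mask = labels == unit
        if mask.any():
            means.append(expanded[mask].mean(axis=0))
        else:
            means.append(np.zeros(expanded.shape[1]))
    return np.concatenate(means)


def nap_project(vector, U):
    """Apply the projection (I - U U^T) to remove the nuisance subspace.

    Assumption: the columns of U are orthonormal.
    """
    return vector - U @ (U.T @ vector)
```

Under these assumptions, a joint NAP followed by splitting amounts to calling `nap_project` on the concatenated vector and then cutting the result into one block per unit, for example with `np.split(projected, 9)`, so that each block can be handled separately downstream. Real MFCC vectors have many more coefficients than a toy example would use, and the number of monomials grows quickly with dimension and order.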