Konig, Y., Heck, L., Weintraub, M., & Sonmez, K. (1998, April). Nonlinear discriminant feature extraction for robust text-independent speaker recognition. In Proc. RLA2C, ESCA workshop on Speaker Recognition and its Commercial and Forensic Applications (pp. 72-75).
We study a nonlinear discriminant analysis (NLDA) technique that extracts a speaker-discriminant feature set. Our approach is to train a multilayer perceptron (MLP) to maximize the separation between speakers by nonlinearly projecting a large set of acoustic features (e.g., several frames) to a lower-dimensional feature set. The extracted features are optimized to discriminate between speakers and to be robust to mismatched training and testing conditions. We train the MLP on a development set and apply it to the training and testing utterances. Our results show that combining the NLDA-based system with a state-of-the-art cepstrum-based system improves speaker verification performance on the 1997 NIST Speaker Recognition Evaluation set by 15 percent on average compared with our cepstrum-only system.
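The projection step described above can be illustrated with a minimal sketch. The abstract does not give the network architecture, so everything specific here is an assumption made for illustration: the 17-coefficient frames, the context of 4 frames on each side, the 64-unit sigmoid hidden layer, the 20-dimensional linear bottleneck, and the helper names (`stack_frames`, `init_layer`, `extract_nlda_features`) are all hypothetical. The weights are random placeholders; in the approach described, they would come from an MLP trained on a development set to separate speakers, and that training is not shown.

```python
import math
import random


def stack_frames(frames, context):
    """Concatenate each frame with `context` neighbors on each side (edges are clamped)."""
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), last)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def init_layer(n_in, n_out, rng):
    """Random placeholder weights and zero biases; a trained MLP would supply real values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def extract_nlda_features(stacked, hidden_layer, bottleneck_layer):
    """Pass each stacked vector through a sigmoid hidden layer, then a linear bottleneck."""
    features = []
    for x in stacked:
        hidden = [1.0 / (1.0 + math.exp(-a)) for a in affine(hidden_layer, x)]
        features.append(affine(bottleneck_layer, hidden))
    return features


rng = random.Random(0)
# Synthetic stand-in for acoustic features: 50 frames of 17 coefficients (assumed sizes).
frames = [[rng.gauss(0.0, 1.0) for _ in range(17)] for _ in range(50)]
stacked = stack_frames(frames, context=4)      # 9 frames * 17 coefficients = 153 inputs
hidden_layer = init_layer(153, 64, rng)        # assumed hidden size
bottleneck_layer = init_layer(64, 20, rng)     # assumed output feature dimension
nlda = extract_nlda_features(stacked, hidden_layer, bottleneck_layer)
print(len(nlda), len(nlda[0]))  # 50 20
```

Each input frame yields one lower-dimensional vector, so the extracted features could be fed to a downstream speaker model in place of, or alongside, the original cepstral features. How the two systems are combined is not specified in the abstract and is not sketched here.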