N. Scheffer and R. Vogt, “On the use of speaker superfactors for speaker recognition,” in Proc. 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4410–4413.
We propose a new method to characterize a speaker within the Joint Factor Analysis (JFA) framework. Scoring within the JFA framework can be costly, and a method was previously proposed to produce an accurate score quickly. However, that method is nonsymmetric and performs poorly without score normalization. We propose a new JFA scoring method that is both symmetric and efficient. Just as the means of Gaussians can be concatenated to form a supervector, we use several estimates of speaker factors from the eigenvoice space to build a supervector of factors, which we call superfactors. We motivate the use of such factors in the current JFA model through comparison with a Tied Factor Analysis model. We show that this method substantially improves the performance of a system that uses only the standard speaker factors to produce scores, and usually outperforms the baseline system. We also show that this method is relatively effective even when score normalization is not an option.
Index Terms: speaker recognition