Calibration and Multiple System Fusion for Spoken Term Detection Using Linear Logistic Regression


van Hout, J., Ferrer, L., Vergyri, D., Scheffer, N., Lei, Y., Mitra, V., & Wegmann, S. (2014, May). Calibration and multiple system fusion for spoken term detection using linear logistic regression. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7138-7142). IEEE.


State-of-the-art calibration and fusion approaches for spoken term detection (STD) systems currently rely on a multi-pass approach in which the scores are calibrated, then fused, and finally re-calibrated to obtain a single decision threshold across keywords. While these techniques are theoretically sound, they rely on meta-parameter tuning and are prone to over-fitting. This study presents an efficient and effective score calibration technique for keyword detection based on the logistic regression calibration approach commonly used in forensic speaker identification. The technique applies seamlessly to both single systems and system fusion, and it enables optimization for specific keyword detection evaluation functions. We run experiments on a Vietnamese STD task, compare the technique with more empirical calibration and fusion schemes, and demonstrate comparable or better performance in terms of the NIST Actual Term-Weighted Value (ATWV) metric with a more elegant solution.
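The core idea of linear logistic regression calibration and fusion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate detection carries one raw score per system, and a single affine map (weights `w` plus offset `b`) turns them into one fused log-likelihood ratio (LLR). The map is fit by gradient descent on prior-weighted cross-entropy, the objective commonly used for calibration in speaker recognition. The function names, the `prior`, `lr`, and `iters` parameters, and the plain gradient-descent optimizer are all hypothetical choices for this sketch; the paper's ATWV-specific weighting is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    prior: float = 0.5,
    lr: float = 0.1,
    iters: int = 2000,
) -> Tuple[List[float], float]:
    """Fit llr = sum_k w[k] * s[k] + b by gradient descent on prior-weighted
    cross-entropy. Each row of `scores` holds one detection's raw scores from
    all systems; labels are 1 (true hit) or 0 (false alarm)."""
    n_sys = len(scores[0])
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    offset = math.log(prior / (1.0 - prior))  # prior log-odds
    w = [0.0] * n_sys
    b = 0.0
    for _ in range(iters):
        grad_w = [0.0] * n_sys
        grad_b = 0.0
        for x, y in zip(scores, labels):
            llr = sum(wk * xk for wk, xk in zip(w, x)) + b
            post = sigmoid(llr + offset)  # P(true hit | scores) under `prior`
            # True hits share total weight `prior`, false alarms share 1 - prior.
            weight = prior / n_tar if y == 1 else (1.0 - prior) / n_non
            err = weight * (post - y)  # d(loss)/d(llr) for this detection
            for k in range(n_sys):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fuse(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Map one detection's per-system scores to a single fused LLR."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def bayes_threshold(prior: float) -> float:
    """LLR threshold that minimizes expected error at the given target prior,
    assuming equal miss and false-alarm costs."""
    return math.log((1.0 - prior) / prior)
```

With a single score column, `train_fusion` reduces to an affine calibration of one system, which is why the same procedure can cover both single-system calibration and multi-system fusion. Because the fused output is an LLR, one global threshold (here from `bayes_threshold`) can be applied across keywords; how the prior and costs should be set to target ATWV specifically is an assumption left open in this sketch.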
