Franco, H., Ferrer, L., & Bratt, H. (2014, May). Adaptive and discriminative modeling for improved mispronunciation detection. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7709-7713). IEEE.
In the context of computer-aided language learning, automatic detection of specific phone mispronunciations by nonnative speakers can be used to provide detailed feedback about specific pronunciation problems. In previous work we found that significant improvements could be achieved, compared to standard approaches that compute posteriors with respect to native models, by explicitly modeling both mispronunciations and correct pronunciations by nonnative speakers. In this work, we extend our approach with the use of model adaptation and discriminative modeling techniques, inspired on methods that have been effective in the area of speaker identification. Two systems were developed, one based on Bayesian adaptation of Gaussian Mixture Models (GMMs), and likelihood-ratio-based detection, and another one based on Support Vector Machines classification of supervectors derived from adapted GMMs. Both systems, and their combination, were evaluated in a phonetically transcribed Spanish database of 130,000 phones uttered in continuous speech sentences by 206 nonnative speakers, showing significant improvements from our previous best system.
Index Terms— Mispronunciation detection, computer-aided language learning.