H. Franco, L. Neumeyer, Yoon Kim and O. Ronen, “Automatic pronunciation scoring for language instruction,” 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997, pp. 1471-1474 vol.2, doi: 10.1109/ICASSP.1997.596227.
This work is part of an effort aimed at developing computer-based systems for language instruction; we address the task of grading the pronunciation quality of the speech of a student of a foreign language. The automatic grading system uses SRI’s Decipher™ continuous speech recognition system to generate phonetic segmentations. Based on these segmentations and probabilistic models, we produce pronunciation scores for individual sentences or groups of sentences. Scores obtained from expert human listeners are used as the reference to evaluate the different machine scores and to provide targets when training some of the algorithms. In previous work we found that duration-based scores outperformed HMM log-likelihood-based scores. In this paper we show that we can significantly improve HMM-based scores by using average phone segment posterior probabilities. Correlation between machine and human scores went up from r=0.50 with likelihood-based scores to r=0.88 with posterior-based scores. The new measures also outperformed duration-based scores in their ability to produce reliable scores from only a few sentences.
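As an illustration of the kind of computation the abstract describes, the following is a minimal sketch of a posterior-based score and of the correlation used to compare machine and human scores. It is not the authors' implementation. It assumes that each frame's phone posterior is obtained by Bayes' rule from per-phone log-likelihoods and priors, that the log-posterior is averaged over the frames of each phone segment, and that the segment scores are then averaged. The function names, the input layout, and the toy numbers are all hypothetical.

```python
import math


def frame_log_posterior(log_likes, log_priors, phone):
    """Log posterior of `phone` for one frame, by Bayes' rule over all phones.

    log_likes[i]  : log p(frame | phone i)   (assumed to come from the HMMs)
    log_priors[i] : log P(phone i)           (assumed known)
    """
    joint = [ll + lp for ll, lp in zip(log_likes, log_priors)]
    # Log-sum-exp for the normalizer, shifted by the max for numerical stability.
    peak = max(joint)
    log_evidence = peak + math.log(sum(math.exp(j - peak) for j in joint))
    return joint[phone] - log_evidence


def posterior_score(frames, segments, log_priors):
    """Mean over segments of the mean frame log-posterior within each segment.

    frames   : list of per-frame lists of log-likelihoods, one entry per phone
    segments : list of (phone_index, start_frame, end_frame), end exclusive,
               standing in for the recognizer's phonetic segmentation
    """
    segment_scores = []
    for phone, start, end in segments:
        values = [
            frame_log_posterior(frames[t], log_priors, phone)
            for t in range(start, end)
        ]
        segment_scores.append(sum(values) / len(values))
    return sum(segment_scores) / len(segment_scores)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data only: two phone classes, four frames, two segments.
    toy_priors = [math.log(0.5), math.log(0.5)]
    toy_frames = [[-1.0, -3.0], [-1.2, -2.5], [-4.0, -0.8], [-3.5, -1.0]]
    toy_segments = [(0, 0, 2), (1, 2, 4)]
    print(posterior_score(toy_frames, toy_segments, toy_priors))
```

Because the posterior normalizes each frame's likelihood by the evidence summed over all phones, the score in this sketch depends on how well the intended phone competes with the alternatives rather than on the raw likelihood magnitude. In an evaluation along the lines described above, `pearson_r` would be applied to one list of machine scores and one list of human grades for the same sentences or speakers.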