Context-Dependent Connectionist Probability Estimation in a Hybrid HMM-Neural Net Speech Recognition System



Franco, H., Cohen, M., Morgan, N., Rumelhart, D., & Abrash, V. (1994). Context-dependent connectionist probability estimation in a hybrid hidden Markov model-neural net speech recognition system. Computer Speech & Language, 8(3), 211-222.


In this paper we present a training method and a network architecture for estimating context-dependent observation probabilities in a hybrid Hidden Markov Model (HMM) / Multilayer Perceptron (MLP) speaker-independent continuous speech recognition system. Our context-dependent modeling approach computes the HMM context-dependent observation probabilities using a Bayesian factorization in terms of scaled posterior phone probabilities, which are computed with a set of MLPs, one for every relevant context. The proposed network architecture shares the input-to-hidden layer among the set of context-dependent MLPs in order to reduce the number of independent parameters. Phone models have multiple states, each with its own context dependence, to capture the different context effects at the beginning and end of phonetic segments. A new training procedure that "smooths" networks with different degrees of context dependence is proposed in order to obtain a robust estimate of the context-dependent probabilities. We have used this new architecture to model generalized biphone phonetic contexts. Tests with the speaker-independent DARPA Resource Management database have shown average reductions in word error rates of 20% in the word-pair grammar case, and 11% in the no-grammar case, compared to our earlier context-independent HMM/MLP hybrid.
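
The abstract does not spell out the Bayesian factorization, so the following is only an illustrative sketch of one identity consistent with its description: p(y | q, c) / p(y) = P(q | y, c) P(c | y) / (P(q | c) P(c)), where y is an observation, q a phone class, and c a context class. The assumption here is that the posterior terms P(q | y, c) and P(c | y) would be the quantities estimated by MLPs, with the priors taken from relative frequencies. The toy joint distribution and all function names are hypothetical; the code only checks numerically that the factored form equals the directly computed scaled likelihood.

```python
from itertools import product

# Hypothetical toy joint distribution P(y, q, c) over
# 2 observations, 2 phone classes, and 2 context classes.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {key: w / total for key, w in zip(product(range(2), repeat=3), weights)}


def marginal(y=None, q=None, c=None):
    """Sum the joint over every variable left as None."""
    return sum(
        p
        for (yy, qq, cc), p in joint.items()
        if (y is None or yy == y)
        and (q is None or qq == q)
        and (c is None or cc == c)
    )


def scaled_likelihood_direct(y, q, c):
    """p(y | q, c) / p(y), computed straight from the joint."""
    return (marginal(y, q, c) / marginal(q=q, c=c)) / marginal(y=y)


def scaled_likelihood_factored(y, q, c):
    """Same quantity, built from posteriors divided by priors."""
    post_phone = marginal(y, q, c) / marginal(y=y, c=c)  # P(q | y, c)
    post_ctx = marginal(y=y, c=c) / marginal(y=y)        # P(c | y)
    prior_phone = marginal(q=q, c=c) / marginal(c=c)     # P(q | c)
    prior_ctx = marginal(c=c)                            # P(c)
    return (post_phone * post_ctx) / (prior_phone * prior_ctx)


# Largest disagreement between the two forms over all (y, q, c) triples.
max_gap = max(
    abs(scaled_likelihood_direct(y, q, c) - scaled_likelihood_factored(y, q, c))
    for y, q, c in product(range(2), repeat=3)
)
```

Because the division by p(y) is common to all states for a given frame, a scaled likelihood of this kind can stand in for the observation probability during HMM decoding without changing the best path.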
