Cohen, M., Franco, H., Morgan, N., Rumelhart, D., Abrash, V., & Konig, Y. Integrating Neural Networks Into Computer Speech Recognition Systems.
Most current state-of-the-art continuous-speech recognition systems are based on hidden Markov modeling techniques. The work described here involved integrating neural networks into a state-of-the-art continuous-speech recognition system based on hidden Markov models, resulting in improvements in recognition accuracy and reductions in model complexity.

Hidden Markov models (HMMs) may be thought of as doubly stochastic finite state machines, consisting of a set of states, transition probabilities between states, and a probability distribution over output symbols associated with each state. When used to model speech, these output symbols represent acoustic observations, modeling subphonetic acoustic events (e.g., closures, bursts, transitions). Current HMM-based speech recognition systems typically model phonetic units, or “phones” (e.g., the sound “m” in the word “map”), with a sequence of such states. Sequences of phone models can be concatenated to form word models. Word models can, in turn, be connected according to grammatical constraints, forming large networks that model any allowable sentence within an application. This approach allows a hierarchy of levels of linguistic description to be encoded within a uniform mathematical framework.
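The HMM structure described above can be sketched in a few lines of code. The example below is a toy model of a single phone with three left-to-right states standing in for subphonetic events; the transition and emission probabilities, the discrete symbol alphabet, and the `forward_likelihood` helper are all illustrative assumptions, not values from this work (real systems typically use continuous emission densities over acoustic feature vectors).

```python
# Toy phone HMM: three left-to-right states modeling subphonetic
# events (onset, steady state, offset). All numbers are illustrative.

# Transition probabilities A[i][j]: self-loops plus forward moves.
A = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

# Emission probabilities B[i][k] over a small discrete alphabet of
# acoustic symbols (a real system would use continuous densities).
symbols = ["closure", "burst", "transition"]
B = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]

pi = [1.0, 0.0, 0.0]  # the model always starts in its first state


def forward_likelihood(obs):
    """P(observation sequence | model) via the forward algorithm."""
    # Initialize with the start distribution times first emission.
    alpha = [pi[i] * B[i][obs[0]] for i in range(3)]
    for o in obs[1:]:
        # Propagate probability mass along transitions, then emit.
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(3)) * B[j][o]
            for j in range(3)
        ]
    return sum(alpha)


seq = [symbols.index(s) for s in ["closure", "burst", "transition"]]
print(forward_likelihood(seq))  # → 0.07112
```

Concatenating several such phone models, and linking the resulting word models according to a grammar, yields the large sentence-level networks described in the text.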