A Framework For Speech Understanding


Paxton, W. H. (1977). A framework for speech understanding. Stanford University.


This paper reports the author’s results in designing, implementing, and testing a framework for a speech-understanding system. The work was done as part of a multi-disciplinary effort based on state-of-the-art advances in computational linguistics, artificial intelligence, systems programming, and speech science. The overall project goal was to develop one or more computer systems that would recognize continuous speech uttered in the context of some well-specified task by making extensive use of grammatical, semantic, and contextual constraints. We call a system emphasizing such linguistic constraints a ‘speech-understanding system’ to distinguish it from speech-recognition systems, which rely on acoustic information alone.

Two major aspects of a framework for speech understanding are integration, the process of forming a unified system out of the collection of components, and control, the dynamic direction of the overall activity of the system during the processing of an input utterance. Our method of system integration gives a central role to the input-language definition, which is based on augmented phrase-structure rules. A rule consists of a phrase-structure declaration and statements for computing ‘attributes’ and ‘factors.’ Attribute statements determine the properties of particular phrases constructed by the rule; factor statements make acceptability judgments on phrases. Together these statements contain specifications for most of the potential interactions among system components.
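The idea of a rule that carries both attribute and factor statements can be illustrated with a minimal sketch. It is not the notation or implementation used in the original system; the names `Phrase`, `Rule`, `build`, and the toy number-agreement rule are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Phrase:
    """A phrase with its computed attributes and factors (hypothetical)."""
    category: str
    constituents: List["Phrase"] = field(default_factory=list)
    attributes: Dict[str, object] = field(default_factory=dict)
    factors: Dict[str, float] = field(default_factory=dict)


@dataclass
class Rule:
    """An augmented phrase-structure rule (hypothetical representation)."""
    category: str        # left-hand side, e.g. "NP"
    pattern: List[str]   # right-hand side categories, e.g. ["DET", "N"]
    # Attribute statements compute properties of the new phrase.
    attribute_stmts: Dict[str, Callable[[List[Phrase]], object]]
    # Factor statements judge acceptability, given attributes and parts.
    factor_stmts: Dict[str, Callable[[Dict[str, object], List[Phrase]], float]]

    def build(self, parts: List[Phrase]) -> Optional[Phrase]:
        # The phrase-structure declaration: constituents must match the pattern.
        if [p.category for p in parts] != self.pattern:
            return None
        attrs = {name: fn(parts) for name, fn in self.attribute_stmts.items()}
        facts = {name: fn(attrs, parts) for name, fn in self.factor_stmts.items()}
        return Phrase(self.category, list(parts), attrs, facts)


def word(category: str, text: str, number: str) -> Phrase:
    """Make a lexical phrase with a grammatical-number attribute."""
    return Phrase(category, attributes={"text": text, "number": number})


# Toy rule NP -> DET N: the noun supplies the number attribute, and an
# agreement factor penalizes a determiner whose number differs.
np_rule = Rule(
    category="NP",
    pattern=["DET", "N"],
    attribute_stmts={"number": lambda parts: parts[1].attributes["number"]},
    factor_stmts={
        "agreement": lambda attrs, parts: (
            1.0 if parts[0].attributes["number"] == attrs["number"] else 0.1
        )
    },
)

good = np_rule.build([word("DET", "this", "sg"), word("N", "ship", "sg")])
bad = np_rule.build([word("DET", "these", "pl"), word("N", "ship", "sg")])
```

In this sketch both phrases are constructed, but the mismatched one receives a low agreement factor, so a controller can deprioritize it rather than reject it outright.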

Our approach to system control centers on a system ‘Executive’ applying the rules of the language definition, organizing hypotheses and results, and assigning priorities. Phrases with their attributes and factors are the basic entities manipulated by the Executive, which takes on the role of a parser in carrying out its integration and control functions. The Executive controls the overall activity of the system by setting priorities on the basis of acoustic and linguistic acceptability judgments. These data are combined to form scores and ratings. A phrase score reflects a quality judgment independent of the phrase’s context and gives useful local information, while a phrase rating gives information concerning the sentential context. To get early and efficient access to the contextual information, we have developed a technique for calculating phrase ratings by a heuristic search of possible interpretations that would use the phrase. One of our experiments shows that this context-checking method results in significant improvements in system performance.
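The distinction between a context-independent score and a context-dependent rating, and the use of ratings as priorities, can be sketched as follows. This is a simplified illustration under stated assumptions, not the Executive's actual algorithm: the multiplicative combination, the list of candidate context scores standing in for the heuristic search, and the names `phrase_score`, `phrase_rating`, and `Agenda` are all hypothetical.

```python
import heapq
from math import prod


def phrase_score(acoustic, factors):
    """Context-independent quality: acoustic match times each factor value.

    The multiplicative combination is an assumption made for this sketch.
    """
    return acoustic * prod(factors)


def phrase_rating(score, context_scores):
    """Context-dependent estimate for a phrase.

    Assumption: each entry in context_scores stands for one candidate
    interpretation that would use the phrase; the rating is the best
    combined value. A real system would find these by heuristic search.
    """
    return max((score * c for c in context_scores), default=0.0)


class Agenda:
    """Priority queue of hypotheses, highest rating first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so equal ratings pop in insertion order

    def add(self, name, rating):
        # heapq is a min-heap, so store the negated rating.
        heapq.heappush(self._heap, (-rating, self._count, name))
        self._count += 1

    def pop_best(self):
        neg_rating, _, name = heapq.heappop(self._heap)
        return name, -neg_rating


# Phrase A scores well locally but fits only a poor context;
# phrase B scores lower locally but fits a good context.
score_a = phrase_score(0.5, [1.0])
score_b = phrase_score(0.5, [0.5])

agenda = Agenda()
agenda.add("A", phrase_rating(score_a, [0.25]))
agenda.add("B", phrase_rating(score_b, [1.0, 0.5]))

first, first_rating = agenda.pop_best()
```

Here phrase B is taken up first despite its lower local score, which is the kind of reordering that context checking is meant to produce.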

Such experiments are important for evaluating a complex system framework such as ours. It is not enough simply to demonstrate that a system with certain features can be implemented; a working system shows that the features are not disastrous, but it does not show the good effects, if any, the features have on performance. Experimentation is a valuable technique for discovering and explaining the actual effects and interactions of the design features. In a series of experiments, we have studied system features by comparing the performance with a particular feature to the performance with a simpler alternative in the place of that feature. The observed difference indicates the importance of that feature, and interactions are revealed by comparing different combinations of features. The results of the experiments give a better understanding of system performance and suggest new lines of development.
