Appelt, Douglas and Hobbs, Jerry and Bear, John and Israel, David and Kameyama, Megumi and Tyson, Mabry. SRI: Description of the JV-FASTUS System Used for MUC-5, in the Proceedings of the Fifth Message Understanding Conference (MUC-5), Tokyo, Japan, Aug 1993.
SRI International developed an information extraction system called FASTUS, a permuted acronym standing for “Finite State Automata-based Text Understanding System.” The choice of acronym is somewhat misleading, however, because FASTUS is a system for information extraction, not text understanding. The former problem is much simpler and more tractable: the information to be extracted is specified in a relatively straightforward way, only a fraction of the text is relevant to the extraction task, and the author’s underlying goals and nuances of meaning are of little interest.
In contrast, a text understanding task is to recover all of the information in a text, including that which is only implicit in what is actually written. All the richness of natural language becomes fair game, including metaphor, metonymy, discourse structure, and the recognition of the author’s underlying intentions, and the full interplay between language and world knowledge becomes central to the task. Text understanding is extremely difficult, and presents a number of research problems that have not yet been adequately solved. On the other hand, the relative simplicity of the information extraction task means that the full complexity of natural language need not be confronted head-on. In fact, much simpler mechanisms can be successfully employed to solve the more constrained problem, and in a computationally efficient and conceptually elegant way. It was this insight that led to the development of FASTUS for extracting information from articles about terrorism in Latin America for the MUC-4 evaluation.
In contrast to natural-language processing systems designed for text understanding applications, FASTUS does not do a complete syntactic and semantic analysis of each sentence. Instead, sentences are processed by a sequence of nondeterministic finite-state transducers. The output of each level of transducers becomes the input to the next level. Each level of processing produces some new linguistic structure, and discards some information that is irrelevant to the information extraction task. The nondeterminism of the transducers makes it possible to produce local analyses of fragments of the input that can be combined into a complete analysis. There is no need to determine the complete structure of each sentence when such an effort has little payoff for the task at hand. The nondeterminism can also be exploited to produce competing analyses of portions of the text. These alternatives can be compared, and the best analysis can be selected for processing at subsequent levels, reducing the combinatoric complexity of the subsequent levels.
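The cascade described above can be illustrated with a minimal Python sketch. It is not the FASTUS implementation: the lexicon, the tag names (DET, ADJ, NOUN, VERB, NG, VG), the three levels, and all function names are hypothetical choices made for illustration. The nondeterminism is approximated by enumerating every accepting end point of a noun-group pattern, and the "best analysis" is assumed here to be the longest match, a selection rule the text above does not specify.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real system would use far richer resources.
LEXICON = {
    "the": "DET", "a": "DET",
    "japanese": "ADJ", "joint": "ADJ",
    "trading": "NOUN", "house": "NOUN", "venture": "NOUN",
    "formed": "VERB",
}

Tagged = Tuple[str, str]  # (tag, text)


def tag_words(text: str) -> List[Tagged]:
    """Level 1: split into words and attach a tag to each one."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return [(LEXICON.get(w, "OTHER"), w) for w in words]


def noun_group_ends(tokens: List[Tagged], start: int) -> List[int]:
    """Every end index e where tokens[start:e] matches DET? ADJ* NOUN+.

    Each returned index is one competing analysis of the same fragment.
    """
    i = start
    if i < len(tokens) and tokens[i][0] == "DET":
        i += 1
    while i < len(tokens) and tokens[i][0] == "ADJ":
        i += 1
    ends = []
    while i < len(tokens) and tokens[i][0] == "NOUN":
        i += 1
        ends.append(i)
    return ends


def chunk(tokens: List[Tagged]) -> List[Tagged]:
    """Level 2: build noun groups (NG) and verb groups (VG); drop the rest."""
    out: List[Tagged] = []
    i = 0
    while i < len(tokens):
        ends = noun_group_ends(tokens, i)
        if ends:
            best = max(ends)  # assumed selection rule: longest match wins
            out.append(("NG", " ".join(w for _, w in tokens[i:best])))
            i = best
        else:
            if tokens[i][0] == "VERB":
                out.append(("VG", tokens[i][1]))
            i += 1  # advance; tokens that are not verbs are dropped here
    return out


def match_events(groups: List[Tagged]) -> List[Dict[str, str]]:
    """Level 3: find NG VG NG fragments and emit one record per match."""
    events = []
    for (t1, a), (t2, v), (t3, b) in zip(groups, groups[1:], groups[2:]):
        if (t1, t2, t3) == ("NG", "VG", "NG"):
            events.append({"agent": a, "action": v, "object": b})
    return events


def extract(text: str) -> List[Dict[str, str]]:
    """Run the cascade: each level's output is the next level's input."""
    return match_events(chunk(tag_words(text)))
```

Calling extract("The Japanese trading house formed a joint venture yesterday.") yields a single record with agent "the japanese trading house", action "formed", and object "a joint venture". The word "yesterday" is discarded at the second level, and the shorter candidate noun group "the japanese trading" loses to the longer one, so the third level sees only three groups and never has to consider the alternatives.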