Using grammatical inference to improve precision in information extraction

Citation

Freitag D. Using grammatical inference to improve precision in information extraction, in Proceedings of the ICML-97 Workshop on Automata Induction, Grammatical Inference, and Language Acquisition, 1997.

Abstract

The field of information extraction (IE) is concerned with applying natural language processing (NLP) and information retrieval (IR) techniques to the automatic extraction of essential details from text documents. We are exploring the use of machine learning methods for IE. While the most promising methods we have developed perform well for problems defined over a collection of electronic seminar announcements, they are imprecise in their identification of the boundaries of relevant text fragments (fields). Here, we entertain the idea of using grammatical inference (GI) methods to learn the appropriate form of a field. We describe one method for translating raw text into an abstract alphabet suitable for GI, and show that, by combining one IE learning method with the resulting inferred grammars, large improvements in precision can be realized for some fields.


Read more from SRI

  • surgeons around a surgical robot

    The SRI research behind today’s surgical robotics

    Intuitive’s da Vinci 5 system represents a major leap in robotic-assisted medicine. It all started at SRI, which continues to advance teleoperation technologies.

  • a collage of digital graphs

    A banner year for quantum

    SRI-managed QED-C’s annual report on quantum trends captures an industry accelerating rapidly from technical promise toward major global impact.

  • ICE Cube containing SRI’s aerogel experiment, photographed prior to launch. Source: Aerospace Applications North America

    An SRI carbon capture experiment launches into space

    By synthesizing carbon-absorbing aerogels in microgravity, SRI research will give us a rare glimpse into how these materials could be radically improved.