Multistrategy learning for information extraction

Citation

Freitag, D. Multistrategy learning for information extraction. In Proceedings of ICML 98, 1998.

Abstract

Information extraction (IE) is the problem of filling out predefined structured summaries from text documents. We are interested in performing IE in nontraditional domains where much of the text is often ungrammatical, such as electronic bulletin board posts and Web pages. We suggest that the best approach is one that takes into account many different kinds of information, and we argue for the suitability of a multistrategy approach. We describe learners for IE drawn from three separate machine learning paradigms: rote memorization, term-space text classification, and relational rule induction. By building regression models mapping from learner confidence to probability of correctness and combining probabilities appropriately, it is possible to improve extraction accuracy over that achieved by any individual learner. We describe three different multistrategy approaches. Experiments on two IE domains, a collection of electronic seminar announcements from a university computer science department and a set of newswire articles describing corporate acquisitions from the Reuters collection, demonstrate the effectiveness of all three approaches.
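
The abstract does not spell out the regression model or the combination rule, so the following is only an illustrative sketch of the general idea. It assumes a simple least-squares line (clipped to [0, 1]) as the map from raw learner confidence to probability of correctness, and it assumes the learners' errors are independent so that probabilities can be combined as 1 minus the product of the failure probabilities. The function names `fit_calibration` and `combine` are hypothetical and do not come from the paper.

```python
from math import prod


def fit_calibration(confidences, correct):
    """Fit a least-squares line from raw confidence to P(correct).

    `confidences` are a learner's raw scores on held-out predictions;
    `correct` holds 1 for a correct prediction and 0 otherwise.
    A linear fit is an assumption made for this sketch.
    """
    n = len(confidences)
    mean_x = sum(confidences) / n
    mean_y = sum(correct) / n
    var_x = sum((x - mean_x) ** 2 for x in confidences)
    if var_x == 0:
        slope = 0.0  # all confidences identical: fall back to the mean accuracy
    else:
        cov = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(confidences, correct))
        slope = cov / var_x
    intercept = mean_y - slope * mean_x

    def calibrate(confidence):
        # Clip so the result is always a valid probability.
        return min(1.0, max(0.0, slope * confidence + intercept))

    return calibrate


def combine(probabilities):
    """Probability that at least one learner is right, assuming independence."""
    return 1.0 - prod(1.0 - p for p in probabilities)
```

For example, if two learners propose the same text fragment and their calibrated probabilities are both 0.5, `combine([0.5, 0.5])` gives 0.75, which is higher than either learner's estimate alone. The independence assumption is a simplification; learners trained on the same documents are likely to make correlated errors.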

