We simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.
We present VALET, a framework for rule-based information extraction written in Python. We show how a handful of rules suffices to implement sophisticated matching, and describe a user interface that facilitates exploration for development and maintenance of rule sets.
Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses
We consider the use of distant supervision for biological information extraction, and introduce two understudied corpora of this form, the Biological Expression Language (BEL) Large Corpus and the Pathway Logic (PL) Datum Corpus.
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists.
We present initial work that uses significant patterns to generate extraction rules, and conclude with a discussion of future directions of our work.
We describe a 460-million-word corpus of online discussions.
Airborne Observation of Aerosol Optical Depth During ARCTAS: Vertical Profiles, Inter-Comparison and Fine-Mode Fraction
We describe aerosol optical depth (AOD) measured during the Arctic Research of the Composition of the Troposphere from Aircraft and Satellites (ARCTAS) experiment, focusing on vertical profiles, inter-comparison with correlative observations and fine-mode fraction.
We report on our efforts as part of the NEWS 2009 Machine Transliteration Shared Task. We applied an orthographic perceptron character edit model that we have used previously for name transliteration…
We discuss a named entity recognition system for Arabic and show how we incorporated the information provided by MADA, a full morphological tagger that uses a morphological analyzer.