We simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.
Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses
We consider the use of distant supervision for biological information extraction, and introduce two understudied corpora of this form, the Biological Expression Language (BEL) Large Corpus and the Pathway Logic (PL) Datum Corpus.
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists.
This paper describes a hybrid approach to assessing, at scale, the process by which computational thinking practices are applied during programming.
We present initial work that uses significant patterns to generate extraction rules, and conclude by discussing future directions.
We describe a 460-million-word corpus of online discussions.
We show that the performance measures Pk and WindowDiff, commonly used to evaluate discourse, topic, and story segmentation, are biased in favor of segmentations with fewer or adjacent segment boundaries.
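To make the two measures concrete, here is a minimal sketch of their common textbook definitions. It assumes each segmentation is encoded as a 0/1 boundary vector (entry i is 1 if a boundary falls between unit i and unit i+1) and that the window size k is set to half the mean reference segment length. The function names are hypothetical, and this is not the corrected measure the work above may propose.

```python
from typing import Sequence


def _num_windows(ref: Sequence[int], hyp: Sequence[int], k: int) -> int:
    """Validate inputs and return the number of sliding windows (N - k)."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must cover the same units")
    if not 1 <= k <= len(ref):
        raise ValueError("window size k must be between 1 and the number of gaps")
    # len(ref) boundary slots means len(ref) + 1 units; windows span k units.
    return len(ref) + 1 - k


def default_k(ref: Sequence[int]) -> int:
    """Conventional window size: half the mean reference segment length."""
    n_units = len(ref) + 1
    n_segments = sum(ref) + 1
    return max(1, round(n_units / n_segments / 2))


def pk(ref: Sequence[int], hyp: Sequence[int], k: int) -> float:
    """Fraction of windows where ref and hyp disagree on 'any boundary inside'."""
    n_windows = _num_windows(ref, hyp, k)
    errors = 0
    for i in range(n_windows):
        # Boundaries between unit i and unit i + k are slots i .. i + k - 1.
        ref_split = any(ref[i:i + k])
        hyp_split = any(hyp[i:i + k])
        errors += ref_split != hyp_split
    return errors / n_windows


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: int) -> float:
    """Fraction of windows where ref and hyp contain different boundary counts."""
    n_windows = _num_windows(ref, hyp, k)
    errors = 0
    for i in range(n_windows):
        errors += sum(ref[i:i + k]) != sum(hyp[i:i + k])
    return errors / n_windows


if __name__ == "__main__":
    # 12 units; reference boundaries after units 3 and 7 (0-indexed slots).
    ref = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    empty = [0] * len(ref)                       # hypothesis with no boundaries
    near = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]     # each boundary off by one unit
    k = default_k(ref)                           # 2
    print(pk(ref, empty, k), pk(ref, near, k))                    # 0.4 0.4
    print(window_diff(ref, empty, k), window_diff(ref, near, k))  # 0.4 0.4
```

In this toy example, a hypothesis whose boundaries are each off by a single unit scores exactly as badly as one that places no boundaries at all, under both measures. This is one way a metric can end up rewarding sparser segmentations.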
This paper presents the CALO-MA architecture and its speech recognition and understanding components.
We present a method for annotating verbal reference to people in conversational speech, with a focus on reference to conversation participants.