Author: John Niekrasz

October 8, 2022

Accelerating Human Authorship of Information Extraction Rules

We simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.
August 1, 2016

Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses

We consider the use of distant supervision for biological information extraction, and introduce two understudied corpora of this form, the Biological Expression Language (BEL) Large Corpus and the Pathway Logic (PL) Datum Corpus.
January 1, 2016

An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text

We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists.
January 1, 2016

Assessing Problem-Solving Process At Scale

This paper describes a hybrid approach to assessing process at scale in the context of the use of computational thinking practices during programming.
October 1, 2013

Unsupervised Discovery and Extraction of Semi-Structured Regions in Text Via Self-Information

We present initial work that uses significant patterns to generate extraction rules, and conclude with a discussion of future directions of our work.
January 1, 2012

A corpus of online discussions for research into linguistic memes

We describe a 460-million-word corpus of online discussions.
December 1, 2010

Unbiased discourse segmentation evaluation

We show that the performance measures Pk and WindowDiff, commonly used for discourse, topic, and story segmentation evaluation, are biased in favor of segmentations with fewer or adjacent segment boundaries.
August 1, 2010

The CALO meeting assistant system

This paper presents the CALO-MA architecture and its speech recognition and understanding components.
January 1, 2010

Annotating Participant Reference in English Spoken Conversation

We present a method for annotating verbal reference to people in conversational speech, with a focus on reference to conversation participants.
January 1, 2009

Participant Subjectivity and Involvement As a Basis for Discourse Segmentation

We propose a framework for analyzing episodic conversational activities in terms of expressed relationships between the participants and utterance content.
January 1, 2008

Meeting Adjourned: Off-line Learning Interfaces for Automatic Meeting Understanding

We explore interfaces for presenting this information to users after a meeting is completed, using two post-meeting interfaces that display information from topics and action items, respectively.
January 1, 2008

Meeting Structure Annotation

We describe a generic set of tools for representing, annotating, and analysing multi-party discourse, including an ontology of multimodal discourse, a programming interface for that ontology, and NOMOS – a flexible and extensible toolkit for browsing and annotating discourse.