Technical Director, Artificial Intelligence Center
Dayne Freitag, Ph.D., is Technical Director of the Advanced Analytics group in SRI’s Artificial Intelligence Center. His research seeks to apply artificial intelligence to information assimilation, management and exploitation. Specific areas of interest include natural language processing and computational linguistics; machine learning; data mining; information extraction; information retrieval; information diffusion; and information integration.
Freitag has served as principal investigator for a number of research projects, including several large, multi-institutional efforts. His research goals have focused on the automation of data science; the automatic extension of mechanistic models through machine reading; knowledge federation over diverse information sources through data analytics and natural language processing; explaining the spread of ideas through online communities; and novel approaches to institutional knowledge management using controlled English.
Before joining SRI in 2009, Freitag led and participated in fundamental research at Fair Isaac Corporation, with a special focus on the rapid deployment of textual information extraction and the unsupervised induction of linguistic structure and function through corpus analysis. He was also a research scientist and vice president for technology at Burning Glass Technologies, LLC, which develops solutions for human resource management using machine learning and natural language processing. Between 1998 and 2000, Freitag conducted information extraction and information retrieval research at Just Research.
Freitag holds a B.A. in English literature from Reed College and a Ph.D. in computer science from Carnegie Mellon University.
We simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.
We present VALET, a framework for rule-based information extraction written in Python. We show how a handful of rules suffices to implement sophisticated matching, and describe a user interface that facilitates exploration for development and maintenance of rule sets.
Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses
We consider the use of distant supervision for biological information extraction, and introduce two understudied corpora of this form, the Biological Expression Language (BEL) Large Corpus and the Pathway Logic (PL) Datum Corpus.
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists.
We present initial work that uses significant patterns to generate extraction rules, and conclude with a discussion of future directions of our work.
We describe a 460-million word corpus of online discussions.