DoVETAIL: Domain Vocabulary Extraction and Transduction, and Auto-Induction of Layout | SRI International

SRI and partners are developing new tools to extract better meaning and actionable analysis from data sources.

For intelligence analysts to quickly produce actionable information from unanticipated, multiple, and varied data sets, improvements are needed in two areas: alignment of data models, and advanced analytic algorithms. SRI and partners are pursuing those improvements for the Intelligence Advanced Research Projects Activity's Knowledge Discovery and Dissemination Program.

As an example of the challenges to overcome, insights into covert operators might come from a variety of sources, such as watchlists, text, tables, and open-source data. Automated tools can analyze text to extract pertinent facts and some interrelationships. However, results depend on the material having a style and vocabulary similar to those for which the tool was developed. In addition, separating the language to be analyzed from tables, titles, and headings requires custom programming or manual cutting and pasting.

The Domain Vocabulary Extraction and Transduction and Auto-Induction of Layout (DoVETAIL) project is intended to improve how meaningful information is extracted and aligned with related information. SRI’s strengths in this project include a suite of proven text-analytic capabilities, and a rigorous framework that formalizes semantics in multiple domains. SRI also contributes experience in managing multi-institution research efforts.

The team is working to innovate and develop:

  • Powerful new technology for extracting names, events, and relationships from text in a way that is rapidly adaptable to new domains
  • Automation of many aspects of data preprocessing by extending existing technologies
  • A refined and coordinated view of a consensus ontology, overcoming topological discrepancies through new search operators and optimization procedures
  • Contextual field-matching of records, based on an analytical assessment of similar concepts, locations, dates, and times
  • Methods that detect and compensate for statistical disparities between different sources
  • Tools for retrieval, sorting, and clustering of documents and events into an analysis that exploits multiple types of relationships and offers new semantic focusing