Blog archive October 17, 2017

Managing Information Overload with Text Analytics

No scientist or researcher can keep current with the vast number of potentially relevant journal articles and technical documents published each year in a given field. As with many problems involving overwhelming amounts of information, computers and advanced machine learning/artificial intelligence tools offer a potential solution. Text analytics can be applied to entire document sets to map the technical advances, research trends, and relationships between concepts within and across technical domains. Researchers can use these tools to identify connections, test hypotheses, and analyze what would otherwise be impossibly large datasets.

In recent years, SRI has made significant advancements in applying text analytics to understand and analyze research trends. SRI’s Copernicus platform can analyze large digital repositories of textual information to characterize documents, extract metrics, and plot the evolution of ideas and research within a specific domain of study. The platform was developed with the support of the Intelligence Advanced Research Projects Activity (IARPA) Foresight and Understanding from Scientific Exposition (FUSE) program. SRI has advanced the Copernicus platform through projects such as the Department of Energy’s (DOE) Solar Energy Evolution and Diffusion Studies (SEEDS) program. In 2017, the Chan Zuckerberg Initiative acquired the start-up Meta and the license to Copernicus.

SRI’s work developing the HELIOS tool for the SEEDS project serves as a representative example of how text analytics can help manage information overload. Based on the Copernicus platform, HELIOS was used to examine how topics within solar cell research evolved over time, and to automatically extract performance metrics for solar cell conversion efficiency.

Using Text Analytics to Identify Topic Evolution

Topic modeling is a statistical tool used to discover abstract “topics” that occur in a collection of documents. By analyzing topic models over time, we can create a map of how research in a domain evolves, exploring how scientific terminology and concepts change over time as new topics emerge and old topics wane.

Temporal analysis of topic models can reveal when one topic replaces another – a phenomenon that often occurs when one technology replaces another. For example, the technology used in dye-sensitized solar cells (DSSC) shifted from aqueous approaches towards solid-state approaches in the mid-2000s. SRI’s topic modeling highlighted the same shift.
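To make the idea of one topic overtaking another concrete, here is a minimal sketch. It is not SRI’s topic model: it assumes two hand-picked keyword sets as stand-ins for learned topics, and a made-up toy corpus. The names DOCS, TOPICS, topic_shares_by_year, and first_takeover_year are all hypothetical. The sketch computes each topic’s share of keyword hits per year and reports the first year in which the newer topic leads.

```python
from collections import defaultdict

# Hypothetical toy corpus: (publication year, abstract text). Not real data.
DOCS = [
    (2002, "liquid electrolyte dye-sensitized cell with iodide redox couple"),
    (2003, "liquid electrolyte stability in dye-sensitized cells"),
    (2004, "liquid electrolyte leakage motivates a solid-state alternative"),
    (2005, "solid-state hole conductor for dye-sensitized cells"),
    (2006, "solid-state dye-sensitized cell with polymer hole conductor"),
    (2006, "liquid electrolyte reference cell"),
    (2007, "solid-state hole conductor improves stability"),
]

# Hypothetical keyword sets standing in for two learned topics.
TOPICS = {
    "liquid": {"liquid", "electrolyte", "iodide"},
    "solid_state": {"solid-state", "hole", "conductor", "polymer"},
}


def topic_shares_by_year(docs, topics):
    """Return {year: {topic: share}}; shares within a year sum to 1."""
    counts = defaultdict(lambda: defaultdict(int))
    for year, text in docs:
        for token in text.lower().split():
            for name, keywords in topics.items():
                if token in keywords:
                    counts[year][name] += 1
    shares = {}
    for year, per_topic in counts.items():
        total = sum(per_topic.values())
        shares[year] = {name: per_topic.get(name, 0) / total for name in topics}
    return shares


def first_takeover_year(shares, old, new):
    """First year in which `new` has a larger share than `old`, or None."""
    for year in sorted(shares):
        if shares[year][new] > shares[year][old]:
            return year
    return None


shares = topic_shares_by_year(DOCS, TOPICS)
takeover = first_takeover_year(shares, "liquid", "solid_state")
```

With a real topic model, the per-year shares would come from the model’s document-topic weights rather than keyword counts, but the takeover test would look much the same.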

Expanding this approach to apply to an entire document set, we can explore how different topics coalesce into single topics over time, how single topics diverge, and how topics replace one another as we explore a domain.
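One common way to trace topics that coalesce or diverge is to compare topic term vectors from adjacent time periods and link those that are similar enough. The sketch below illustrates that idea under stated assumptions; it does not describe how Copernicus or HELIOS works. The topic vectors, the 0.5 threshold, and the function names (cosine, link_topics, merges_and_splits) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_topics(earlier, later, threshold=0.5):
    """Return (earlier_name, later_name) pairs with similarity >= threshold."""
    return [
        (e_name, l_name)
        for e_name, e_vec in earlier.items()
        for l_name, l_vec in later.items()
        if cosine(e_vec, l_vec) >= threshold
    ]


def merges_and_splits(links):
    """A later topic with 2+ parents is a merge; an earlier topic with 2+ children is a split."""
    parents, children = {}, {}
    for e_name, l_name in links:
        parents.setdefault(l_name, []).append(e_name)
        children.setdefault(e_name, []).append(l_name)
    merges = {name: ps for name, ps in parents.items() if len(ps) > 1}
    splits = {name: cs for name, cs in children.items() if len(cs) > 1}
    return merges, splits


# Hypothetical topic-term weights for two consecutive periods.
period_1 = {
    "A": {"dye": 1.0, "electrolyte": 1.0},
    "B": {"oxide": 1.0, "nanoparticle": 1.0},
}
period_2 = {
    "C": {"dye": 1.0, "electrolyte": 1.0, "oxide": 1.0, "nanoparticle": 1.0},
}

links = link_topics(period_1, period_2)
merges, splits = merges_and_splits(links)
```

In this toy case, topics A and B both link to C, so C is flagged as a merge. The threshold is a tuning choice: set too low, it links unrelated topics; set too high, it misses gradual drift in terminology.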

Extracting Performance Metrics Using Text Analytics

Solar cell technologies are tracked by how efficiently they convert light into electricity, which is arguably the most important performance metric for solar cells. The National Renewable Energy Laboratory (NREL) regularly publishes a graph illustrating the best research-cell efficiencies across a number of solar technologies. Currently, these data are gathered manually as individual researchers submit new records to NREL. We used machine learning to train the HELIOS tool to identify and extract efficiency claims from our document set on solar cell research. Our approach replicated the efficiency trend for cadmium telluride solar cells exactly, which suggests that text analytics can be used to conduct this performance extraction automatically. Manually creating this sort of table in another technology area where there is no centralized entity would be a time-consuming and onerous task, especially for non-experts.
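The extraction step can be illustrated with a far simpler stand-in for the machine-learned extractor: a regular expression that pulls efficiency percentages out of sentences, followed by a pass that keeps only the values that set a new record. Everything here is an assumption made for illustration. The pattern, the function names, and the sample sentences with their placeholder numbers are hypothetical and are not taken from HELIOS or from NREL data.

```python
import re

# Matches phrases such as "efficiency of 16.5%" or "efficiency reached 22.1 %".
# Hypothetical pattern for illustration; a trained extractor would be more robust.
EFFICIENCY_RE = re.compile(
    r"efficienc(?:y|ies)\s+(?:\w+\s+){0,3}?(\d{1,2}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)

# Placeholder sentences with made-up values, not measured results.
DOCS = [
    (2001, "We report a conversion efficiency of 10.0% for the device."),
    (2003, "The cell efficiency reached 12.5 % under standard illumination."),
    (2003, "A module efficiency of 11.0% was measured."),
    (2005, "Efficiency 12.0% was obtained after annealing."),
]


def extract_claims(docs):
    """Yield (year, efficiency_percent) for every matched claim."""
    for year, text in docs:
        for match in EFFICIENCY_RE.finditer(text):
            yield year, float(match.group(1))


def record_trend(claims):
    """Return [(year, record)] for each year that sets a new best value."""
    best_by_year = {}
    for year, value in claims:
        best_by_year[year] = max(value, best_by_year.get(year, 0.0))
    trend, record = [], 0.0
    for year in sorted(best_by_year):
        if best_by_year[year] > record:
            record = best_by_year[year]
            trend.append((year, record))
    return trend


claims = list(extract_claims(DOCS))
trend = record_trend(claims)
```

A rule like this cannot tell a measured result from a cited, theoretical, or target value, which is one reason a trained model is a better fit for the task described above.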

Using AI to Enhance Human Research

Researchers tend to specialize narrowly to best keep up with the advances in their domains of study. Text analytics enables researchers to examine a vast amount of information to incorporate a larger, and more diverse, scope of research in their own work. Text analytic tools serve as a means to confirm and stress-test the conclusions of human analysts, helping them identify topics, concepts, and trends that might not otherwise have been considered.

As these automated techniques improve, they are likely to be more widely used in research, especially in areas such as policy and research planning, where non-experts are often called upon to review a wide range of scientific literature. Because these tools hold promise for identifying connections and analyzing research trends, they will be of most benefit when used by creative and curious human analysts.
