Application of Text Analytics to Extract and Analyze Material–Application Pairs from a Large Scientific Corpus

SRI Authors: John Byrnes, Lucien Randazzese, Daragh P. Hartnett


Kalathil, N., Byrnes, J. J., Randazzese, L., Hartnett, D. P., & Freyman, C. A. (2018). Application of text analytics to extract and analyze material–application pairs from a large scientific corpus. Frontiers in Research Metrics and Analytics, 2, 15.


When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can give an analyst a base of insights to develop further. The use of text analytics reduces the time required to conduct an evaluation, while allowing analysts to experiment with a multitude of different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence between what a human analyst determines and what the text analysis shows prompts the analyst to reassess preliminary findings. In this work, we have successfully extracted material–application pairs and ranked them by importance. This method provides a novel way to map scientific advances in a particular material to the application for which it is used. Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine 16 materials. This analysis used coclustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall. Our analysis reproduced the judgments of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another material in a specific application (batteries).
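As a rough illustration of what extracting and ranking material–application pairs can look like, the sketch below counts how often a material term and an application term appear in the same abstract. This is not the coclustering method used in the paper; it is a much simpler co-occurrence stand-in. The abstracts, the term lists, and the function name `pair_counts` are all made up for the example.

```python
from collections import Counter
from itertools import product

# Toy, made-up abstracts (not taken from the paper's corpus).
abstracts = [
    "Lithium cobalt oxide cathodes improve battery capacity.",
    "Silicon thin films raise solar cell efficiency.",
    "Lithium anodes extend battery cycle life.",
    "Silicon anodes for next-generation battery designs.",
]

# Hypothetical term lists, chosen only for illustration.
materials = ["lithium", "silicon", "cobalt"]
applications = ["battery", "solar cell"]


def pair_counts(docs, materials, applications):
    """Count documents in which a material and an application co-occur."""
    counts = Counter()
    for doc in docs:
        text = doc.lower()
        found_materials = [m for m in materials if m in text]
        found_applications = [a for a in applications if a in text]
        # Each (material, application) pair is counted once per document.
        counts.update(product(found_materials, found_applications))
    return counts


# Rank pairs from most to least frequent.
ranked = pair_counts(abstracts, materials, applications).most_common()
for (material, application), n in ranked:
    print(f"{material:8s} {application:10s} {n}")
```

Raw co-occurrence counts like these favor frequently mentioned terms and rely on plain substring matching, so they are only a starting point for the kind of importance ranking the abstract describes.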
