Neural methods for molecular property prediction require an efficient encoding of structure-property relationships to be accurate. Recent work using graph algorithms shows limited generalization in the latent molecule encoding space. We build a Transformer-based molecule encoder and property predictor network with novel input featurization that performs significantly better than existing methods. We adapt our model to semi-supervised learning so that it also performs well on the limited experimental data usually available in practice.
Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks
We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks that incorporates user-provided formal knowledge to improve learning from data. We provide formal semantics that demonstrate that our knowledge representation captures all of first-order logic and that finite sampling from infinite domains converges to correct truth values. DASL’s representation improves on prior neural-symbolic work by avoiding vanishing gradients, allowing deeper logical structure, and enabling richer interactions between the knowledge and learning components. We illustrate DASL through a toy problem in which we add structure to an image classification problem and demonstrate that knowledge of that structure reduces data requirements by a factor of 1000. We then evaluate DASL on a visual relationship detection task and demonstrate that the addition of commonsense knowledge improves performance by 10.7% in a data-scarce setting.
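To make the idea of compiling logic into differentiable components more concrete, the following is a minimal sketch of soft logical connectives over truth values in [0, 1]. It is not DASL's actual semantics, which the abstract does not spell out; it assumes the widely used product t-norm, evaluates conjunction as a sum of logarithms, and approximates a universal quantifier by a conjunction over a finite sample. All function names are hypothetical.

```python
import math

EPS = 1e-12  # assumed floor that keeps log() finite when a truth value is 0


def t_and(*truths):
    """Product t-norm conjunction, computed as a sum of logs."""
    return math.exp(sum(math.log(max(t, EPS)) for t in truths))


def t_not(truth):
    """Standard soft negation."""
    return 1.0 - truth


def t_or(*truths):
    """Disjunction derived from conjunction via De Morgan's law."""
    return t_not(t_and(*(t_not(t) for t in truths)))


def t_implies(antecedent, consequent):
    """Material implication: (not a) or b."""
    return t_or(t_not(antecedent), consequent)


def forall(predicate, samples):
    """Universal quantifier approximated by a conjunction over a finite sample."""
    return t_and(*(predicate(x) for x in samples))


# Hypothetical rule: every sampled x that is "large" is also "positive".
def is_large(x):
    return 1.0 if x > 10 else 0.0


def is_positive(x):
    return 1.0 if x > 0 else 0.0


rule_truth = forall(lambda x: t_implies(is_large(x), is_positive(x)), [-3, 2, 15, 40])
```

In a trainable setting the predicates would be neural network outputs rather than hard-coded functions, and a quantity such as `rule_truth` would be one term of the training objective.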
Application of Text Analytics to Extract and Analyze Material–Application Pairs from a Large Scientific Corpus
When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can provide an analyst with a base of insights to develop further. The use of text analytics reduces the time required to conduct an evaluation, while allowing analysts to experiment with a multitude of different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence between what a human analyst determines and what the text analysis shows prompts the analyst to reassess preliminary findings. In this work, we have successfully extracted material–application pairs and ranked them by importance. This method provides a novel way to map scientific advances in a particular material to the application for which it is used. Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine 16 materials. This analysis used coclustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall. Our analysis reproduced the judgments of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another material in a specific application (batteries).
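The pairing and ranking step can be illustrated with a much simpler stand-in than the coclustering analysis described above: counting how often a material term and an application term appear in the same abstract. The sketch below assumes plain substring matching, and the toy abstracts, term lists, and function name are all hypothetical rather than taken from the study.

```python
from collections import Counter
from itertools import product


def rank_pairs(abstracts, materials, applications):
    """Rank (material, application) pairs by the number of abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_materials = [m for m in materials if m in lowered]
        found_applications = [a for a in applications if a in lowered]
        # Each co-mention within one abstract contributes a single count.
        counts.update(product(found_materials, found_applications))
    return counts.most_common()


# Hypothetical toy corpus, for illustration only.
toy_abstracts = [
    "Lithium cathodes improve the cycle life of batteries.",
    "We study lithium transport in solid-state batteries.",
    "Indium thin films are used in photovoltaics.",
]
ranking = rank_pairs(toy_abstracts, ["lithium", "indium"], ["batteries", "photovoltaics"])
```

Raw co-occurrence counts favor frequently mentioned terms, which is one reason a corpus-scale analysis would rely on a method such as coclustering instead.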
Spatial and Temporal Patterns in Preterm Birth in the United States
BACKGROUND:
Despite years of research, the etiologies of preterm birth remain unclear. In order to help generate new research hypotheses, this study explored spatial and temporal patterns of preterm birth in a large, total-population dataset.
METHODS:
Data on 145 million US births in 3,000 counties from the Natality Files of the National Center for Health Statistics for 1971-2011 were examined. State trends in early (<34 wk) and late (34-36 wk) preterm birth rates were compared. K-means cluster analyses were conducted to identify gestational age distribution patterns for all US counties over time.
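As a rough illustration of the clustering step, the sketch below runs a plain k-means (Lloyd's algorithm) on county-level gestational age distributions. Each county is assumed to be represented by the shares of births at <34 wk, 34-36 wk, and 37 wk or later; the six toy counties, the choice of k = 2, and the function name are hypothetical and are not drawn from the Natality Files.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return labels, centroids


# Hypothetical shares of births per county: (<34 wk, 34-36 wk, >=37 wk).
toy_counties = [
    (0.020, 0.060, 0.920), (0.021, 0.061, 0.918), (0.019, 0.059, 0.922),
    (0.050, 0.100, 0.850), (0.051, 0.101, 0.848), (0.049, 0.099, 0.852),
]
labels, centroids = kmeans(toy_counties, k=2)
```

Repeating such a clustering for each year would give one way to follow how counties move between distribution patterns over time.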
RESULTS:
A weak association was observed between state trends in <34 wk birth rates and the initial absolute <34 wk birth rate. Significant associations were observed between trends in <34 wk and 34-36 wk birth rates and between white and African American <34 wk births. Periodicity was observed in county-level trends in <34 wk birth rates. Cluster analyses identified periods of significant heterogeneity and homogeneity in gestational age distributional trends for US counties.
CONCLUSION:
The observed geographic and temporal patterns suggest periodicity and complex, shared influences among preterm birth rates in the United States. These patterns could provide insight into promising hypotheses for further research.