We describe a large-scale experiment in which subject matter experts (SMEs) with neither an artificial intelligence background nor extensive training in the task author knowledge bases (KBs) following a challenge problem specification with a strong question-answering component.
Artificial intelligence publications
We present VALET, a framework for rule-based information extraction written in Python.
We present a new approach to dialogue specification for Virtual Personal Assistants (VPAs) based on so-called dialogue workflow graphs. Our approach relies on Semantic Web technology (OWL), implemented in Common Lisp with the help of the Racer reasoner.
The MetaFlux software supports creating, executing, and solving quantitative metabolic flux models using flux balance analysis (FBA).
We describe work in progress towards deriving a unification algorithm automatically from a declarative specification using deductive methods. The specification is phrased as a logical theorem and the program is extracted from the proof. The theorem is proved using Stickel’s system SNARK, operating over a theory of expressions and substitutions. The theory has been formulated to allow a simpler specification of the algorithm, and the theorem prover has discovered novelties in the implementation. It is hoped that the same techniques may enable the discovery of previously unknown unification algorithms for specific theories.
Pathway Size Matters: The Influence of Pathway Granularity on Over-Representation (Enrichment) Statistics
Background: Enrichment or over-representation analysis is a common method used in bioinformatics studies of transcriptomics, metabolomics, and microbiome datasets. The key idea behind enrichment analysis is: given a set of significantly expressed genes (or metabolites), use that set to infer a smaller set of perturbed biological pathways or processes, in which those genes (or metabolites) play a role. Enrichment computations rely on collections of defined biological pathways and/or processes, which are usually drawn from pathway databases. Although practitioners of enrichment analysis take great care to employ statistical corrections (e.g., for multiple testing), they appear unaware that enrichment results are quite sensitive to the pathway definitions that the calculation uses.
Results: We show that alternative pathway definitions can alter enrichment p-values by up to nine orders of magnitude, whereas statistical corrections typically alter enrichment p-values by only two orders of magnitude. We present multiple examples where the smaller pathway definitions used in the EcoCyc database produce stronger enrichment p-values than the much larger pathway definitions used in the KEGG database; we demonstrate that to attain a given enrichment p-value, KEGG-based enrichment analyses require 1.3–2.0 times as many significantly expressed genes as do EcoCyc-based enrichment analyses. The large pathways in KEGG are problematic for another reason: they blur together multiple (as many as 21) biological processes. When such a KEGG pathway receives a high enrichment p-value, which of its component processes is perturbed is unclear, and thus the biological conclusions drawn from enrichment of large pathways are also in question.
Conclusions: The choice of pathway database used in enrichment analyses can have a much stronger effect on the enrichment results than the statistical corrections used in these analyses.
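The sensitivity to pathway size can be illustrated with a one-sided hypergeometric test, a common choice for over-representation analysis. This is a minimal sketch, not the authors' pipeline: the abstract does not state which test was used, and the function name, parameter names, and the gene counts in the usage note are hypothetical.

```python
from math import comb

def enrichment_p_value(genome_size, pathway_size, hits_total, hits_in_pathway):
    """Upper-tail hypergeometric probability P(X >= hits_in_pathway).

    genome_size:     number of genes in the background set
    pathway_size:    number of genes assigned to the pathway
    hits_total:      number of significantly expressed genes
    hits_in_pathway: how many of those significant genes fall in the pathway
    """
    upper = min(pathway_size, hits_total)
    total = comb(genome_size, hits_total)
    # Count the ways to draw k pathway genes and (hits_total - k) non-pathway genes.
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total

# Illustrative (made-up) numbers: the same 8 significant genes scored against
# a small 20-gene pathway definition and a large 150-gene pathway definition.
p_small = enrichment_p_value(4000, 20, 100, 8)
p_large = enrichment_p_value(4000, 150, 100, 8)
```

With these made-up counts, the same eight hits yield a far smaller p-value under the 20-gene definition than under the 150-gene definition, which is the kind of effect the abstract attributes to pathway granularity.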
Leveraging Curation Among Escherichia coli Pathway/Genome Databases Using Ortholog-Based Annotation Propagation
Abstract: Updating genome databases to reflect newly published molecular findings for an organism was hard enough when only a single strain of a given organism had been sequenced. With multiple sequenced strains now available for many organisms, the challenge has grown significantly because of the still-limited resources available for the manual curation that corrects errors […]
Abstract: Metabolomics, synthetic biology, and microbiome research demand information about organism-scale metabolic networks. The convergence of genome sequencing and computational inference of metabolic networks has enabled great progress toward satisfying that demand by generating metabolic reconstructions from the genomes of thousands of sequenced organisms. Visualization of whole metabolic networks is critical for aiding researchers in […]
Neural methods for molecule property prediction require an efficient encoding of the structure-property relationship to be accurate. Recent work using graph algorithms shows limited generalization in the latent molecule encoding space. We build a Transformer-based molecule encoder and property predictor network with novel input featurization that performs significantly better than existing methods. We adapt our model to semi-supervised learning so that it also performs well on the limited experimental data usually available in practice.