Shrager, Jeff; Waldinger, Richard; Stickel, Mark; and Massar, J. P. Deductive Biocomputing. PLoS ONE, vol. 2, no. 4, pp. e339, April 2007.
Abstract
Background.
As biologists increasingly rely upon computational tools, it is imperative that they be able to appropriately apply
these tools and clearly understand the methods the tools employ. Such tools must have access to all the relevant data and
knowledge and, in some sense, “understand” biology so that they can serve biologists’ goals appropriately and “explain” in
biological terms how results are computed.
Methodology/Principal Findings.
We describe a deduction-based approach to
biocomputation that semiautomatically combines knowledge, software, and data to satisfy goals expressed in a high-level
biological language. The approach is implemented in an open source web-based biocomputing platform called
BioDeducta, which combines SRI’s SNARK theorem prover with the BioBike interactive integrated knowledge base. The
biologist/user expresses a high-level conjecture, representing a biocomputational goal query, without indicating how this goal
is to be achieved. A subject domain theory, represented in SNARK’s logical language, transforms the terms in the conjecture
into capabilities of the available resources and the background knowledge necessary to link them together. If the subject
domain theory enables SNARK to prove the conjecture—that is, to find paths between the goal and BioBike resources—then
the resulting proofs represent solutions to the conjecture/query. Such proofs provide provenance for each result, indicating in
detail how it was computed. We demonstrate BioDeducta by showing how it can approximately replicate a previously
published analysis of genes involved in the adaptation of cyanobacteria to different light niches.
Conclusions/Significance.
Through the use of automated deduction guided by a biological subject domain theory, this work is a step towards enabling
biologists to conveniently and efficiently marshal integrated knowledge, data, and computational tools toward resolving
complex biological queries.
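
The idea of proving a goal by finding paths from it to available resources, with the proof doubling as provenance, can be illustrated with a toy sketch. This is not SNARK, which is a first-order theorem prover, and it does not use any BioBike interface. It is a minimal propositional backward-chaining search, and every rule and resource name in it is a hypothetical placeholder invented for the illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical toy "domain theory" (all names invented for illustration).
# Each goal maps to a list of alternative rule bodies; an empty body marks
# a goal that an available resource satisfies directly.
Rules = Dict[str, List[List[str]]]
Proof = Tuple[str, List["Proof"]]  # (goal, proofs of its subgoals)

RULES: Rules = {
    "genes_linked_to_light_niche": [
        ["orthologs_across_strains", "niche_labels"],
    ],
    "orthologs_across_strains": [
        ["genome_annotations", "similarity_search"],
    ],
    "niche_labels": [[]],        # resource: curated data
    "genome_annotations": [[]],  # resource: stored data
    "similarity_search": [[]],   # resource: software tool
}


def prove(goal: str, rules: Rules,
          seen: FrozenSet[str] = frozenset()) -> Optional[Proof]:
    """Backward-chain from goal; return a proof tree, or None on failure."""
    if goal in seen:  # avoid looping on cyclic rules
        return None
    for body in rules.get(goal, []):
        subproofs: List[Proof] = []
        for subgoal in body:
            sub = prove(subgoal, rules, seen | {goal})
            if sub is None:
                break  # this rule body fails; try the next alternative
            subproofs.append(sub)
        else:
            return (goal, subproofs)  # every subgoal was proved
    return None


def resources_used(proof: Proof) -> List[str]:
    """Read provenance off a proof: the leaf resources it relies on."""
    goal, subproofs = proof
    if not subproofs:
        return [goal]
    used: List[str] = []
    for sub in subproofs:
        used.extend(resources_used(sub))
    return used


if __name__ == "__main__":
    proof = prove("genes_linked_to_light_niche", RULES)
    if proof is not None:
        print(resources_used(proof))
```

In this sketch the user states only the top-level goal. The search decides which resources to combine, and the returned proof tree records how the answer was assembled. A goal that no rule mentions simply fails to be proved.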