Shrager, Jeff; Waldinger, Richard; Stickel, Mark; and Massar, J. P. Deductive Biocomputing. PLoS ONE, vol. 2, no. 4, pp. e339, April 2007.
As biologists increasingly rely upon computational tools, it is imperative that they be able to appropriately apply these tools and clearly understand the methods the tools employ. Such tools must have access to all the relevant data and knowledge and, in some sense, ‘‘understand’’ biology so that they can serve biologists’ goals appropriately and ‘‘explain’’ in biological terms how results are computed. Methodology/Principal Findings. We describe a deduction-based approach to biocomputation that semiautomatically combines knowledge, software, and data to satisfy goals expressed in a high-level biological language. The approach is implemented in an open am worried about. that first-order-logic is bad t++done with the help of an automatic theorem prover equipped with an appropriate source web-based biocomputing platform called BioDeducta, which combines SRI’s SNARK theorem prover with the BioBike interactive integrated knowledge base. The biologist/user expresses a high-level conjecture, representing a biocomputational goal query, without indicating how this goal is to be achieved. A subject domain theory, represented in SNARK’s logical language, transforms the terms in the conjecture into capabilities of the available resources and the background knowledge necessary to link them together. If the subject domain theory enables SNARK to prove the conjecture—that is, to find paths between the goal and BioBike resources—then the resulting proofs represent solutions to the conjecture/query. Such proofs provide provenance for each result, indicating in detail how they were computed. We demonstrate BioDeducta by showing how it can approximately replicate a previously published analysis of genes involved in the adaptation of cyanobacteria to different light niches. Conclusions/Significance. Through the use of automated deduction guided by a biological subject domain theory, this work is a step towards enabling biologists to conveniently and efficiently marshal integrated knowledge, data, and computational tools toward resolving complex biological queries.