Shrager, Jeff; Waldinger, Richard; Stickel, Mark; and Massar, J. P. Deductive Biocomputing. PLoS ONE, vol. 2, no. 4, pp. e339, April 2007.
Biologists must increasingly conduct computational analyses across integrated biological knowledge and data , but biologists who cannot program are relegated to searching in knowledge bases and carrying out canned computations that have been programmed by others. To conduct novel computations—ones for which no canned solution exists—biologists often hire programmers, but this is either expensive or hit-and-miss (or both), and usually results in overly specific one-off solutions. Moreover, putting non-biologist programmers between biologists and biocomputation removes the biologist from the details of the computation , which makes biologists nervous about the correctness of the results—as well it should. Unlike physics, there is no clear categorization within biology of, for example, theoretical vs. experimental vs. computational biologists whose practitioners could naturally collaborate. As a result, it is often nearly impossible for the authors of papers (usually biologists) to accurately and completely describe the computational methods used to solve a given biological problem. Although this will certainly change as computational techniques become standardized and biologists learn to program—indeed a whole new species called ‘‘computational biologist’’ is rapidly evolving—it would be useful if biologists had ‘‘intelligent’’ tools that, in a sense, ‘‘understand’’ biology, could help the biologists conduct analyses, and could explain how the analyses were accomplished, all expressed in terms that are natural to the biologists.
The present paper explores just such an ‘‘intelligent’’ paradigm that we call deductive biocomputing. In our paradigm biologists/users express queries as goals in high-level near-natural terms. Others— usually biologists who are more computationally savvy working with programmers—provide instructions to the platform about how to satisfy those goals. Automatic methods then put these together to satisfy the goals and to explicitly record how the answers were discovered. This is analogous to the widely studied problem of automatic program generation [3,4] in which a specification for a program is expressed in a high-level language and then translated into an actual program that implements the specification. (Although fully general automatic program synthesis is an unsolved problem, the present query resolution problem is more tractable because we are looking for an answer to a specific query, not a procedure that will answer an entire class of queries. In particular, because the inputs to the query are generally known in advance, we can evaluate tests and unwind loops at proof-time, this avoiding having to automatically generate conditional branches and iterative or recursive loops.)