Sequencing an organism’s genome is an important step in understanding how an organism functions. It elucidates the set of genes present in that organism, but genome annotation—the identification of genes within the DNA sequence and the assignment of functions to those genes—is an inexact science. Some genes receive incorrect functional assignments, functions for many genes cannot be inferred, and some genes are missed altogether. Typically, the process of genome annotation considers only small sets of genes at one time without asking whether the full set of genes obtained can produce a “working organism.”
A System-Level Approach to Validating a Genome Annotation
One major bioinformatics advance over the last decade has been the ability to rapidly create quantitative metabolic models from sequenced genomes. The metabolism of an organism is its cellular biochemical factory. It takes in chemicals from the cell’s environment and turns them into the chemicals needed for life. A metabolic model is a mathematical description of that biochemical factory that assigns rates to different production lines (pathways) within the factory.
There are two main types of metabolic models: kinetic and steady state. Kinetic models predict how the metabolic state of a cell changes over time, whereas steady-state models describe a cell whose metabolic machinery is running at constant rates, steadily churning out energy and the end products of biosynthesis from the nutrients that the cell takes in. Steady-state models are much easier to create than kinetic models. They do not require the large number of difficult-to-measure quantitative parameters that kinetic models require, making it practical to create steady-state models at the genome scale, based on genome annotations.
Steady-state metabolic models predict the rate at which molecules move through metabolic pathways, a quantity known as flux. At steady state, the sum of the fluxes that produce each cellular metabolite equals the sum of the fluxes that consume that metabolite. Fluxes are balanced; hence the name of this major steady-state modeling technique: flux-balance analysis.
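The balance condition can be made concrete with a small sketch. The three-reaction network below is hypothetical and exists only for illustration; the function names are likewise assumptions, not part of any modeling tool. It checks whether a given set of fluxes leaves every metabolite with zero net production.

```python
# Toy network (hypothetical, for illustration only):
#   uptake:  -> A
#   convert: A -> B
#   drain:   B ->        (stands in for biomass production)
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced).
REACTIONS = {
    "uptake": {"A": 1},
    "convert": {"A": -1, "B": 1},
    "drain": {"B": -1},
}


def net_production(reactions, fluxes):
    """Return the net production rate of each metabolite for the given fluxes."""
    net = {}
    for name, stoich in reactions.items():
        for metabolite, coeff in stoich.items():
            net[metabolite] = net.get(metabolite, 0.0) + coeff * fluxes[name]
    return net


def is_steady_state(reactions, fluxes, tol=1e-9):
    """True when production equals consumption for every metabolite."""
    rates = net_production(reactions, fluxes)
    return all(abs(rate) <= tol for rate in rates.values())


balanced = {"uptake": 2.0, "convert": 2.0, "drain": 2.0}
unbalanced = {"uptake": 2.0, "convert": 1.0, "drain": 1.0}  # A accumulates

print(is_steady_state(REACTIONS, balanced))    # True
print(is_steady_state(REACTIONS, unbalanced))  # False
```

A full flux-balance analysis goes further than this check: it searches for fluxes that satisfy the balance while optimizing an objective such as biomass production, typically by linear programming.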
Steady-state metabolic flux models have five components:
- Nutrients available as inputs to the metabolism, which include one or more sources of carbon, nitrogen, phosphorus, and sulfur.
- Metabolites created as end products of metabolism, called the biomass metabolites, which include amino acids, nucleotides, lipids and polysaccharides.
- Waste products secreted by the cellular metabolism, such as carbon dioxide, methane, hydrogen gas, excess water and protons, and fermentation products such as acetate and ethanol.
- Reactions constituting the metabolic network, which can number in the thousands.
- Optional constraints on the fluxes within the networks, such as constraints on the uptake rates of different nutrients.
Metabolic models can be developed to varying levels of accuracy and validation. The simplest level of validation is verifying that the model can produce all biomass metabolites from the input nutrients, or put another way, demonstrating that the model can “grow.” The factory analogy to growth of the model is that all production lines within the factory are active and able to produce final products from their inputs.
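A rough, qualitative version of this growth test can be sketched in a few lines. The sketch below is a simplification and not flux-balance analysis itself: it ignores stoichiometry and flux rates and only asks whether each biomass metabolite is reachable from the nutrients. The metabolite and reaction names are hypothetical.

```python
def producible(nutrients, reactions):
    """Expand the set of metabolites reachable from the nutrients.

    reactions maps a reaction name to (substrates, products), both sets.
    A reaction can fire once all of its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def can_grow(nutrients, reactions, biomass):
    """True when every biomass metabolite is reachable from the nutrients."""
    return set(biomass) <= producible(nutrients, reactions)


# Hypothetical toy model: the lipid line lacks the step that makes acetyl_coa.
NUTRIENTS = {"glucose", "ammonia"}
REACTIONS = {
    "r1": ({"glucose"}, {"pyruvate"}),
    "r2": ({"pyruvate", "ammonia"}, {"alanine"}),
    "r3": ({"acetyl_coa"}, {"lipid"}),
}
BIOMASS = {"alanine", "lipid"}

missing = BIOMASS - producible(NUTRIENTS, REACTIONS)
print(can_grow(NUTRIENTS, REACTIONS, BIOMASS))  # False
print(missing)                                  # {'lipid'}
```

Listing the biomass metabolites that cannot be produced, as `missing` does here, points to the production lines that are blocked.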
In my experience, metabolic models never exhibit growth the first time they are run, just as computer programs rarely work the first time they are run. The most frequent reason that models fail to grow is that they lack one or more critical metabolic reactions. That incompleteness is typically due to missing information in the genome annotation. Genes whose functions were not predicted at all, or were predicted incorrectly, during sequence analysis lead to missing reactions in the metabolic network, referred to as network gaps. A network gap in a factory would correspond to a missing station within a production line, one that prevents the line from producing any of its products.
Not all network gaps prevent model growth, because the cell can circumvent some gaps using other routes through the metabolic network. But a gap that prevents the network from producing any one essential metabolite (typical models contain about 50 essential metabolites) will prevent model growth. In practice, this is just what happens. Thus, model growth becomes a test for the validity of the genome annotation and of the metabolic reconstruction (the metabolic reaction set) derived from the genome annotation by software such as SRI’s Pathway Tools.
Identifying the missing reactions in a metabolic network is quite a challenge when approached manually. To address this problem, the MetaFlux modeling tool provides a gap filler module that automatically suggests which reactions are missing from the metabolic network. The gap filler computes a minimal set of reactions from our MetaCyc database that, if added to the organism’s metabolic model, will enable growth of the model. Even highly curated genome annotations contain network gaps that can be identified using metabolic modeling.
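The idea of a minimal gap-filling set can be illustrated with a brute-force sketch. This is not MetaFlux's algorithm, and it does not use the MetaFlux or MetaCyc interfaces: the reference reactions, metabolite names, and function names are all hypothetical. It uses a simplified reachability test for growth rather than flux balance, and tries candidate subsets in order of increasing size.

```python
from itertools import combinations


def producible(nutrients, reactions):
    """Metabolites reachable from the nutrients (stoichiometry ignored)."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def fill_gaps(nutrients, reactions, biomass, candidates):
    """Return the names of a smallest candidate set that enables growth.

    Subsets are tried from smallest to largest, so the first hit is minimal.
    Returns None when no combination of candidates enables growth.
    """
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            trial = dict(reactions)
            trial.update({name: candidates[name] for name in subset})
            if set(biomass) <= producible(nutrients, trial):
                return list(subset)
    return None


# Hypothetical model with a gap: nothing produces acetyl_coa.
MODEL = {
    "r1": ({"glucose"}, {"pyruvate"}),
    "r3": ({"acetyl_coa"}, {"lipid"}),
}
# Hypothetical stand-in for a reference reaction database.
REFERENCE = {
    "c1": ({"pyruvate"}, {"acetyl_coa"}),
    "c2": ({"glucose"}, {"ribose"}),
    "c3": ({"pyruvate"}, {"oxaloacetate"}),
}

print(fill_gaps({"glucose"}, MODEL, {"lipid"}, REFERENCE))  # ['c1']
```

Exhaustive search like this is workable only for toy cases, because the number of subsets grows exponentially with the number of candidate reactions; a reference database with thousands of reactions calls for an optimization-based formulation.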
A metabolic model that shows growth under a given nutrient set is still at a relatively early stage of development and probably requires further validation before it will produce accurate quantitative predictions, such as predicting the rate of cellular growth. Regardless, this approach is likely to improve the quality of genome annotations.
Construction of metabolic models could become a routine part of the genome annotation process, leading to higher quality annotations that would set a new bar for publications on completely sequenced genomes.
MetaFlux and the full Pathway Tools software are freely available from SRI for academic use.
Learn more about how to use the Pathway Tools software to create quantitative metabolic models for an organism of interest by attending SRI’s Pathway Tools Flux Balance Analysis Tutorial – the next tutorial is scheduled for March 18-19, 2015 in Menlo Park, California.