The MetaFlux software supports creating, executing, and solving quantitative metabolic flux models using flux balance analysis (FBA).
Background: Microbiomes are complex aggregates of organisms, each of which has its own extensive metabolic network. A variety of metabolites are exchanged between the microbes. The challenge we address is understanding the overall metabolic capabilities of a microbiome: through what series of metabolic transformations can a microbiome convert a starting compound to an ending compound?
Results: We developed an efficient software tool to search for metabolic routes that include metabolic reactions from multiple organisms. The metabolic network for each organism is obtained from BioCyc, where the network was inferred from the annotated genome. The tool searches for optimal metabolic routes that minimize the number of reactions in each route, maximize the number of atoms conserved between the starting and ending compounds, and minimize the number of organism switches. The tool pre-computes the reaction sets found in each organism from BioCyc to facilitate fast computation of the reactions defined in a researcher-specified organism set. The generated routes are depicted graphically, and for each reaction in a route, the tool lists the organisms that can catalyze that reaction. We present solutions for three route-finding problems in the human gut microbiome: (1) production of indoxyl sulfate, (2) production of trimethylamine N-oxide (TMAO), and (3) synthesis and degradation of autoinducers. The optimal routes computed by our multi-organism route-search (MORS) tool for indoxyl sulfate and TMAO were the same as routes reported in the literature.
Conclusions: Our tool quickly found plausible routes for the discussed multi-organism route-finding problems. The routes shed light on how diverse organisms cooperate to perform multi-step metabolic transformations. Our tool enables scientists to consider multiple alternative routes and identifies the organisms responsible for each reaction.
Interpreting changes in metabolite abundance in response to experimental treatments or disease states remains a major challenge in metabolomics. Pathway Covering is a new algorithm that takes a list of metabolites (compounds) and determines a minimum-cost set of metabolic pathways in an organism that includes (covers) all the metabolites in the list. We used five functions for assigning costs to pathways, including assigning a constant for all pathways, which yields a solution with the smallest pathway count; two methods that penalize large pathways; one that prefers pathways based on the pathway’s assigned function, and one that loosely corresponds to metabolic flux. The pathway covering set computed by the algorithm can be displayed as a multi-pathway diagram (“pathway collage”) that highlights the covered metabolites. We investigated the pathway covering algorithm by using several datasets from the Metabolomics Workbench. The algorithm is best applied to a list of metabolites with significant statistics and fold-changes with a specified direction of change for each metabolite. The pathway covering algorithm is now available within the Pathway Tools software and BioCyc website.
Microbial genome web portals have a broad range of capabilities that address a number of information-finding and analysis needs for scientists. This article compares the capabilities of the major microbial genome web portals to aid researchers in determining which portal(s) are best suited to their needs. We assessed both the bioinformatics tools and the data content of BioCyc, KEGG, Ensembl Bacteria, KBase, IMG, and PATRIC. For each portal, our assessment compared and tallied the available capabilities. The strengths of BioCyc include its genomic and metabolic tools, multi-search capabilities, table-based analysis tools, regulatory network tools and data, omics data analysis tools, breadth of data content, and large amount of curated data. The strengths of KEGG include its genomic and metabolic tools. The strengths of Ensembl Bacteria include its genomic tools and large number of genomes. The strengths of KBase include its genomic tools and metabolic models. The strengths of IMG include its genomic tools, multi-search capabilities, large number of genomes, table-based analysis tools, and breadth of data content. The strengths of PATRIC include its large number of genomes, table-based analysis tools, metabolic models, and breadth of data content.
MetaCyc ( https://MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains more than 2570 pathways derived from >54 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc is strictly evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in the BioCyc ( https://BioCyc.org) and other PGDB collections. This article provides an update on the developments in MetaCyc during the past two years, including the expansion of data and addition of new features.
Completion of genome-scale flux-balance models using computational reaction gap-filling is a widely used approach, but its accuracy is not well known.
We report on computational experiments of reaction gap filling in which we generated degraded versions of the EcoCyc-20.0-GEM model by randomly removing flux-carrying reactions from a growing model. We gap-filled the degraded models and compared the resulting gap-filled models with the original model. Gap-filling was performed by the Pathway Tools MetaFlux software using its General Development Mode (GenDev) and its Fast Development Mode (FastDev). We explored 12 GenDev variants including two linear solvers (SCIP and CPLEX) for solving the Mixed Integer Linear Programming (MILP) problems for gap filling; three different sets of linear constraints were applied; and two MILP methods were implemented. We compared these 13 variants according to accuracy, speed, and amount of information returned to the user.
We observed large variation among the performance of the 13 gap-filling variants. Although no variant was best in all dimensions, we found one variant that was fast, accurate, and returned more information to the user. Some gap-filling variants were inaccurate, producing solutions that were non-minimum or invalid (did not enable model growth). The best GenDev variant showed a best average precision of 87% and a best average recall of 61%. FastDev showed an average precision of 71% and an average recall of 59%. Thus, using the most accurate variant, approximately 13% of the gap-filled reactions were incorrect (were not the reactions removed from the model), and 39% of gap-filled reactions were not found, suggesting that curation is still an important aspect of metabolic-model development.
Reaction gap filling is a computational technique for proposing the addition of reactions to genome-scale metabolic models to permit those models to run correctly. Gap filling completes what are otherwise incomplete models that lack fully connected metabolic networks. The models are incomplete because they are derived from annotated genomes in which not all enzymes have been identified. Here we compare the results of applying an automated likelihood-based gap filler within the Pathway Tools software with the results of manually gap filling the same metabolic model. Both gap-filling exercises were applied to the same genome-derived qualitative metabolic reconstruction for Bifidobacterium longum subsp. longum JCM 1217, and to the same modeling conditions — anaerobic growth under four nutrients producing 53 biomass metabolites.
The solution computed by the gap-filling program GenDev contained 12 reactions, but closer examination showed that solution was not minimal; two of the twelve reactions can be removed to yield a set of ten reactions that enable model growth. The manually curated solution contained 13 reactions, eight of which were shared with the 12-reaction computed solution. Thus, GenDev achieved recall of 61.5% and precision of 66.6%. These results suggest that although computational gap fillers are populating metabolic models with significant numbers of correct reactions, automatically gap-filled metabolic models also contain significant numbers of incorrect reactions.
Our conclusion is that manual curation of gap-filler results is needed to obtain high-accuracy models. Many of the differences between the manual and automatic solutions resulted from using expert biological knowledge to direct the choice of reactions within the curated solution, such as reactions specific to the anaerobic lifestyle of B. longum.
EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website.
BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators.