Curation Tells Researchers the Stories of Metabolic Pathways in the Biochemical Factories of Life
SRI’s MetaCyc (“metabolic encyclopedia”) database is a large collection of metabolic pathways and enzyme descriptions from all domains of life. Researchers in a broad range of fields, from biotech to agriculture, use MetaCyc content to study organism pathways and metabolism. Its applications range from producing green chemicals and mitigating climate change to improving human health and winemaking.
Scientists consult MetaCyc because it concisely summarizes diverse research articles that can be hard to find and extremely time-consuming to assimilate. Curation, the manual entry and updating of information, is indispensable to the creation of a database as comprehensive as MetaCyc. The level of curation that has gone into MetaCyc has elevated it from a simple catalog of reactions and chemicals to a rich online encyclopedia that is both a reference source for scientists and a resource for analyzing newly sequenced genomes.
MetaCyc covers 2,250 pathways from all domains of life. It includes more than 12,000 biochemical reactions and more than 11,600 chemical compounds. MetaCyc integrates information from 44,000 scientific publications. Our collaborators at Boyce Thompson Institute and the Carnegie Institution, who provided information about plant and fungal pathways, have been key to the success of the project.
A “pathway” is a collection of metabolic reactions and the enzymes that catalyze them. The reactions form a sequence, a kind of assembly line, in which the chemical produced by one reaction is consumed by the next reaction down the line. To curate a new pathway in MetaCyc, we
- first conduct a literature search to find all publications relevant to the pathway,
- carefully digest the findings within those publications,
- and then assemble their information into a single diagram that curators enter into the database using specially designed software.
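The assembly-line definition of a pathway given above can be sketched in a few lines of Python. This is only an illustration of the idea, not MetaCyc's actual data model or curation software: the `Reaction` class and the `is_assembly_line` function are hypothetical names, and the example data are the first three textbook steps of glycolysis, typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One step of a pathway (hypothetical structure, not MetaCyc's schema)."""
    enzyme: str
    substrates: frozenset
    products: frozenset


def is_assembly_line(reactions):
    """Return True if each reaction produces at least one compound
    that the next reaction in the list consumes."""
    return all(
        bool(earlier.products & later.substrates)
        for earlier, later in zip(reactions, reactions[1:])
    )


# First three steps of glycolysis, entered by hand for illustration.
glycolysis_start = [
    Reaction(
        "hexokinase",
        frozenset({"D-glucose", "ATP"}),
        frozenset({"D-glucose 6-phosphate", "ADP"}),
    ),
    Reaction(
        "glucose-6-phosphate isomerase",
        frozenset({"D-glucose 6-phosphate"}),
        frozenset({"D-fructose 6-phosphate"}),
    ),
    Reaction(
        "6-phosphofructokinase",
        frozenset({"D-fructose 6-phosphate", "ATP"}),
        frozenset({"D-fructose 1,6-bisphosphate", "ADP"}),
    ),
]

print(is_assembly_line(glycolysis_start))
```

The check passes only when the steps are listed in order. Reversing the list breaks the chain, because the product of the last step is not a substrate of the step before it.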
This level of curation is what sets MetaCyc apart from other databases. We make a great effort to tell the full story behind each pathway in a “mini-review” that explains the pathway's background and significance and describes how its enzymes and compounds function (enzyme entries also have their own mini-reviews). Taken together, the mini-reviews in MetaCyc would fill 6,300 textbook pages!
Every element in a pathway diagram is interactive: when a user clicks an enzyme or reaction on the screen, an information page about that component appears. MetaCyc presents the entire interconnected biochemical system in a very user-friendly way. Researchers get context and detail that go beyond diagrams and reactions. They can understand the biological purpose of a pathway, and not just the name of a component.
One of MetaCyc's contributions is to integrate the fragmented, sometimes inaccessible data from the published scientific literature. It can take weeks to pull together papers about the same pathway that were written by different groups at different times, sometimes in different decades! In the process of adding information to the database, MetaCyc curators often reconcile seemingly disparate pathways that involve the same compounds under names that vary from one research team to another. We may find 10 papers on a given pathway that use five different names for a single chemical compound in that pathway. Part of what curators do is figure out when scientists are talking about the same compound. Curators also reconcile disagreements among publications and sometimes resolve errors in the literature.
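To make the naming problem concrete, here is a minimal Python sketch of grouping compound names from different papers under one canonical name. It is an assumption-laden illustration, not a description of how MetaCyc curators or their software work: the `SYNONYMS` table is a tiny hand-written example, and `normalize`, `build_index`, and `reconcile` are hypothetical helper names. In practice, deciding that two names refer to the same compound takes a curator's judgment; a lookup table can only record that decision once it has been made.

```python
def normalize(name):
    """Lowercase a name and treat hyphens and runs of spaces alike."""
    return " ".join(name.lower().replace("-", " ").split())


# Hand-written example table: canonical name -> names seen in the literature.
SYNONYMS = {
    "D-fructose 1,6-bisphosphate": [
        "fructose 1,6-bisphosphate",
        "fructose 1,6-diphosphate",
        "FBP",
    ],
}


def build_index(synonyms):
    """Map every normalized alias (and canonical name) to its canonical name."""
    index = {}
    for canonical, aliases in synonyms.items():
        for alias in [canonical, *aliases]:
            index[normalize(alias)] = canonical
    return index


def reconcile(names, index):
    """Group names by canonical compound; collect names the index does not know."""
    resolved, unresolved = {}, []
    for name in names:
        canonical = index.get(normalize(name))
        if canonical is None:
            unresolved.append(name)  # left for a human curator to examine
        else:
            resolved.setdefault(canonical, []).append(name)
    return resolved, unresolved


names_from_papers = [
    "Fructose-1,6-diphosphate",
    "FBP",
    "fructose 1,6-bisphosphate",
    "sedoheptulose 7-phosphate",
]
resolved, unresolved = reconcile(names_from_papers, build_index(SYNONYMS))
print(resolved)
print(unresolved)
```

Names that the table does not cover are returned separately rather than guessed at, which mirrors the point that reconciliation ultimately depends on someone reading the papers.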
All of this information is combined to create a story and a pathway. Extensive curation allows MetaCyc to provide the “big picture” of the biochemical factories of life. And it saves scientists huge amounts of time in finding, reading, and reconciling the information from these many publications.