Green M.L., Karp P.D. Using genome context data to identify specific types of functional associations in Pathway/Genome Databases. Bioinformatics, vol. 23, pp. i205-11, 2007.
Hundreds of genes lacking homology to any protein of known function are sequenced every day. Genome-context methods have proved useful in providing clues about functional annotations for many proteins. However, genome-context methods detect many biological types of functional associations, and do not identify which type of functional association they have found.
We have developed two new genome-context-based algorithms. Algorithm 1 extends our previous algorithm for identifying missing enzymes in predicted metabolic pathways (pathway holes) to use genome-context features. The new algorithm has significantly improved scope because it can now be applied to pathway reactions to which sequence similarity methods cannot be applied due to an absence of known sequences for enzymes catalyzing the reaction in other organisms. The new method identifies at least one known enzyme in the top ten hits for 58% of EcoCyc reactions that lack enzyme sequences in other organisms. Surprisingly, the addition of genome-context features does not improve the accuracy of the algorithm when sequences for the enzyme do exist in other organisms. Algorithm 2 uses genome-context methods to predict three distinct types of functional relationships between pairs of proteins: pairs that occur in the same protein complex, the same pathway, or the same operon. This algorithm performs with varying degrees of accuracy on each type of relationship, and performs best in predicting pathway and protein complex relationships.