BioWarehouse: Relational Integration of Eleven Bioinformatics Databases and Formats


Karp, P.D., Lee, T.J., Wagner, V. (2008). BioWarehouse: Relational Integration of Eleven Bioinformatics Databases and Formats. In: Bairoch, A., Cohen-Boulakia, S., Froidevaux, C. (eds) Data Integration in the Life Sciences. DILS 2008. Lecture Notes in Computer Science, vol 5109. Springer, Berlin, Heidelberg.


BioWarehouse is an open-source project for integrating bioinformatics databases within a relational database warehouse. It has two key features. First, a comprehensive database schema models many different bioinformatics datatypes. Second, a set of loader tools loads public bioinformatics databases, and data in standard bioinformatics formats, into that schema. As a result, multiple databases can be queried together within a single common schema. The nine supported databases are BioCyc, CMR, ENZYME, Eco2DBase, GenBank, Gene Ontology, KEGG, NCBI Taxonomy, and UniProt. The two supported formats are BioPAX (protein interactions subset only) and MAGE-ML.
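The warehouse idea described above (one shared schema, one loader per source, queries that span sources) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than any BioWarehouse code. The table names (dataset, protein), the column names, the source names source_a and source_b, and the toy records are all assumptions made for illustration; they are not the actual BioWarehouse schema, loaders, or data.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory database with a tiny, hypothetical common schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (
            dataset_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            dataset_id INTEGER NOT NULL REFERENCES dataset(dataset_id),
            name       TEXT NOT NULL,
            ec_number  TEXT
        );
        """
    )
    return conn


def load_source(conn, source_name, records):
    """Toy loader: register one source, then insert its (name, ec_number) records."""
    cur = conn.execute("INSERT INTO dataset (name) VALUES (?)", (source_name,))
    dataset_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO protein (dataset_id, name, ec_number) VALUES (?, ?, ?)",
        [(dataset_id, name, ec) for name, ec in records],
    )
    return dataset_id


def sources_by_ec(conn, ec_number):
    """Query across every loaded source at once through the shared schema."""
    rows = conn.execute(
        """
        SELECT d.name, p.name
        FROM protein AS p
        JOIN dataset AS d ON d.dataset_id = p.dataset_id
        WHERE p.ec_number = ?
        ORDER BY d.name, p.name
        """,
        (ec_number,),
    )
    return rows.fetchall()


# Two hypothetical sources with made-up records, loaded into the same tables.
conn = build_toy_warehouse()
load_source(conn, "source_a", [("alcohol dehydrogenase", "1.1.1.1"),
                               ("hexokinase", "2.7.1.1")])
load_source(conn, "source_b", [("ADH1", "1.1.1.1")])

result = sources_by_ec(conn, "1.1.1.1")
print(result)
```

Because every record carries a reference to the dataset it came from, a single SQL query can compare or combine entries from different sources without any per-source query code. That is the property the common schema is meant to provide.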

