Open Source and Open Data: Why We Should Eat Our Own Dogfood

January 03, 2007

The National Institutes of Health (NIH) has decided to use OpenEye Scientific Software's cheminformatics toolkits to provide key infrastructure for PubChem, a database of small organic molecules containing chemical structure and biological activities information. PubChem is being developed by the National Center for Biotechnology Information (NCBI) as part of the Molecular Libraries and Imaging component of the NIH Roadmap Initiative. "I am excited to see our software built into PubChem," says Roger Sayle, OpenEye's Vice President of Bioinformatics. "It's gratifying that our software will be part of such a useful public resource."


Along with the recent decision by the Research Collaboratory for Structural Bioinformatics (RCSB) to use OpenEye's cheminformatics toolkits to curate and depict the Protein Data Bank (PDB) ligand dictionary, the NIH's decision is a clear indication of the speed and robustness of OpenEye's technology for large and diverse sets of chemical structures and data. "It has been beneficial working with the NCBI for their project," says Sayle. "Their data includes enough unusual chemistry to make it a nice validation of the software beyond our regular test sets."

OpenEye Scientific Software Press Release - October 12, 2004

Why did PubChem, the granddaddy of all open chemistry databases, choose a closed, proprietary toolkit for its software infrastructure? A recent Depth-First article highlighted twelve free chemistry databases. Of those for which information is available, many have chosen the same path as PubChem. Why is this?

A huge opportunity is wasted every time this happens. We, the authors of Open Source software packages, could be working with the architects of Open Data systems to solve their problems in ways that vendors of closed systems can't. We could be using these Open Data systems as real-world proving grounds for our software, fixing bugs that would never have been detected otherwise, and pushing our systems to the limit. We could be identifying new and exciting uses for our software as the organizations we work with repeatedly ask "what if?" Sadly, none of this is happening. The Open Source community needs to do a great deal more to persuade the Open Data community to at least try our software. The worst that can happen is that we begin to understand the appeal of closed, proprietary products.

One bright spot is NMRShiftDB, which uses the Open Source Chemistry Development Kit for its infrastructure. This is a fine example of Open Source software powering an Open Data source in chemistry. More examples of this kind of Open Source/Open Data symbiosis would go a long way toward making the case for Open Source infrastructure.

Eating your own dogfood is an effective way to break into new markets and develop truly competitive products. After all, if those with closely aligned goals won't use what you have to offer, who will?