Roll Your Own Chemical Database With Free Components
Are you thinking of building a free chemical database but would rather not rent and maintain a bunch of proprietary software components? Norbert Haider has thought a lot about this problem and offers some helpful resources to get you started:
Creating a web-based, searchable molecular structure database using free software: Step-by-step case study
How to create a web-based molecular structure database with free software: A presentation
checkmol/matchmol: Open source command-line utility for 2D (sub)structure matching
mol2ps: Command-line utility for converting molfiles into PostScript files
Haider's system can be deployed on commodity hardware running open source operating systems. In other words, the cost of setting up a system like the one he describes is practically zero.
Creating and open sourcing your own custom components is one way to go. Building on top of existing open source tools like CDK, Open Babel, Octet and JOELib is another.
Haider's work raises an interesting question. Has anyone assembled a complete, ready-to-install, general-purpose chemical database package built from open source components? If for no other reason, such an exercise would give an excellent idea of what the dogfood tastes like.
Making the Case: Similarity by Compression
...The structures were converted to SMILES format and canonicalized using a program written with the open-source Java cheminformatics library JOELib2. ... To conclude, we have demonstrated that SMILES strings and compression programs are a simple, yet powerful method for similarity searching, competitive with state-of-the-art-techniques. The Ruby scripts used to carry out the experiments described in this paper are available for download from http://comp.chem.nottingham.ac.uk/download/zippity/.
James Melville, Jenna Riley, and Jonathan Hirst, J. Chem. Inf. Model.
Yet another appearance of Open Source software in the literature comes by way of a paper from Melville, Riley, and Hirst. This work takes advantage of the alphabet-like nature of SMILES strings and widely-available compression algorithms to perform molecular similarity analyses. Not only does this work use the Open Source JOELib library but the authors have made the Ruby scripts that perform the similarity analysis freely available under the same terms as Ruby (Ruby's license or the GPL).
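To get a feel for the idea, here is a minimal Python sketch of one common compression-based measure, the normalized compression distance (NCD). It is not the authors' Ruby code, and the choice of zlib, the function names, and the example SMILES strings are all assumptions made for illustration. The intuition: if two SMILES strings share a lot of substructure, compressing them together costs little more than compressing the larger one alone.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to store the text (illustrative helper)."""
    return len(zlib.compress(text.encode("ascii"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    Smaller values suggest the strings share more structure.
    The result depends on the compressor, and on the order of
    concatenation, so treat it as a rough score only.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def rank_by_similarity(query: str, candidates: list[str]) -> list[str]:
    """Sort candidate SMILES from most to least similar to the query."""
    return sorted(candidates, key=lambda smiles: ncd(query, smiles))


if __name__ == "__main__":
    # Hypothetical example inputs; in practice the SMILES should be
    # canonicalized first, as the paper does with JOELib2.
    ibuprofen = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"
    caffeine = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
    for smiles in rank_by_similarity(ibuprofen, [caffeine, ibuprofen]):
        print(smiles, round(ncd(ibuprofen, smiles), 3))
```

General-purpose compressors carry a fixed overhead that dominates on very short strings, so a sketch like this is more of a toy than a benchmark. The downloadable scripts are the place to look for what the paper actually did.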
The times they are a-changin'.


