Making the Case: Similarity by Compression

December 13, 2006

...The structures were converted to SMILES format and canonicalized using a program written with the open-source Java cheminformatics library JOELib2. ... To conclude, we have demonstrated that SMILES strings and compression programs are a simple, yet powerful method for similarity searching, competitive with state-of-the-art techniques. The Ruby scripts used to carry out the experiments described in this paper are available for download from http://comp.chem.nottingham.ac.uk/download/zippity/. James Melville, Jenna Riley, and Jonathan Hirst, J. Chem. Inf. Model.

Yet another appearance of Open Source software in the literature comes by way of a paper from Melville, Riley, and Hirst. This work takes advantage of the alphabet-like nature of SMILES strings and widely available compression algorithms to perform molecular similarity analyses. Not only does this work use the Open Source JOELib library, but the authors have also made the Ruby scripts that perform the similarity analysis freely available under the same terms as Ruby (Ruby's license or the GPL).
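To give a feel for the general idea, here is a minimal sketch in Python. It is not the authors' Ruby code, and the excerpt above does not say which compressor or distance formula the paper uses. The sketch assumes zlib as the compressor and the normalized compression distance (NCD) as the measure; the function names and the example SMILES strings are chosen purely for illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to store the string (assumed compressor)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    If b shares many substrings with a, compressing a + b costs little
    more than compressing a alone, so the distance is small.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


# Illustrative SMILES strings (hypothetical query and candidates).
aspirin = "CC(=O)Oc1ccccc1C(=O)O"
salicylic_acid = "O=C(O)c1ccccc1O"
caffeine = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"

if __name__ == "__main__":
    # Rank candidates by distance to the query; smaller means more similar.
    for name, smiles in [("salicylic acid", salicylic_acid), ("caffeine", caffeine)]:
        print(name, round(ncd(aspirin, smiles), 3))
```

On strings this short, the compressor's fixed overhead makes up much of the compressed size, so the numbers are rough. Canonicalizing the SMILES first, as the quoted passage describes, matters because the same molecule written two different ways would otherwise look dissimilar to the compressor.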

The times they are a-changin'.