Eleven Free Cheminformatics Scripting Environments

November 14, 2006

A recent question on Yahoo's chemoinf forum got me thinking about free cheminformatics scripting environments. If you've ever wanted to learn an object-oriented scripting language such as Ruby, Python, Perl, or Groovy in the context of cheminformatics, there are many good options to choose from. Few experiences expand a programmer's horizons more than learning one of these freedom languages. This is especially true for developers who, like me, come from a background in the safety languages C++ and Java.

Below is a complete roundup of Open Source cheminformatics scripting environments, grouped by language. If closed, commercial offerings were included, this list would, of course, be longer. In the interest of full disclosure, I am the author of RCDK and have worked on OBRuby.

  1. Ruby Chemistry Development Kit (RCDK) - IUPAC nomenclature translation, 2-D structure layout, 2-D color rendering. RCDK combines the capabilities of three Open Source Java toolkits with the agility of the Ruby platform, all in an easy-to-install package. Parse IUPAC nomenclature. Create 2-D coordinates for SMILES strings and IUPAC names. Render anti-aliased color 2-D molecular images in SVG, PNG, and JPG formats.
  2. Ruby/Open Babel: OBRuby - A recent addition to the growing family of alternative programming interfaces offered by the C++ toolkit Open Babel. Interconvert several molecular languages including SMILES, molfile, CML, PDB, and InChI. Perform sophisticated molecular queries with SMARTS pattern matching.
  3. Chemruby Rubyforge Site - A pure Ruby toolkit with portions written in C to speed performance. Although I successfully installed Chemruby on my system, I can't use it due to a failed dependency on a library called "dbm".
  4. Molruby - Parse SDFiles in Ruby or on the command line. Molruby is clearly a project in its early stages. On the other hand, if you're interested in learning Ruby, Molruby's small size may make it well suited to getting familiar with key concepts.
  5. PyDaylight - A "Pythonic", "thick" interface to the popular Daylight toolkit. The author, Andrew Dalke, has done a great deal to promote the idea of applying scripting languages to cheminformatics. Unfortunately, Daylight's toolkit isn't yet offered under an Open Source license, making it difficult for me to evaluate the PyDaylight interface.
  6. Python/Open Babel - Access a good chunk of the impressive Open Babel API through Python. I needed to make a small modification to get OBPython working on my system. After that, this package worked exactly as advertised.
  7. Python/CDK - Use Jython to access the complete CDK API using Python. Jython is a Java implementation of the Python interpreter, and so this use of the CDK lets developers combine their favorite Java and Python software.
  8. FROWNS (Python) - Loosely based on the PyDaylight API by Andrew Dalke. Read and write SMILES and Molfiles. Perform SMARTS queries, work with fingerprints, and enumerate molecular cycles. With optional Graphviz support, render 2-D molecular images.
  9. Perl/Open Babel - Use Open Babel from Perl. I was unsuccessful in building OBPerl on my system; your mileage may vary.
  10. Perlmol - Read and write a number of common formats including SMILES, molfile, SLN, and PDB. Query by molecular and reaction pattern. Installation on my system went smoothly. One of the best-documented projects on this list.
  11. Groovy/CDK - Groovy is a relatively new object-oriented scripting language for Java. I found no Internet references on using Groovy with CDK in English, although it should be simple to do. If you read Japanese, try this link. Stay tuned for more on this interesting combination.
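
For a sense of how little code a small task like the SDFile parsing mentioned under Molruby can take in a scripting language, here is a minimal Python sketch. It is not taken from any toolkit on this list: the names parse_sdf, parse_record, and SAMPLE are hypothetical, the sample data is made up for illustration, and the code assumes well-formed V2000 records that each end with a "$$$$" line.

```python
def parse_record(lines):
    """Turn the lines of one SDFile record into a dict (assumes V2000 layout)."""
    record = {"name": lines[0].strip() if lines else "", "data": {}}
    # Line 4 is the counts line: atom count in columns 1-3, bond count in 4-6.
    if len(lines) > 3:
        record["atoms"] = int(lines[3][0:3])
        record["bonds"] = int(lines[3][3:6])
    key = None
    # Data items look like "> <TAG>", then value lines, then a blank line.
    for line in lines[4:]:
        if line.startswith(">"):
            start, end = line.find("<"), line.rfind(">")
            key = line[start + 1:end] if 0 <= start < end else None
            if key is not None:
                record["data"][key] = []
        elif key is not None:
            if line.strip():
                record["data"][key].append(line)
            else:
                key = None  # a blank line closes the current data item
    record["data"] = {k: "\n".join(v) for k, v in record["data"].items()}
    return record


def parse_sdf(text):
    """Split SDFile text on "$$$$" delimiter lines and parse each record."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith("$$$$"):
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records  # anything after the last "$$$$" is ignored


# Made-up two-record sample, built line by line so the column layout is explicit.
SAMPLE = "\n".join([
    "water",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
    "> <NAME>",
    "water",
    "",
    "> <FORMULA>",
    "H2O",
    "",
    "$$$$",
    "helium",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 He  0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>",
    "helium",
    "",
    "$$$$",
])

if __name__ == "__main__":
    for rec in parse_sdf(SAMPLE):
        print(rec["name"], rec["atoms"], rec["bonds"], rec["data"])
```

A sketch like this only reads names, counts, and data fields. Anything involving real chemistry, such as aromaticity perception or SMARTS matching, is where the toolkits above earn their keep.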