Five Open Tools for 2D Structure Layout (aka Structure Diagram Generation)
Given a molecular representation without 2D coordinates, how would you display a human-readable view?
This problem can arise in many situations, one of the most common of which is the parsing of line notations such as IUPAC nomenclature, SMILES, or InChI.
And then there are the cases when you have 2D coordinates, but they're not very aesthetically pleasing. Maybe the coordinates were created by people either in a hurry or working with low quality editors, or maybe they were generated as distorted 2D projections of 3D coordinates. Whatever the reason, simply having 2D coordinates may not be the same as having good 2D coordinates.
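To make the problem concrete, here is a toy sketch of the simplest possible layout case: an unbranched chain drawn as a zigzag with 120-degree bond angles. The function name `zigzag_layout` and its parameters are made up for illustration. Real SDG tools also have to deal with rings, branching, stereochemistry, and overlap, none of which this handles.

```python
import math

def zigzag_layout(n_atoms, bond_length=1.0, angle_deg=120.0):
    """Return (x, y) coordinates for an unbranched chain of n_atoms.

    Successive bonds alternate above and below the x axis so that
    every interior bond angle equals angle_deg.
    """
    # Each bond deviates from the x axis by half of (180 - bond angle).
    tilt = math.radians((180.0 - angle_deg) / 2.0)
    coords = [(0.0, 0.0)] if n_atoms > 0 else []
    for i in range(1, n_atoms):
        direction = tilt if i % 2 == 1 else -tilt
        x, y = coords[-1]
        coords.append((x + bond_length * math.cos(direction),
                       y + bond_length * math.sin(direction)))
    return coords

if __name__ == "__main__":
    # Four atoms, e.g. the carbon skeleton of butane.
    for x, y in zigzag_layout(4):
        print(f"{x:.3f} {y:.3f}")
```

Even this trivial case shows why "having coordinates" and "having good coordinates" differ: the same chain laid out in a straight line would be valid, but nobody would want to read it.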
Last year, a Depth-First article discussed the Structure Diagram Generation (SDG) problem and how it can be solved with Open Source software. Given that nearly a year has passed, it seemed appropriate to revisit the topic.
The good news is that there are at least four independent Open Source implementations of SDG algorithms, and one potential open database approach. They are, in no particular order:
MCDL - Written in Java, the emphasis of this software appears to be facilitating the use of Modular Chemical Descriptor Language. Unfortunately, no new releases of this intriguing software package have been made in the last year.
Chemistry Development Kit (CDK) - This useful package handles about 70-80% of a typical assortment of chemical structures well. The large amount of activity on the CDK project in general makes this a particularly good SDG system to contribute to, especially in the areas of refactoring and handling special cases. See also Christoph Steinbeck's overview of CDK's layout system.
BKChem - A 2D structure editor written in Python. Give it an InChI and it will display the structure, courtesy of SDG. The system worked remarkably well with the molecules I tested. BKChem has also been reported to work in batch mode.
RDKit - Written in Python and C++, this package is the newest of the bunch. Although I haven't had much luck compiling RDKit, it still looks quite promising. Any chance of switching to make as a build system?
PubChem - PubChem? Maybe. With a database of small molecules now numbering well over ten million, there's a good chance that the molecule for which you need to assign coordinates is already in PubChem. And if it's in PubChem, 2D coordinates have already been assigned. Use an InChI as a hash key, and voila - instant SDG without much software. Given the novelty of large, publicly-available databases of small molecules such as PubChem, this approach may have a great deal of untapped potential.
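The hash-key idea can be sketched with a plain dictionary. This is only an illustration under stated assumptions: `CoordinateCache` and `placeholder_sdg` are hypothetical names, the table would in practice be filled from a downloaded copy of a database such as PubChem, and the ethanol coordinates below are hand-made placeholders, not PubChem data.

```python
# Each entry is (element, x, y) for one heavy atom.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ETHANOL_COORDS = [("C", 0.0, 0.0), ("C", 0.866, 0.5), ("O", 1.732, 0.0)]  # placeholder values

def placeholder_sdg(inchi):
    """Stand-in for a call into a real layout tool (hypothetical)."""
    return []

class CoordinateCache:
    """Look up stored 2D coordinates by InChI; compute only on a miss."""

    def __init__(self, fallback):
        self._table = {}           # InChI string -> list of (element, x, y)
        self._fallback = fallback  # called only when the InChI is unknown

    def add(self, inchi, coords):
        self._table[inchi] = coords

    def lookup(self, inchi):
        """Return (coords, source), where source is 'cache' or 'computed'."""
        if inchi in self._table:
            return self._table[inchi], "cache"
        coords = self._fallback(inchi)
        self._table[inchi] = coords  # remember the result for next time
        return coords, "computed"

if __name__ == "__main__":
    cache = CoordinateCache(fallback=placeholder_sdg)
    cache.add(ETHANOL, ETHANOL_COORDS)
    print(cache.lookup(ETHANOL))
```

Because an InChI is a canonical identifier, two differently drawn inputs for the same molecule map to the same key, which is what makes the lookup work at all.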
SDG is one of those issues that can stay off the radar for some only to become an instant, nagging problem with no clear way out. The tools cited here offer an excellent place to begin working toward a comprehensive solution.
Never Draw the Same Molecule Twice: Writing PNG Image Metadata with Python
A recent D-F article discussed a method for encoding machine-readable molecular structure information as image metadata. This article generated some interest among developers. For example, Noel O'Boyle posted code for reading PNG image metadata with Python. The popularity of Python in cheminformatics makes this approach especially attractive.
But how would you write PNG image metadata with Python? The obvious answer of setting Image.info and then calling Image.save doesn't appear to work. Given my limited knowledge of Python, the answer must come from elsewhere.
Fortunately, Nick Galbreath wrote in to offer a solution. Using Python, PIL, and an undocumented class, Nick has developed a small wrapper function that writes metadata for PNG images. In fact, Nick is fast on his way to becoming a PNG metadata expert, if reluctantly so. His blog is worth checking out and contains several useful techniques for image manipulation.
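For readers who want to see what writing PNG text metadata involves at the file level, here is a minimal sketch that uses only the Python standard library. It is not Nick's wrapper and does not use PIL; in PIL the usual route is a PngImagePlugin.PngInfo object passed to Image.save through the pnginfo argument. The helper names (`make_chunk`, `add_text`, `read_text`, `tiny_png`) and the "smiles" keyword are assumptions chosen for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def iter_chunks(png_bytes):
    """Yield (chunk_type, data) for every chunk in a PNG byte string."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        yield chunk_type, data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC

def add_text(png_bytes, keyword, text):
    """Return a copy of the PNG with a tEXt chunk inserted before IEND."""
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"IEND":
            out.append(make_chunk(b"tEXt", payload))
        out.append(make_chunk(chunk_type, data))
    return b"".join(out)

def read_text(png_bytes):
    """Collect all tEXt chunks into a keyword -> text dictionary."""
    result = {}
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            result[keyword.decode("latin-1")] = text.decode("latin-1")
    return result

def tiny_png():
    """Build a 1x1 grayscale PNG in memory, just for demonstration."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

if __name__ == "__main__":
    tagged = add_text(tiny_png(), "smiles", "CCO")
    print(read_text(tagged))
```

The tEXt chunk is limited to Latin-1 text, which is enough for SMILES or InChI strings; a molfile would fit as well, though a compressed zTXt chunk might be a better choice for larger payloads.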
Eleven Free Cheminformatics Scripting Environments
A recent question on Yahoo's chemoinf forum got me thinking about free cheminformatics scripting environments. If you've ever wanted to learn an object-oriented scripting language such as Ruby, Python, Perl, or Groovy in the context of cheminformatics, there are many good options to choose from. Few experiences expand a programmer's horizons more than learning one of these freedom languages. This is especially true for developers who, like myself, come from a background involving the safety languages C++ and Java.
Below is a complete roundup of Open Source cheminformatics scripting environments, grouped by language. If closed, commercial offerings were included, this list would, of course, be longer. In the interest of full disclosure, I am the author of RCDK and have worked on OBRuby.
Ruby Chemistry Development Kit (RCDK) - IUPAC nomenclature translation, 2-D structure layout, 2-D color rendering. RCDK combines the capabilities of three Open Source Java toolkits with the agility of the Ruby platform, all in an easy-to-install package. Parse IUPAC nomenclature. Create 2-D coordinates for SMILES strings and IUPAC names. Render anti-aliased color 2-D molecular images in SVG, PNG, and JPG format.
Ruby/Open Babel: OBRuby - A recent addition to the growing family of alternative programming interfaces offered by the C++ toolkit Open Babel. Interconvert several molecular languages including SMILES, molfile, CML, PDB, and InChI. Perform sophisticated molecular queries with SMARTS pattern matching.
Chemruby Rubyforge Site - A pure Ruby toolkit with portions written in C to speed performance. Although I successfully installed Chemruby on my system, I can't use it due to a failed dependency on a library called "dbm".
Molruby - Parse SDFiles in Ruby or on the command line. Molruby is clearly a project in its early stages. On the other hand, if you're interested in learning Ruby, Molruby's small size may be suited to getting familiar with key concepts.
PyDaylight - A "Pythonic", "thick" interface to the popular Daylight toolkit. The author, Andrew Dalke, has done a great deal to promote the idea of applying scripting languages to cheminformatics. Unfortunately, Daylight's toolkit isn't yet offered under an Open Source license, making it difficult for me to evaluate the PyDaylight interface.
Python/Open Babel - Access a good chunk of the impressive Open Babel API through Python. I needed to make a small modification to get OBPython working on my system. After that, this package worked exactly as advertised.
Python/CDK - Use Jython to access the complete CDK API using Python. Jython is a Java implementation of the Python interpreter, and so this use of the CDK lets developers combine their favorite Java and Python software.
FROWNS (Python) - Loosely based on the PyDaylight API by Andrew Dalke. Read and write SMILES and Molfiles. Perform SMARTS queries, work with fingerprints, and enumerate molecular cycles. With optional Graphviz support, render 2-D molecular images.
Perl/Open Babel - Use Open Babel from Perl. I was unsuccessful in building OBPerl on my system; your mileage may vary.
Perlmol - Read and write a number of common formats including SMILES, molfile, SLN, and PDB. Query by molecular and reaction pattern. Installation on my system went smoothly. One of the best-documented projects on this list.
Groovy/CDK - Groovy is a relatively new object-oriented scripting language for Java. I found no Internet references on using Groovy with CDK in English, although it should be simple to do. If you read Japanese, try this link. Stay tuned for more on this interesting combination.

