Five Open Tools for 2D Structure Layout (aka Structure Diagram Generation)
Given a molecular representation without 2D coordinates, how would you display a human-readable view?
This problem can arise in many situations, one of the most common of which is the parsing of line notations such as IUPAC nomenclature, SMILES, or InChI.
And then there are the cases when you have 2D coordinates, but they're not very aesthetically pleasing. Maybe the coordinates were created by people either in a hurry or working with low quality editors, or maybe they were generated as distorted 2D projections of 3D coordinates. Whatever the reason, simply having 2D coordinates may not be the same as having good 2D coordinates.
Last year, a Depth-First article discussed the Structure Diagram Generation (SDG) problem and how it can be solved with Open Source software. Given that nearly a year has passed, it seemed appropriate to revisit the topic.
The good news is that there are at least four independent Open Source implementations of SDG algorithms, and one potential open database approach. They are, in no particular order:
MCDL: Written in Java, the emphasis of this software appears to be facilitating the use of Modular Chemical Descriptor Language. Unfortunately, no new releases of this intriguing software package have been made in the last year.
Chemistry Development Kit (CDK): This useful package handles about 70-80% of a typical assortment of chemical structures well. The large amount of activity on the CDK project in general makes this a particularly good SDG system to contribute to, especially in the areas of refactoring and handling special cases. See also Christoph Steinbeck's overview of CDK's layout system.
BKChem: A 2D structure editor written in Python. Give it an InChI and it will display the structure, courtesy of SDG. The system worked remarkably well with the molecules I tested. BKChem has also been reported to work in batch mode.
RDKit: Written in Python and C++, this package is the newest of the bunch. Although I haven't had much luck compiling RDKit, it still looks quite promising. Any chance of switching to make as a build system?
PubChem: PubChem? Maybe. With a database of small molecules now numbering well over ten million, there's a good chance that the molecule for which you need to assign coordinates is already in PubChem. And if it's in PubChem, 2D coordinates have already been assigned. Use an InChI as a hash key, and voila - instant SDG without much software. Given the novelty of large, publicly-available databases of small molecules such as PubChem, this approach may have a great deal of untapped potential.
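The "InChI as a hash key" idea can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the CoordinateStore class and its methods are hypothetical names, and an in-memory Hash stands in for a real database such as PubChem. It does not show PubChem's actual interface.

```ruby
# Hypothetical sketch: 2D coordinates are looked up by InChI instead of computed.
# An in-memory Hash stands in for a database such as PubChem.
class CoordinateStore
  def initialize
    @molfiles = {}
  end

  # Register a molfile (with 2D coordinates) under its InChI.
  def add(inchi, molfile)
    @molfiles[inchi.strip] = molfile
    self
  end

  # Return the stored molfile for an InChI. On a miss, fall back to the
  # given block (for example, a real SDG call); without a block, return nil.
  def lookup(inchi)
    key = inchi.strip
    return @molfiles[key] if @molfiles.key?(key)
    block_given? ? yield(key) : nil
  end
end

store = CoordinateStore.new
store.add('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H', 'benzene molfile with 2D coordinates')

# A hit returns the stored record; a miss hands the InChI to the fallback block.
hit  = store.lookup('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H')
miss = store.lookup('InChI=1/CH4/h1H4') { |inchi| "run SDG for #{inchi}" }
```

The fallback block is where one of the SDG packages listed above would be called, so that the lookup table only saves work and never blocks it.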
SDG is one of those issues that can stay off the radar for some only to become an instant, nagging problem with no clear way out. The tools cited here offer an excellent place to begin working toward a comprehensive solution.
Ruby CDK One-Liners: Create a Molfile With 2D Atom Coordinates From Arbitrary SMILES Strings
A very common operation in cheminformatics is the interconversion of molfiles and SMILES strings. Usually, converting from SMILES gives a molfile in which all atoms have coordinates of (0,0,0). Sometimes you just need more than that. The following Ruby CDK code will accept an arbitrary SMILES string and return a molfile with fully-assigned 2D atom coordinates:
require 'rubygems'
require 'rcdk'
require 'rcdk/util'
include RCDK::Util
XY.coordinate_molfile Lang.smiles_to_molfile('c1ccccc1')
Looking at it this way, those four lines of require/include statements seem pretty darn verbose.
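To tell whether a given molfile still has every atom at the origin, the atom block can be inspected directly. The helpers below are a minimal sketch, assuming a V2000 molfile with fixed-width coordinate fields. The method names are hypothetical and are not part of Ruby CDK.

```ruby
# Hypothetical helpers for inspecting a V2000 molfile; not part of Ruby CDK.

# Return an array of [x, y, z] floats, one entry per atom.
def molfile_coordinates(molfile)
  lines = molfile.split(/\r?\n/)
  atom_count = lines[3][0, 3].to_i # counts line: atom count in columns 1-3
  lines[4, atom_count].map do |line|
    # x, y and z each occupy a 10-character fixed-width field
    [line[0, 10], line[10, 10], line[20, 10]].map { |field| field.to_f }
  end
end

# True if at least one atom sits away from the origin in x or y.
def coordinates_assigned?(molfile)
  molfile_coordinates(molfile).any? { |x, y, _z| x != 0.0 || y != 0.0 }
end

# Build a tiny atoms-only V2000 molfile for demonstration purposes.
# atoms is an array of [symbol, x, y] triples.
def demo_molfile(atoms)
  header = ['', '  demo', '']
  counts = format('%3d%3d  0  0  0  0  0  0  0  0999 V2000', atoms.size, 0)
  atom_lines = atoms.map do |symbol, x, y|
    format('%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0', x, y, 0.0, symbol)
  end
  (header + [counts] + atom_lines + ['M  END']).join("\n")
end

flat = demo_molfile([['C', 0.0, 0.0], ['O', 0.0, 0.0]])
laid = demo_molfile([['C', 0.0, 0.0], ['O', 1.5, 0.0]])
puts coordinates_assigned?(flat) # false
puts coordinates_assigned?(laid) # true
```

One limitation of this check: a single-atom molecule placed at the origin is reported as unassigned, even though that placement is legitimate.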
From InChI to Image with Ruby Open Babel and Ruby CDK
Like SMILES, InChI is a line notation that can be used to encode and store chemical information relatively efficiently. Although there are a number of scenarios where this strategy is used, what many of them have in common is the need to eventually convert an InChI into a human-readable form. In most cases, this form will be a 2D chemical structure. This article will show how a small Ruby library can convert InChI strings into color PNG images with the help of Ruby Open Babel and Ruby CDK.
The Library
Our library accepts an InChI as input and produces a scaled PNG image as output. It re-uses part of a previously-discussed library for the interconversion of SMILES and InChI.
require 'rubygems'
require 'openbabel'
require_gem 'rcdk'
require 'rcdk/util'

module InChI
  # Shared Open Babel converter, configured once to read InChI and write SMILES.
  @@to_smiles = OpenBabel::OBConversion.new
  @@to_smiles.set_in_and_out_formats 'inchi', 'smi'

  # Writes a PNG image (width by height) of the structure encoded by inchi.
  def inchi_to_png inchi, path_to_png, width, height
    smiles = inchi_to_smiles inchi
    RCDK::Util::Image.smiles_to_png smiles, path_to_png, width, height
  end

  private

  # Converts an InChI to SMILES with Open Babel, raising if parsing fails.
  def inchi_to_smiles inchi
    mol = OpenBabel::OBMol.new
    @@to_smiles.read_string(mol, inchi) or raise "Can't parse InChI: #{inchi}."
    @@to_smiles.write_string(mol).strip
  end
end
Testing
Our library can be tested by saving it to a file called inchi.rb and using interactive Ruby (the warning can safely be ignored for now):
$ irb
irb(main):001:0> require 'inchi'
./inchi.rb:3:Warning: require_gem is obsolete. Use gem instead.
/usr/local/lib/ruby/gems/1.8/gems/rcdk-0.3.0/lib/rcdk/java.rb:26:Warning: require_gem is obsolete. Use gem instead.
=> true
irb(main):002:0> include InChI
=> Object
irb(main):003:0> inchi='InChI=1/C23H27FN4O2/c1-15-18(23(29)28-10-3-2-4-21(28)25-15)9-13-27-11-7-16(8-12-27)22-19-6-5-17(24)14-20(19)30-26-22/h5-6,14,16H,2-4,7-13H2,1H3' #risperidone
=> "InChI=1/C23H27FN4O2/c1-15-18(23(29)28-10-3-2-4-21(28)25-15)9-13-27-11-7-16(8-12-27)22-19-6-5-17(24)14-20(19)30-26-22/h5-6,14,16H,2-4,7-13H2,1H3"
irb(main):004:0> inchi_to_png inchi, 'risperidone.png', 300, 300
=> nil
This code produces the following image:

Our library can also be used on more complicated molecules, for example Brevetoxin:
$ irb
irb(main):001:0> require 'inchi'
./inchi.rb:3:Warning: require_gem is obsolete. Use gem instead.
/usr/local/lib/ruby/gems/1.8/gems/rcdk-0.3.0/lib/rcdk/java.rb:26:Warning: require_gem is obsolete. Use gem instead.
=> true
irb(main):002:0> include InChI
=> Object
irb(main):003:0> inchi='InChI=1/C49H70O13/c1-26-17-36-39(22-45(52)58-36)57-44-21-38-40(62-48(44,4)23-26)18-28(3)46-35(55-38)11-7-6-10-31-32(59-46)12-8-14-34-33(54-31)13-9-15-43-49(5,61-34)24-42-37(56-43)20-41-47(60-42)30(51)19-29(53-41)16-27(2)25-50/h6-8,14,25-26,28-44,46-47,51H,2,9-13,15-24H2,1,3-5H3/b7-6-,14-8-' #brevetoxin a
=> "InChI=1/C49H70O13/c1-26-17-36-39(22-45(52)58-36)57-44-21-38-40(62-48(44,4)23-26)18-28(3)46-35(55-38)11-7-6-10-31-32(59-46)12-8-14-34-33(54-31)13-9-15-43-49(5,61-34)24-42-37(56-43)20-41-47(60-42)30(51)19-29(53-41)16-27(2)25-50/h6-8,14,25-26,28-44,46-47,51H,2,9-13,15-24H2,1,3-5H3/b7-6-,14-8-"
irb(main):004:0> inchi_to_png inchi, 'brevetoxin.png', 300, 200
=> nil
This produces the following image:

Conclusions
While our library could certainly be improved, it conveniently solves what would otherwise be a very difficult problem. Areas for further work include error handling and improving the appearance of the images (the latter is the aim of Firefly). Although three programming languages are involved (Ruby, C++, and Java), this complexity is neatly encapsulated behind a simple Ruby interface.
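As one possible starting point for the error handling mentioned above, inputs could be checked in plain Ruby before any conversion is attempted. The method names and the INCHI_SHAPE pattern below are hypothetical. The pattern only tests the overall shape of the string and does not prove that an InChI is chemically valid.

```ruby
# Hypothetical input checks that could run before inchi_to_png is called.
# The pattern tests only the overall shape of an InChI string:
# the "InChI=1" or "InChI=1S" prefix, a slash, and no embedded whitespace.
INCHI_SHAPE = /\AInChI=1S?\/\S+\z/

# Return the stripped InChI, or raise ArgumentError if it looks wrong.
def check_inchi!(inchi)
  text = inchi.to_s.strip
  raise ArgumentError, "Not an InChI: #{text.inspect}" unless text =~ INCHI_SHAPE
  text
end

# Return [width, height], or raise ArgumentError unless both are positive integers.
def check_dimensions!(width, height)
  [width, height].each do |value|
    unless value.is_a?(Integer) && value > 0
      raise ArgumentError, "Image dimensions must be positive integers, got #{value.inspect}"
    end
  end
  [width, height]
end
```

Raising ArgumentError early gives the caller a clearer message than a failure deep inside the Open Babel or CDK layers.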
Structure Diagram Generation
Given a molecule with no 2D coordinates, how would you render a human-readable view? This problem arises in many situations, but most commonly in the context of interpreting line notations such as IUPAC nomenclature, SMILES, or InChI. Whatever the solution you come up with, you'll come face-to-face with the structure diagram generation (SDG) problem.
Generating 2D molecular coordinates is a fundamental (and remarkably difficult) problem in cheminformatics. Discussions in the primary literature date back to at least the 1970s with Chemical Abstracts Service's pioneering large-scale efforts. A recent article from Chemical Computing Group (CCG) described the design and implementation of an advanced SDG system. To my knowledge, the only open source implementation of an SDG system is found in the Chemistry Development Kit, and by extension Ruby CDK.
The SDG problem plays an important role in the aesthetics of chemical structure diagrams, as mentioned by two readers. To render a molecule aesthetically, 2D coordinates must minimize confusing atom overlaps, unconventional orientations, and unusual bond angles.
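One of these criteria, atom overlap, is easy to check once coordinates exist. The following is a rough sketch, assuming coordinates are given as [x, y] pairs. The method name and the 0.5 distance threshold are arbitrary choices for illustration, and the check says nothing about orientation or bond angles.

```ruby
# Hypothetical layout check: find atom pairs drawn too close together.
# coords is an array of [x, y] pairs; min_distance is an assumed threshold
# in the same units as the coordinates.
def overlapping_pairs(coords, min_distance = 0.5)
  coords.each_index.to_a.combination(2).select do |i, j|
    dx = coords[i][0] - coords[j][0]
    dy = coords[i][1] - coords[j][1]
    Math.sqrt(dx * dx + dy * dy) < min_distance
  end
end

# Atoms 1 and 2 are only 0.1 units apart, so they are reported as a pair.
p overlapping_pairs([[0.0, 0.0], [1.5, 0.0], [1.5, 0.1]]) # [[1, 2]]
```

A layout with no reported pairs is not necessarily attractive, but a layout with many of them is almost certainly hard to read.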
The role of SDG in cheminformatics can only continue to increase in importance, especially as more and more structures are automatically generated through mining the primary literature, the Internet, old PDFs, and other sources. With all of these new computer-generated structures will come the need to make them readily understandable to a chemist through SDG.

