Structure Diagram Generation

April 11, 2007

Given a molecule with no 2D coordinates, how would you render a human-readable picture of it? This problem arises in many situations, but most commonly when interpreting line notations such as IUPAC nomenclature, SMILES, or InChI. Whatever solution you come up with, you'll come face-to-face with the structure diagram generation (SDG) problem.

Generating 2D molecular coordinates is a fundamental (and remarkably difficult) problem in cheminformatics. Discussions in the primary literature date back to at least the 1970s, with Chemical Abstracts Service's pioneering large-scale efforts. A recent article from Chemical Computing Group (CCG) described the design and implementation of an advanced SDG system. To my knowledge, the only open-source implementation of an SDG system is found in the Chemistry Development Kit, and by extension Ruby CDK.
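To give a feel for the simplest pieces of the problem, here is a minimal Python sketch, not taken from CDK or any other SDG system, that places a ring's atoms on a regular polygon and lays out an acyclic chain as a zigzag with 120-degree bond angles. The function names and the uniform unit bond length are assumptions for illustration only; a real SDG system must also handle fused rings, substituent placement, and collisions between fragments, which is where much of the difficulty lies.

```python
import math

BOND_LENGTH = 1.0  # assumption: every bond has the same length, in arbitrary units


def ring_coords(n, bond=BOND_LENGTH):
    """Hypothetical helper: place an n-membered ring on a regular polygon
    whose sides (the ring bonds) all have length `bond`."""
    # Circumradius of a regular n-gon with side length `bond`.
    radius = bond / (2 * math.sin(math.pi / n))
    return [
        (radius * math.cos(2 * math.pi * i / n),
         radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def zigzag_chain(n, bond=BOND_LENGTH):
    """Hypothetical helper: lay out an n-atom acyclic chain as a zigzag.

    Bonds alternate between +30 and -30 degrees from the horizontal,
    which gives a 120-degree angle at every interior atom.
    """
    coords = [(0.0, 0.0)]
    for i in range(1, n):
        angle = math.radians(30 if i % 2 else -30)
        x, y = coords[-1]
        coords.append((x + bond * math.cos(angle), y + bond * math.sin(angle)))
    return coords
```

For example, `ring_coords(6)` gives a benzene-like hexagon with unit-length sides, and `zigzag_chain(4)` gives the familiar zigzag drawing of a butane skeleton.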

As two readers have pointed out, the SDG problem plays an important role in the aesthetics of chemical structure diagrams. To render a molecule aesthetically, its 2D coordinates must avoid confusing atom overlaps, unconventional orientations, and unusual bond angles.
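Some of these aesthetic criteria can be made measurable. The sketch below, with hypothetical function names and an assumed overlap threshold of half a bond length, counts atom pairs that sit too close together and sums how far each bond angle strays from an idealized 120 degrees. It is a crude score (it treats every atom as if it preferred trigonal geometry) and says nothing about orientation, but it shows the kind of penalty a generated layout could be judged against.

```python
import math
from itertools import combinations


def count_overlaps(coords, min_dist=0.5):
    """Count atom pairs closer than min_dist (an assumed threshold)."""
    return sum(1 for a, b in combinations(coords, 2) if math.dist(a, b) < min_dist)


def bond_angle(coords, i, j, k):
    """Angle in degrees at atom j formed by bonds j-i and j-k."""
    ax, ay = coords[i][0] - coords[j][0], coords[i][1] - coords[j][1]
    bx, by = coords[k][0] - coords[j][0], coords[k][1] - coords[j][1]
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def angle_strain(coords, bonds, ideal=120.0):
    """Sum of |angle - ideal| over every pair of bonds sharing an atom."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    total = 0.0
    for j, nbrs in neighbors.items():
        for i, k in combinations(nbrs, 2):
            total += abs(bond_angle(coords, i, j, k) - ideal)
    return total
```

A three-atom fragment drawn with a 120-degree bend scores zero strain, while the same fragment drawn as a straight line scores 60; lower totals on both measures suggest a more readable drawing.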

The role of SDG in cheminformatics will only grow in importance, especially as more and more structures are generated automatically by mining the primary literature, the Internet, old PDFs, and other sources. All of these new computer-generated structures will need to be made readily understandable to chemists, and that is the job of SDG.