Five Open Tools for 2D Structure Layout (aka Structure Diagram Generation)

Posted by Rich Apodaca Wed, 26 Mar 2008 13:11:00 GMT

Given a molecular representation without 2D coordinates, how would you display a human-readable view?

This problem can arise in many situations, one of the most common of which is the parsing of line notations such as IUPAC nomenclature, SMILES, or InChI.

And then there are the cases when you have 2D coordinates, but they're not very aesthetically pleasing. Maybe the coordinates were created by people either in a hurry or working with low quality editors, or maybe they were generated as distorted 2D projections of 3D coordinates. Whatever the reason, simply having 2D coordinates may not be the same as having good 2D coordinates.
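One crude way to put a number on "not very good" is to look at how uneven the bond lengths are, since a flattened 3D structure tends to foreshorten some bonds more than others. The sketch below is only an illustration of that idea, not a method taken from any of the tools discussed here; the function name `bond_length_spread` and the input format (a list of (x, y) tuples plus a list of atom-index pairs) are assumptions made for the example.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Bond = Tuple[int, int]


def bond_length_spread(coords: Sequence[Point], bonds: Sequence[Bond]) -> float:
    """Return the coefficient of variation of the 2D bond lengths.

    0.0 means every bond is drawn at the same length; larger values
    mean the drawing mixes long and short bonds. This is a heuristic
    proxy for layout quality, nothing more.
    """
    lengths: List[float] = [math.dist(coords[a], coords[b]) for a, b in bonds]
    if not lengths:
        return 0.0  # nothing to compare
    average = mean(lengths)
    if average == 0.0:
        return float("inf")  # all atoms stacked on one point
    return pstdev(lengths) / average


# A four-membered ring drawn as a square, then squashed along y.
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 0)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
squashed = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.2), (0.0, 0.2)]

print(bond_length_spread(square, ring_bonds))
print(bond_length_spread(squashed, ring_bonds))
```

A drawing with a high spread is a reasonable candidate for regenerating its coordinates, although uniform bond lengths alone say nothing about overlaps or ring shapes.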

Last year, a Depth-First article discussed the Structure Diagram Generation (SDG) problem and how it can be solved with Open Source software. Given that nearly a year has passed, it seemed appropriate to revisit the topic.

The good news is that there are at least four independent Open Source implementations of SDG algorithms, and one potential open database approach. They are, in no particular order:

  • MCDL: Written in Java, this software appears to focus on facilitating the use of Modular Chemical Descriptor Language. Unfortunately, no new releases of this intriguing software package have been made in the last year.

  • Chemistry Development Kit (CDK): This useful package handles about 70-80% of a typical assortment of chemical structures well. The large amount of activity on the CDK project in general makes this a particularly good SDG system to contribute to, especially in the areas of refactoring and handling special cases. See also Christoph Steinbeck's overview of CDK's layout system.

  • BKChem: A 2D structure editor written in Python. Give it an InChI and it will display the structure, courtesy of SDG. The system worked remarkably well with the molecules I tested. BKChem has also been reported to work in batch mode.

  • RDKit: Written in Python and C++, this package is the newest of the bunch. Although I haven't had much luck compiling RDKit, it still looks quite promising. Any chance of switching to make as a build system?

  • PubChem: PubChem? Maybe. With a database of small molecules now numbering well over ten million, there's a good chance that the molecule for which you need to assign coordinates is already in PubChem. And if it's in PubChem, 2D coordinates have already been assigned. Use an InChI as a hash key, and voilà: instant SDG without much software. Given the novelty of large, publicly available databases of small molecules such as PubChem, this approach may have a great deal of untapped potential.
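The database idea in the last item boils down to a table lookup with a fallback. Here is a minimal sketch of that pattern, assuming the stored records are molfile strings that already contain 2D coordinates. The class `LayoutLookup` and the `fallback` callable are hypothetical names invented for this example; nothing here talks to PubChem itself, and the molfile strings are placeholders.

```python
from typing import Callable, Dict, Optional


class LayoutLookup:
    """Toy InChI-keyed table of molfiles that already carry 2D coordinates."""

    def __init__(self, fallback: Optional[Callable[[str], str]] = None) -> None:
        self._molfiles: Dict[str, str] = {}
        # Hypothetical SDG routine (InChI in, molfile out), used on a miss.
        self._fallback = fallback
        self.hits = 0
        self.misses = 0

    def add(self, inchi: str, molfile: str) -> None:
        """Register a record whose coordinates were assigned elsewhere."""
        self._molfiles[inchi.strip()] = molfile

    def layout(self, inchi: str) -> Optional[str]:
        """Return a stored molfile, or compute and remember one on a miss."""
        key = inchi.strip()
        if key in self._molfiles:
            self.hits += 1
            return self._molfiles[key]
        self.misses += 1
        if self._fallback is None:
            return None
        molfile = self._fallback(key)
        self._molfiles[key] = molfile  # remember for next time
        return molfile


# Placeholder strings stand in for real molfile text.
lookup = LayoutLookup(fallback=lambda inchi: "computed layout for " + inchi)
lookup.add("InChI=1S/CH4/h1H4", "stored layout for methane")

print(lookup.layout("InChI=1S/CH4/h1H4"))               # found in the table
print(lookup.layout("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))  # falls back to SDG
```

Storing whole molfiles rather than bare coordinate lists sidesteps the question of matching atom order between the query and the stored record, because the record brings its own atoms and bonds along.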

SDG is one of those issues that can stay off the radar for some only to become an instant, nagging problem with no clear way out. The tools cited here offer an excellent place to begin working toward a comprehensive solution.

Comments

  1. baoilleach Thu, 27 Mar 2008 11:54:45 GMT

    Nice work, and very timely - I'm currently adding SDG to pybel via an interface to one or two of these.

  2. baoilleach Thu, 27 Mar 2008 15:31:59 GMT

    I've just been informed of mol2ps (GPL): http://merian.pch.univie.ac.at/~nhaider/cheminf/mol2ps.html

    It appears to fulfill the requirements, but I haven't tested it.

  3. Rich Apodaca Thu, 27 Mar 2008 20:24:31 GMT

    Noel, mol2ps looks quite interesting. It appears to read a molfile, create coordinates, and depict the structure in PostScript - all from a single Pascal source file.

  4. Rich Apodaca Thu, 27 Mar 2008 21:00:33 GMT

    Actually, it's not clear that mol2ps creates 2D coordinates. The documentation seems to suggest it uses the coordinates already in a molfile...

  5. Norbert Haider Mon, 31 Mar 2008 05:58:08 GMT

    As the author of mol2ps, I can clarify: mol2ps does not create its own 2D coordinates, but just uses those of the mol file. I used it (e.g.) for batch generation of larger numbers of PNG image files (via Ghostscript) from PubChem SDF files.

  6. Christoph Steinbeck Sun, 06 Apr 2008 19:32:30 GMT

    It would be great to really evaluate CDK SDG's performance on what you call a "typical assortment". I actually think the current version of the code would do better than your estimate :-) Cheers, Chris

  7. baoilleach Wed, 23 Apr 2008 15:23:14 GMT

    @Christoph: Stay tuned...:-)

    @Rich: Also frowns.sf.net

  8. Rich Apodaca Wed, 23 Apr 2008 15:45:41 GMT

    Noel, ah yes, I forgot about Frowns. It looks like their layout is based on AT&T's GraphViz. Unfortunately, the Frowns site doesn't give any pictures. Do you have any you could share?

  9. baoilleach Wed, 23 Apr 2008 17:28:49 GMT

    'Fraid not. It's been years since I used Frowns, and I never got the visualisation to work (although I didn't try very hard).
