Five Open Tools for 2D Structure Layout (aka Structure Diagram Generation)
Given a molecular representation without 2D coordinates, how would you display a human-readable view?
This problem can arise in many situations, one of the most common of which is the parsing of line notations such as IUPAC nomenclature, SMILES, or InChI.
And then there are the cases when you have 2D coordinates, but they're not very aesthetically pleasing. Maybe the coordinates were created by people either in a hurry or working with low-quality editors, or maybe they were generated as distorted 2D projections of 3D coordinates. Whatever the reason, simply having 2D coordinates may not be the same as having good 2D coordinates.
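One crude way to tell good coordinates from merely present ones is to check how uniform the bond lengths are, because depictions conventionally draw all bonds at roughly the same length. The sketch below is an illustrative assumption, not a metric taken from any of the tools discussed here. `bond_length_spread` and `project_xy` are hypothetical helpers, and the three-atom coordinates are made up. The example shows how simply dropping the z coordinate of a 3D structure can shrink one bond to a fraction of its true length.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]

def bond_length_spread(coords: Sequence[Point2D],
                       bonds: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (mean bond length, relative standard deviation of bond lengths).

    A relative deviation near 0 means every bond is drawn at the same length;
    large values hint at a distorted depiction.
    """
    lengths = [math.dist(coords[i], coords[j]) for i, j in bonds]
    mean = sum(lengths) / len(lengths)
    variance = sum((length - mean) ** 2 for length in lengths) / len(lengths)
    return mean, math.sqrt(variance) / mean

def project_xy(coords3d: Sequence[Point3D]) -> List[Point2D]:
    """Naive 3D -> 2D projection: drop the z coordinate."""
    return [(x, y) for x, y, _z in coords3d]

# Hypothetical three-atom chain; both bonds are about 1.54 units long in 3D,
# but the second bond points almost straight along z.
chain_3d = [(0.0, 0.0, 0.0), (1.54, 0.0, 0.0), (1.54, 0.2, 1.527)]
bonds = [(0, 1), (1, 2)]

# An idealized zig-zag depiction of the same chain (120 degrees between bonds).
dx = 1.54 * math.cos(math.radians(30))
dy = 1.54 * math.sin(math.radians(30))
chain_2d = [(0.0, 0.0), (dx, dy), (2 * dx, 0.0)]

projected_spread = bond_length_spread(project_xy(chain_3d), bonds)[1]
ideal_spread = bond_length_spread(chain_2d, bonds)[1]
print(f"projected: {projected_spread:.2f}, ideal: {ideal_spread:.2f}")
# prints: projected: 0.77, ideal: 0.00
```

Bond-length uniformity is only one aspect of a good layout. Overlapping atoms, crossing bonds, and poor ring shapes would need their own checks.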
Last year, a Depth-First article discussed the Structure Diagram Generation (SDG) problem and how it can be solved with Open Source software. Given that nearly a year has passed, it seemed appropriate to revisit the topic.
The good news is that there are at least four independent Open Source implementations of SDG algorithms, and one potential open database approach. They are, in no particular order:
MCDL: Written in Java, this software appears to focus mainly on supporting Modular Chemical Descriptor Language. Unfortunately, no new releases of this intriguing package have been made in the last year.
Chemistry Development Kit (CDK): This useful package handles about 70-80% of a typical assortment of chemical structures well. The high level of activity on the CDK project in general makes this a particularly good SDG system to contribute to, especially in the areas of refactoring and handling special cases. See also Christoph Steinbeck's overview of CDK's layout system.
BKChem: A 2D structure editor written in Python. Give it an InChI and it will display the structure, courtesy of SDG. The system worked remarkably well with the molecules I tested. BKChem has also been reported to work in batch mode.
RDKit: Written in Python and C++, this package is the newest of the bunch. Although I haven't had much luck compiling RDKit, it still looks quite promising. Any chance of switching to make as a build system?
PubChem: PubChem? Maybe. With a database of small molecules now numbering well over ten million, there's a good chance that the molecule for which you need to assign coordinates is already in PubChem. And if it's in PubChem, 2D coordinates have already been assigned. Use an InChI as a hash key, and voilà: instant SDG without much software. Given the novelty of large, publicly available databases of small molecules such as PubChem, this approach may have a great deal of untapped potential.
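Using an InChI as a hash key maps naturally onto a dictionary lookup with an SDG fallback for misses. The sketch below assumes you already have a local store of precomputed coordinates, for example coordinates previously downloaded from PubChem. It does not call any PubChem service. `LayoutCache`, `coords_for`, and `fallback_layout` are hypothetical names, and the fallback stands in for whichever SDG tool you use. Stored coordinates follow the atom order of the record they came from, so a real system would also need to map them onto its own atom numbering. This sketch leaves that step out.

```python
from typing import Callable, Dict, List, Tuple

Coords2D = List[Tuple[float, float]]

class LayoutCache:
    """Look up 2D coordinates by InChI, falling back to an SDG routine.

    `known` is a hypothetical local store of precomputed depictions;
    `fallback_layout` is a placeholder for a real SDG call (CDK, RDKit, ...).
    """

    def __init__(self, known: Dict[str, Coords2D],
                 fallback_layout: Callable[[str], Coords2D]) -> None:
        self.known = dict(known)
        self.fallback_layout = fallback_layout
        self.misses = 0

    def coords_for(self, inchi: str) -> Coords2D:
        coords = self.known.get(inchi)
        if coords is None:
            # Cache miss: generate coordinates once, then remember them.
            self.misses += 1
            coords = self.fallback_layout(inchi)
            self.known[inchi] = coords
        return coords

# Example: ethane is "already in the database"; methane is not.
cache = LayoutCache(
    known={"InChI=1S/C2H6/c1-2/h1-2H3": [(0.0, 0.0), (1.54, 0.0)]},
    fallback_layout=lambda inchi: [(0.0, 0.0)],  # stand-in for a real SDG call
)
```

With this design, each molecule needs to be laid out at most once, and the more structures the local store already covers, the less often the SDG code runs at all.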
SDG is one of those issues that can stay off the radar for some only to become an instant, nagging problem with no clear way out. The tools cited here offer an excellent place to begin working toward a comprehensive solution.
Testing Automatic Chemical Structure Recognition with OSRA
Countless chemical structures exist only as raster images (JPG, GIF, BMP), on printed pages, or in PDFs. While perfectly readable to humans, they are very difficult for machines to read. Given the sheer number of these structures that have been produced over the last few decades, the only hope of excavating them from their current data tombs is with computer recognition of some kind. This article discusses OSRA, an open source software package designed to do for chemical structures what Optical Character Recognition did for the printed word.
An online version of OSRA was used to read PNG images of chemical structures produced by an application based on ChemWriter. Both aliased and antialiased images were used, and atom coloring was disabled:


Structure interpretation failed for the antialiased image at both 300 and 72 DPI resolution. This was the SMILES that was produced at 72 DPI; the one produced at the 300 DPI setting was not much more encouraging.
However, using the aliased image at 72 DPI produced the correct structure.
Could the failure to recognize the antialiased image be due to a problem with the ChemWriter application's rasterization method? Apparently not. When a screen capture utility was used to produce the image from the ChemWriter application window, the wrong structure was again produced. In this case, the PNG was encoded not by a Java program but by a standard screen capture utility running on the underlying operating system (Linux).
To test the idea that line thickness might play a role in determining the quality of OSRA's interpretation, the antialiased image below was submitted:

Still, the incorrect structure was produced.
Apparently, images of 2D structures in which antialiasing has been applied cause difficulties for OSRA.
Fortunately, the ChemWriter-based application embedded the full connection table of the molecule into all of its images as metadata, so an optical recognition step is unnecessary.
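PNG files can carry text metadata in `tEXt` chunks, each holding a keyword, a null byte, and Latin-1 text. The article does not say which chunk type or keyword ChemWriter used, so the `molfile` keyword below is a hypothetical choice. The reader handles only uncompressed `tEXt` chunks, not `zTXt` or `iTXt`. For the example, the PNG skeleton has no `IHDR` or `IDAT` chunks. It is not a viewable image, but the parser only needs to walk the chunks.

```python
import struct
import zlib
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, 4-byte type, data, CRC of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def read_text_chunks(png: bytes) -> Dict[str, str]:
    """Return keyword -> text for every tEXt chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    found: Dict[str, str] = {}
    while pos + 8 <= len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        if pos + 12 + length > len(png):
            raise ValueError("truncated chunk")
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != crc:
            raise ValueError(f"bad CRC in {chunk_type!r} chunk")
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return found

# Hypothetical image whose metadata holds a connection table under "molfile".
connection_table = "example connection table text"
png = (PNG_SIGNATURE
       + make_chunk(b"tEXt", b"molfile\x00" + connection_table.encode("latin-1"))
       + make_chunk(b"IEND", b""))
```

Checking the CRC on every chunk guards against a corrupted or hand-edited file silently yielding a wrong connection table.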
Provided that no antialiasing has been applied to images, OSRA would seem to be a capable tool for converting rasterized 2D chemical structures into machine-readable format.
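When only antialiased images are available, one option is to detect the intermediate gray levels that antialiasing introduces and force every pixel back to pure black or white before submission. These tests did not check whether this thresholding actually helps OSRA, so treat `binarize` as an untested workaround. The helper names and the threshold of 128 are assumptions. The sketch works on plain lists of 8-bit grayscale values, which matches the tests above with atom coloring disabled. Loading real PNGs into that form, for example with an imaging library, is left out.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale values: 0 = black, 255 = white

def is_antialiased(img: Gray) -> bool:
    """Heuristic: an aliased black-on-white drawing contains only pure black
    and pure white pixels, while antialiasing adds intermediate gray levels."""
    return any(0 < pixel < 255 for row in img for pixel in row)

def binarize(img: Gray, threshold: int = 128) -> Gray:
    """Force every pixel to pure black or pure white (untested with OSRA)."""
    return [[0 if pixel < threshold else 255 for pixel in row] for row in img]

# A tiny hypothetical image: the edge of a bond line softened by antialiasing.
edge = [
    [255, 200, 60, 0],
    [255, 255, 128, 0],
]
hard_edge = binarize(edge)  # [[255, 255, 0, 0], [255, 255, 255, 0]]
```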
Image Credit: jspad
The Chemically-Enabled User Interface: An Introduction to Leafcutter
ChemWriter is a 2D chemical structure editor for the Web. Because it's written in Java and uses both the Swing and Java2D APIs, ChemWriter could be plugged into a variety of chemically-enabled user interfaces deployed within a browser, on the desktop, or in other contexts. The availability of this kind of developer tool would open the door to a large new area of interactive cheminformatics applications. This article, the first in a series, introduces Leafcutter, a new product designed to make this possible.
About the Software
Leafcutter is a framework consisting of reusable Swing components and supporting libraries for building chemically-enabled user interfaces. Based on ChemWriter, Leafcutter will contain most of the functionality of the 2D structure editor, but packaged as a set of highly customizable components. Whereas ChemWriter consists of configurable but finished applets for editing and rendering, Leafcutter will consist of components that can be used to build entirely new applets, desktop applications and other Rich Clients.
An alpha-stage developer preview is now available by request from Metamolecular. The package contains API documentation and source code for a sample Swing application (shown below).

The design constraints for tools used to build custom chemically-enabled user interfaces can vary significantly, but fine-grained control over appearance and behavior is a top consideration. Depending on the specific use, controlling deployment footprint can also be critical. Leafcutter's design and implementation will address these needs uniquely.
What's New Here?
Although Leafcutter can be used to build traditional cheminformatics applications, its main purpose will be to enable new kinds of applications that speak the language of 2D chemical structures natively.
Many of today's cheminformatics applications accept 2D chemical structures as input and render the same as output. But they're not generally designed to combine 2D chemical structures with their associated information in an interactive way.
For example, consider "Retro," a hypothetical application that enables Curt, a synthetic chemist, to plan his next synthetic route. Curt would draw his target molecule into a ChemWriter-like editor, as is typical for most reaction databases. But unlike other applications, Retro would interactively give Curt information about possible synthetic routes.
Clicking on a bond displays a side panel summarizing the number of published synthetic procedures that might be applicable. Clicking on the "Accept" button makes the bond disconnection and records the procedure hitset for later retrieval. Clicking on a "Suggest" button highlights bonds representing viable disconnections, some of which might not have occurred to Curt otherwise.
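The Retro interactions reduce to a model that answers three requests (bond clicked, Accept, Suggest) and notifies whichever view components are listening. The article does not describe Leafcutter's API, so this Python sketch is a hypothetical, framework-neutral illustration. `RetroSession`, `lookup_procedures`, and the event names are invented, and the reaction database is a fake in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Bond = Tuple[int, int]  # a bond identified by its two atom indices

@dataclass
class RetroSession:
    """Hypothetical model behind Retro's click / Accept / Suggest interactions."""
    bonds: List[Bond]
    # Stand-in for a reaction-database query: procedures applicable to breaking a bond.
    lookup_procedures: Callable[[Bond], List[str]]
    accepted: Dict[Bond, List[str]] = field(default_factory=dict)
    # UI components (side panel, structure view) subscribe here.
    listeners: List[Callable[[str, object], None]] = field(default_factory=list)

    def _notify(self, event: str, payload: object) -> None:
        for listener in self.listeners:
            listener(event, payload)

    def click_bond(self, bond: Bond) -> List[str]:
        """Clicking a bond: tell the side panel how many procedures apply."""
        procedures = self.lookup_procedures(bond)
        self._notify("summary", (bond, len(procedures)))
        return procedures

    def accept(self, bond: Bond) -> None:
        """Accept: make the disconnection and record the hitset for later."""
        self.accepted[bond] = self.lookup_procedures(bond)
        self.bonds.remove(bond)
        self._notify("disconnected", bond)

    def suggest(self) -> List[Bond]:
        """Suggest: highlight every remaining bond with at least one procedure."""
        viable = [bond for bond in self.bonds if self.lookup_procedures(bond)]
        self._notify("highlight", viable)
        return viable

# Example with a fake, in-memory "database".
fake_db = {(1, 2): ["procedure A", "procedure B"]}
session = RetroSession(bonds=[(0, 1), (1, 2), (2, 3)],
                       lookup_procedures=lambda bond: fake_db.get(bond, []))
events = []
session.listeners.append(lambda event, payload: events.append((event, payload)))
session.click_bond((1, 2))  # side panel would report 2 procedures
session.suggest()           # highlights [(1, 2)]
session.accept((1, 2))
```

In a Swing application, the listeners would be UI components such as the side panel. Keeping the model separate from those components is a common design choice, but the article does not confirm it is how Leafcutter works.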
Most synthetic organic chemistry databases are designed to be maps; Retro is designed to be a GPS device. A recent talk at the San Diego section of the ACS by Jun Xu offered some useful insight into the difference between these two approaches.
In addition, Curt communicates with Retro in his native language, the language of 2D chemical structures, by drawing, pointing and listening. It's the same way he communicates with his colleagues about chemistry.
The same concept could be applied to areas as diverse as NMR and IR spectrum assignment and query, mass spectrometry, analyte detection, molecular mechanics calculations, and teaching reaction mechanisms.
Conclusions
If you plan to develop custom user interfaces that draw or manipulate 2D chemical structures, regardless of their design, Leafcutter will provide a powerful new tool for doing so.
Image Credit: Gavatron
Filthy Rich Clients

If you wind the clock back enough years, the world of graphical user interfaces was ruled by standardized look-and-feel specifications. This approach was taken in an effort to centralize all of the GUI coding in applications, make it easy to document the application (everyone knows what a slider does, therefore it doesn't need to be described), and work around the relatively poor graphics performance of desktop computers.
But the last decade's collision between the computer industry and the consumer has led to a huge increase in the emphasis on aesthetics in user interfaces: for everything from brand awareness to increasing the comprehensibility of sophisticated systems, to eye-catching coolness to draw the customer in, to just plain "Wow!" ... Aesthetics are in.
-James Gosling, Foreword to Filthy Rich Clients
The "destandardization" of the GUI has been underway for several years. From Web applications like Picnik to Flash video players to Apple's iTunes application, users are getting increasingly used to the idea that not every program needs to look like Microsoft Office, and that some of them never should have in the first place.
Now, it's possible to infuse Java Swing applications with the look and feel of this new breed of GUI. The new book Filthy Rich Clients shows how. Covering topics ranging from threading to animation to compositing, this well-written book is a goldmine for anyone wanting to break out of a GUI rut.
The use of reflections, animation, fading and the like in serious applications may seem frivolous. But used in the proper context, these effects can add a great deal to usability and appeal.
Today's frivolous use of memory and CPU cycles has a strange way of becoming next year's must-have feature.
ChemWriter Now Available for Download
A 2D chemical structure editor is a key component in most cheminformatics systems. With an ever-increasing number of groups using the Web as a cheminformatics platform, the need for a structure editor built specifically around the capabilities and constraints of the Web becomes more apparent.
For the last several months, my company (Metamolecular, LLC) has been developing a 2D structure editor called ChemWriter(TM). It was created specifically to solve the problem of building interactive, chemically-enabled Web applications that look good and load fast.
You can now download a free, fully-functional, non-expiring copy of ChemWriter (the ChemWriter Starter Package) good for development and testing of your chemically-aware Web application. The Metamolecular Company Blog has the details.

