The Art and Science of Chemical Structure Diagrams: Double Trouble
Two-dimensional chemical structure diagrams are part of a language with both grammar and aesthetics. Both aspects play a role in determining scientific usability, and both deserve careful consideration when designing cheminformatics systems. This article, the first of a series, will discuss one aspect of 2D chemical structure diagrams that is only sparsely documented and yet instantly recognizable when done wrong: the layout of double bonds.
Consider one of the most common substructures in chemistry, the benzene ring. The two depictions below are both grammatically correct, yet one of them is incorrect in that it fails to follow an important aesthetic convention.

The structure on the left is the one most chemists would instantly recognize as the standard depiction. All carbon vertices are connected with a line path and all double bonds are offset from that line. Even more important, the double bonds are offset toward the center of the ring and shortened on either end.
The structure on the right is clearly irregular; one of the double bonds is outside of the ring, but the others are inside. Seeing such a structure is not just distracting; its departure from the standard depiction style can lead to reactions ranging from mild irritation to doubt over what exactly is being communicated.
Creating the correct representation for benzene's double bonds introduces a number of complexities. Before any implementation can be tried, some key questions about the placement of the second double bond line need to be answered:
How will the side on which the second line appears be determined?
By what distance will the second line be offset?
By how much will the second line be shortened on each side?
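The three questions above map onto a small piece of vector geometry. The sketch below is one possible reading, not the method used by any particular tool. It assumes the centroid of the ring is already known, places the second line on the side facing that centroid, and uses illustrative offset and trim fractions. The function name `second_bond_line` and the default values 0.18 and 0.12 are assumptions made for this example.

```python
import math

def second_bond_line(a, b, ring_center, offset=0.18, trim=0.12):
    """Return the two endpoints of the second line of a double bond a-b.

    a, b, ring_center are (x, y) tuples. offset and trim are fractions of
    the bond length; both defaults are illustrative, not a standard.
    """
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("bond endpoints coincide")

    # Unit vector along the bond and a unit normal to it.
    ux, uy = dx / length, dy / length
    nx, ny = -uy, ux

    # Question 1: which side? Flip the normal so it points at the ring center.
    mx, my = (ax + bx) / 2, (ay + by) / 2
    if (ring_center[0] - mx) * nx + (ring_center[1] - my) * ny < 0:
        nx, ny = -nx, -ny

    # Question 2: how far to offset. Question 3: how much to shorten each end.
    d = offset * length
    t = trim * length
    start = (ax + ux * t + nx * d, ay + uy * t + ny * d)
    end = (bx - ux * t + nx * d, by - uy * t + ny * d)
    return start, end
```

For a horizontal bond from (0, 0) to (1, 0) with the ring center below it, the second line comes out below the bond and shortened at both ends. Bonds that are not in any ring need a different rule for choosing the side, which this sketch does not attempt.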
To avoid the complexities introduced by these questions (and their numerous edge cases), some tools eliminate the problem by representing all double bonds using the same pattern:

This approach, while solving a developer problem, creates a user problem in that the resulting structure is much more visually demanding. Notice how carbon vertices are no longer connected by a continuous line. This results in a structure whose carbon backbone is much more tedious to trace. The problem is compounded when fused rings and substituents are added to the mix:

The questions posed above may not seem that hard. On the surface, they're not. What makes the problem hard are the edge cases that most chemists are aware of, but which are quite difficult to reduce to working software. These edge cases can crop up in the most unexpected places.
Take tetralin for example:

The structure on the left lays out the double bond properly (grouped with the benzene substructure), while the structure on the right does not.
So in addition to some form of ring perception, software needs to recognize that the second double bond line goes "inside" an aromatic ring.
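The tetralin case can be sketched as a ring-selection step that runs before any geometry. The example below assumes that ring perception and aromaticity detection have already been done elsewhere and are supplied as inputs. The function names `ring_bonds` and `choose_ring`, and the tie-breaker of preferring the smaller ring, are assumptions made for illustration only.

```python
def ring_bonds(ring):
    """Return the set of bonds in a ring given as an ordered cycle of atom indices."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}

def choose_ring(bond, rings, aromatic_flags):
    """Pick the index of the ring whose interior should hold the second line.

    bond: (i, j) atom indices.
    rings: list of rings, each an ordered tuple of atom indices.
    aromatic_flags: list of bools parallel to rings (assumed precomputed).
    Returns None when the bond is not part of any listed ring.
    """
    key = frozenset(bond)
    candidates = [i for i, ring in enumerate(rings) if key in ring_bonds(ring)]
    if not candidates:
        return None
    # Aromatic rings win; among equals, the smaller ring is an arbitrary tie-breaker.
    candidates.sort(key=lambda i: (not aromatic_flags[i], len(rings[i])))
    return candidates[0]

# Tetralin-like example: atoms 0-5 form the benzene ring, and the fusion
# bond 4-5 is shared with the saturated ring 4-5-6-7-8-9.
rings = [(4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5)]
aromatic = [False, True]
fusion_ring = choose_ring((4, 5), rings, aromatic)
```

Here `fusion_ring` is the index of the benzene ring, so the second line of the fusion bond would be drawn toward the benzene centroid rather than into the saturated ring.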
Here's another example, in which the aromatic ring contains five atoms:

Recently, the rendering capability of my company's chemical structure editor, ChemWriter, was upgraded to address a similar issue:

The previous version of ChemWriter used a ring perception algorithm that was misled when certain kinds of tetrasubstituted bonds were located within rings, like the one shown above. The most recent version, 1.1.2, solves the problem by using a more robust (and efficient) ring perception algorithm. You can download a free ChemWriter Starter Package containing the upgrade from Metamolecular, or test it directly online.
Double bond rendering is a surprisingly deep problem raising a number of issues: ring perception, aromaticity detection, vector graphic manipulation, and numerous 2D geometry topics, to name a few. But double bond placement is just one of many issues to address when rendering aesthetically pleasing and chemically correct 2D chemical structures. Future articles will discuss some of them.
Image Credit: mutbka
The Paper Laboratory Notebook: Chemistry's Most Ancient Data Tomb
Derek Lowe's In the Pipeline hosts an interesting discussion on Electronic Laboratory Notebooks (ELNs). The wasteful process of entombing valuable scientific data often begins with the paper lab notebook, so the subject of ELNs should be of great interest to anyone involved in creating, using, or reprocessing chemical information.
Why do paper notebooks continue to persist in chemistry?
The issue is complex, but in my view stems from the lack of a truly usable and affordable tool. Although the term "tool" may suggest software, it actually involves a much more complex beast consisting of hardware, software, an ergonomic hardware/software user interface, and a computer network. In chemistry, the problem is compounded by the centrality of chemical structures and the inability of most generic ELN products to capture or use them.
Given these constraints, and the costs associated with creating and marketing general-purpose products designed to work within them, it's not surprising that many organizations decide to roll their own ELN. And it's even less surprising that many others decide sticking with paper is a better option - at least for now.
Image Credit: John Thurm
Testing Automatic Chemical Structure Recognition with OSRA
Countless chemical structures exist only as raster images (JPG, GIF, BMP), on a printed page, or in a PDF. While perfectly readable to humans, they are very difficult for machines to read. Given the sheer number of these structures that have been produced over the last few decades, the only hope of excavating them from their current data tombs is with computer recognition of some kind. This article discusses OSRA, an open source software package designed to do for chemical structures what Optical Character Recognition did for the printed word.
An online version of OSRA was used to read PNG images of chemical structures produced by an application based on ChemWriter. Both aliased and antialiased images were used and atom coloring was disabled:


Structure interpretation failed for the antialiased image at both 300 and 72 DPI resolution. This was the SMILES that was produced at 72 DPI; the one produced at the 300 DPI setting was not much more encouraging.
However, using the aliased image at 72 DPI produced the correct structure.
Could the failure to recognize the antialiased image be due to a problem with the ChemWriter application's rasterization method? Apparently not. When a screen capture utility was used to produce the image from the ChemWriter application window, the wrong structure was again produced. Here, the PNG encoding was performed not by a Java program but by the underlying operating system (Linux) through a standard screen capture utility.
To test the idea that line thickness might play a role in determining the quality of OSRA's interpretation, the antialiased image below was submitted:

Still, the incorrect structure was produced.
Apparently, images of 2D structures in which antialiasing has been applied cause difficulties for OSRA.
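The difference between the two kinds of image is visible in the pixel values themselves: an aliased black-on-white drawing contains only two gray levels, while an antialiased one contains intermediate grays along line edges. The sketch below illustrates that distinction on a grayscale pixel grid given as a list of rows. The function names and the threshold of 128 are assumptions for this example, and whether thresholding would improve OSRA's results was not tested in this article.

```python
def gray_levels(pixels):
    """Return the set of distinct gray values (0-255) in a grid of rows."""
    return {value for row in pixels for value in row}

def looks_antialiased(pixels):
    """Heuristic: more than two gray levels suggests antialiased edges."""
    return len(gray_levels(pixels)) > 2

def binarize(pixels, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]

# A vertical line drawn without and with antialiasing (values are illustrative).
aliased = [[255, 0, 255], [255, 0, 255]]
antialiased = [[255, 96, 200], [230, 0, 255]]
```

Applying `binarize` to the `antialiased` grid yields the same two-level pattern as the `aliased` grid in this toy case; real images would need a more careful choice of threshold.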
Fortunately, the ChemWriter-based application embedded the full connection table of the molecule into all of its images as metadata, so an optical recognition step is unnecessary.
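As an illustration of how metadata can travel inside an image, the sketch below reads textual chunks from a PNG byte string using only the Python standard library. The article does not say which mechanism or key the application uses, so the choice of a `tEXt` chunk and the keyword `molfile` are assumptions made for this example, as are the helper names `make_chunk` and `read_text_chunks`.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, body):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + body)
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)

def read_text_chunks(data):
    """Return a dict of keyword -> text for every tEXt chunk in PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG byte stream")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt data is: keyword, a null byte, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found

# A chunk skeleton for demonstration only (no pixel data, so not displayable).
header = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
demo_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", header)
    + make_chunk(b"tEXt", b"molfile\x00<connection table here>")  # hypothetical key
    + make_chunk(b"IEND", b"")
)
metadata = read_text_chunks(demo_png)
```

With the demo bytes above, `metadata` maps the hypothetical `molfile` keyword to the placeholder text, which is where a real connection table would be recovered without any optical recognition.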
Provided that no antialiasing has been applied to images, OSRA would seem to be a capable tool for converting rasterized 2D chemical structures into machine-readable format.
Image Credit: jspad


