Making Your 2D Structures Look Good: Firefly, Styles and Stylesheets
Chemists can be very discerning when it comes to chemical structure aesthetics. This is not surprising, given the central role played by 2D chemical structures in the day-to-day work of many chemists. For example, consider the Wikipedia Chemistry/Structure drawing workgroup's ongoing discussion about achieving a consistent look for chemical structures on the online encyclopedia.
Several articles have discussed Firefly, a 2D chemical structure editor specifically designed for the Web. With major work on the rendering engine and structure manipulation interface complete, recent efforts have turned toward exposing drawing settings through a graphical user interface. Here I'll provide some screenshots of an interface prototype along with sample structures. I'll also briefly discuss the larger question of making 2D structure drawing styles portable.
Drawing styles are edited through a tabbed dialog containing a live preview window that uses the current structure or a default structure if none is available. The dialog is resizable, enabling users to immediately see the effects of changes on structures of varying sizes. Although this dialog could be bundled and deployed with the editor, its large footprint makes it more appropriate for use as an optional feature or as a standalone configuration tool in a Web application.

Changes can be rolled back entirely ("Reset"), canceled ("Cancel"), or accepted ("OK").
Let's say we'd like to apply a black background with white bonds, as used in some PowerPoint presentations:

After applying this change, we decide that we'd rather not use atom coloring:

After looking at this structure for a few seconds, we decide that narrower stereo bonds are needed:

After some experimentation, we find a more appropriate non-stereo bond width and double bond offset:

What about a serif font? No, I don't think so:

But we could certainly reduce the size of the atom labels:

On second thought, the original atom sizes were fine, although changing the font may require us to reconsider the atom label heights:

As you can see, the possibilities for customization are nearly endless. In practice, however, most chemists will adopt only two structure drawing styles and reuse them as needed: one for reports and manuscripts, and one for presentations. It will be interesting to see whether a third style, for the Web, makes its way into the standard repertoire.
Each chemist will want a way to save their styles, possibly share them, and easily apply them. Although a few systems for doing so are feasible, the most practical approach would be a stylesheet. Applying a stylesheet to any structure diagram would change its appearance, offering a simple mechanism to achieve a consistent look across documents.
Developing a universal (cross-editor) stylesheet system would be no easy task, given the wildly divergent capabilities of 2D structure rendering software. Despite the technical difficulty, the payoff for users is obvious.
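To make the save/share/apply idea concrete, here is a minimal Ruby sketch of a portable drawing style. Every setting name below (background, bond_color, atom_coloring, stereo_bond_width, and so on) is hypothetical. The names are chosen to mirror the options shown in the screenshots above and are not Firefly's actual configuration keys. A style is stored as JSON so it can be exchanged as plain text. Applying a style overlays its settings on the defaults, so any setting the style does not mention keeps its default value.

```ruby
require 'json'

# Hypothetical default drawing settings; the names are illustrative only.
DEFAULT_STYLE = {
  'background'         => 'white',
  'bond_color'         => 'black',
  'atom_coloring'      => true,
  'bond_width'         => 1.0,
  'stereo_bond_width'  => 6.0,
  'double_bond_offset' => 0.2,
  'font_family'        => 'sans-serif',
  'atom_label_size'    => 12
}.freeze

# Serialize only the settings that differ from the defaults, which keeps
# shared style files small and readable.
def save_style(style)
  JSON.pretty_generate(style.reject { |key, value| DEFAULT_STYLE[key] == value })
end

# Apply a saved style: its settings override the defaults, and anything
# it does not mention keeps its default value.
def apply_style(json)
  DEFAULT_STYLE.merge(JSON.parse(json))
end

# A "presentation" style like the one built up in the screenshots above.
presentation = DEFAULT_STYLE.merge(
  'background' => 'black', 'bond_color' => 'white', 'atom_coloring' => false
)
shared   = save_style(presentation)  # text that could be emailed or posted
restored = apply_style(shared)       # identical to presentation
```

Storing only the differences from the defaults is a design choice. It keeps a style meaningful even if a future editor adds settings the style's author never saw.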
Molecular Style Sheets: Combining SVG and CSS
Cascading Style Sheets (CSS) are used by Web developers to modify the appearance of an HTML document without requiring changes to the document itself. This approach has become so popular because of the power it offers: developers can achieve a consistent and re-usable look by simply editing and/or copying a single document.
2-D molecular structures are like text documents in that context determines the best presentation style. For example, the way that a 2-D structure appears on a Web page, complete with atom color coding and anti-aliasing, may not be the best way for it to appear on a handheld device. Consider these use cases:
- An online publisher may want to achieve a consistent "look" for their 2-D molecular graphics, regardless of who generated them. For portability, they want to avoid hard-coding the styling information into the software they use.
- You want to be able to re-use the 2-D structures you've downloaded from a blog in your presentation. The appearance of these structures needs to match those you already have.
- An online substructure query may return results to a user that have been highlighted to indicate the atoms and bonds where the query hit. The user may want to set his or her own highlight color, or disable the feature altogether.
Users of ChemDraw and similar software are probably familiar with its concept of styles. This is the right idea, although limited in practice. The main limitation is that these products are aimed at single users on desktop machines who are willing to do a great deal of manual work to achieve consistency. Something far more general and automated is needed, and to my knowledge it does not yet exist.
Could the style sheet concept be applied to 2-D structure diagrams? It turns out that SVG may offer a solution. Just as the appearance of an HTML document can be styled with CSS, so too can the appearance of an SVG document.
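The mechanism relies on two ordinary XML features. First, the SVG links an external style sheet through an xml-stylesheet processing instruction. Second, individual elements carry a class attribute that CSS selectors can match. The following rough Ruby sketch shows how a renderer might emit such a document. It assumes bonds arrive as pairs of 2-D points and that a substructure search has already reported which bond indices it hit. The method name, the input format, and the style sheet filename are all hypothetical.

```ruby
# Emit a minimal SVG document in which every bond is a <line> element.
# Bonds reported by a (hypothetical) substructure search get class="hit",
# so how they look is left entirely to the linked style sheet.
def bonds_to_svg(bonds, hit_indices, stylesheet)
  lines = bonds.each_with_index.map do |bond, i|
    (x1, y1), (x2, y2) = bond
    cls = hit_indices.include?(i) ? ' class="hit"' : ''
    %(  <line x1="#{x1}" y1="#{y1}" x2="#{x2}" y2="#{y2}"#{cls} />)
  end
  <<~SVG
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet href="#{stylesheet}" type="text/css"?>
    <svg xmlns="http://www.w3.org/2000/svg" stroke="black">
    #{lines.join("\n")}
    </svg>
  SVG
end

# Two bonds of naphthalene; only the first was hit by the query.
svg = bonds_to_svg([[[0, -2.1], [0, -0.7]], [[3.6373, -2.8], [2.4249, -2.1]]],
                   [0], 'style_normal.css')
```

The document records only *which* bonds were hit. No color or line width appears anywhere in it.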
A Demonstration: Highlighting Bonds
As a demonstration, we'll see how a style sheet can be used to highlight one of naphthalene's rings, possibly as a result of it being hit by a substructure search.

Consider the above 2-D structure of naphthalene, which was generated by Structure-CDK, the latest version of which can be downloaded here. The SVG that generated this image is shown below. I have edited the commented lines.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Edit: a stylesheet -->
<?xml-stylesheet href="style_normal.css" type="text/css"?>
<!DOCTYPE svg PUBLIC '-//W3C//DTD SVG 1.0//EN' 'http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd'>
<svg fill-opacity="1" xmlns:xlink="http://www.w3.org/1999/xlink" color-rendering="auto" color-interpolation="auto" text-rendering="auto" stroke="black" stroke-linecap="square" stroke-miterlimit="10" shape-rendering="auto" stroke-opacity="1" fill="black" stroke-dasharray="none" font-weight="normal" stroke-width="1" xmlns="http://www.w3.org/2000/svg" font-family="'Dialog'" font-style="normal" stroke-linejoin="miter" font-size="12" stroke-dashoffset="0" image-rendering="auto">
<defs id="genericDefs" />
<g>
<g text-rendering="optimizeLegibility" stroke-width="0.098" transform="scale(47.2947,47.2947) translate(0.049,3.1127)" stroke-linecap="round" stroke-linejoin="round">
<line y2="-0.7" fill="none" x1="4.8497" x2="4.8497" y1="-2.1" />
<path fill="none" d="M4.8497 -2.1 L3.6373 -2.8 M4.5581 -1.945 L3.6488 -2.47" />
<path fill="none" d="M0 -2.1 L0 -0.7 M0.28 -1.925 L0.28 -0.875" class="hit" /> <!-- Edit: hit -->
<line y2="-2.8" fill="none" x1="0" x2="1.2124" y1="-2.1" class="hit" /> <!-- Edit: hit -->
<path fill="none" d="M4.8497 -0.7 L3.6373 0 M4.5581 -0.855 L3.6488 -0.33" />
<line y2="0" fill="none" x1="0" x2="1.2124" y1="-0.7" class="hit" /> <!-- Edit: hit -->
<line y2="-2.1" fill="none" x1="3.6373" x2="2.4249" y1="-2.8" />
<path fill="none" d="M1.2124 -2.8 L2.4249 -2.1 M1.224 -2.47 L2.1333 -1.945" class="hit" /> <!-- Edit: hit -->
<line y2="-0.7" fill="none" x1="3.6373" x2="2.4249" y1="0" />
<path fill="none" d="M1.2124 0 L2.4249 -0.7 M1.224 -0.33 L2.1333 -0.855" class="hit" /> <!-- Edit: hit -->
<line y2="-0.7" fill="none" x1="2.4249" x2="2.4249" y1="-2.1" class="hit" /> <!-- Edit: hit -->
</g>
</g>
</svg>
In the image of naphthalene rendered above, the stylesheet I used was blank. However, by applying the following one-line style sheet, I can significantly change the appearance of this image:
*.hit { stroke: red }
This line is known as a "class selector." After loading this style sheet, a CSS-aware SVG renderer (such as Firefox) will apply the red stroke to every element whose class attribute includes hit. The output, shown below, highlights one of the rings of naphthalene in red.

Interestingly, the SVG document itself says nothing about what color the hit class should be; that's left for the style sheet to determine. So by changing one line in the style sheet, I can change the hit highlighting to purple, green, or aquamarine. This applies not only to colors but also to line thickness, line shape, anti-aliasing, and a variety of other properties.
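The color lives only in the style sheet. So honoring a user's preference, as in the substructure-search use case above, reduces to serving a different one-line style sheet. Below is one way to generate it in Ruby. The method name and its stroke_width option are hypothetical. Passing nil disables highlighting by producing an empty style sheet. The hit bonds then fall back to the document's default stroke.

```ruby
# Build a style sheet for hit highlighting. A nil color disables the
# feature: the style sheet is empty and hit bonds keep the default stroke.
def highlight_css(color, stroke_width: nil)
  return '' if color.nil?
  declarations = ["stroke: #{color}"]
  declarations << "stroke-width: #{stroke_width}" if stroke_width
  "*.hit { #{declarations.join('; ')} }\n"
end

highlight_css('red')                        # => "*.hit { stroke: red }\n"
highlight_css('purple', stroke_width: 0.2)  # => "*.hit { stroke: purple; stroke-width: 0.2 }\n"
highlight_css(nil)                          # => ""
```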
Another Demonstration: Global Line Thickness
It's also possible to globally affect the appearance of naphthalene with a simple style sheet. For example, the following style sheet changes the line thickness and all line colors of naphthalene:
* { stroke-width: 0.25; stroke: green; }
When the naphthalene SVG is rendered with this style sheet, the image shown below is produced. The "*" selector is a wildcard, applying to all elements in the SVG document. This version of the style sheet ignores our "hit" styling from the example above. The hit styling could easily be added back by including the class selector line in the same CSS file. Because a class selector is more specific than "*", the hit color would override the green stroke regardless of the order of the two rules.

You may notice in the image above that the leftmost vertical line appears clipped on its left side. This is because the SVG output from Structure-CDK exactly aligns the left line border with the leftmost side of the image area. By thickening the lines with a style sheet, we've overrun the image area. This could be fixed by moving the SVG viewport to the left. But that's a subject for another time.
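The arithmetic behind the clipping is simple. SVG centers a stroke on its geometry, and with the round caps and joins used in this file, no part of the stroke extends more than half the stroke width beyond any atom position. Structure-CDK's output already reflects this: the group's translate(0.049, ...) is exactly half of its stroke-width of 0.098. The hypothetical Ruby helper below (not part of Structure-CDK) computes a bounding box that leaves room for any stroke width.

```ruby
# Compute a bounding box [min_x, min_y, width, height], in drawing units,
# that contains every point plus half the stroke width on each side.
# With round caps and joins, a stroke never extends farther than that.
def padded_view_box(points, stroke_width)
  half = stroke_width / 2.0
  xs = points.map { |x, _| x }
  ys = points.map { |_, y| y }
  [xs.min - half, ys.min - half,
   xs.max - xs.min + stroke_width, ys.max - ys.min + stroke_width]
end

# Naphthalene's ten atom positions, taken from the SVG above.
NAPHTHALENE = [
  [0, -2.1], [0, -0.7], [1.2124, -2.8], [1.2124, 0], [2.4249, -2.1],
  [2.4249, -0.7], [3.6373, -2.8], [3.6373, 0], [4.8497, -2.1], [4.8497, -0.7]
]

padded_view_box(NAPHTHALENE, 0.098) # left edge at -0.049, the original margin
padded_view_box(NAPHTHALENE, 0.25)  # left edge at -0.125 for the thicker style
```

These numbers are in the drawing's own units. Multiplying by the group's scale factor (47.2947 in the file above) gives the pixel margin a renderer would need to leave. A style sheet cannot supply this margin, because the image area is set by the document's geometry, not by CSS properties.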
A Limitation
It will probably never be possible to modify the distance between parallel lines, as for example in multiple bonds, with the CSS approach. These distances are set in the coordinate attributes of the line and path elements, and are independent of styling.
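To see why, look at how the naphthalene SVG draws a double bond. Each double-bond path contains a second, shorter segment whose position is pure geometry. Measuring the file above, the inner segment sits 0.28 units (one fifth of the 1.4-unit bond) toward the ring center and is trimmed by 12.5% of the bond length at each end. The Ruby sketch below reproduces those coordinates. The method name is hypothetical, and the offset and trim values are inferred from this one file rather than taken from Structure-CDK's source.

```ruby
# Compute the inner line of a ring double bond from the bond's endpoints
# a and b and the ring center. The inner line is shifted toward the
# center by `offset` and trimmed at each end by `trim` (a fraction of the
# bond length). Defaults are inferred from the naphthalene SVG above.
def inner_double_bond_line(a, b, center, offset: 0.28, trim: 0.125)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  length = Math.hypot(dx, dy)
  nx = -dy / length # unit normal to the bond
  ny = dx / length
  mid_x = (a[0] + b[0]) / 2.0
  mid_y = (a[1] + b[1]) / 2.0
  # Flip the normal if it points away from the ring center.
  if (center[0] - mid_x) * nx + (center[1] - mid_y) * ny < 0
    nx = -nx
    ny = -ny
  end
  ox = nx * offset
  oy = ny * offset
  [[a[0] + ox + dx * trim, a[1] + oy + dy * trim],
   [b[0] + ox - dx * trim, b[1] + oy - dy * trim]]
end

# The first <path> in the file above: its inner segment runs from
# 4.5581,-1.945 to 3.6488,-2.47; this call reproduces that to ~1e-4.
inner_double_bond_line([4.8497, -2.1], [3.6373, -2.8], [3.6373, -1.4])
```

Changing the spacing means recomputing these coordinates, which is the renderer's job and not something a style sheet can do.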
Conclusions
Of course, we're just scratching the surface of what's possible with CSS and 2-D molecular structures. For example, the same principles outlined here could be used for atom coloring schemes and a variety of line and drawing properties. Various forms of interactive animation are even possible. Despite their limitations, SVG and CSS make a powerful combination for developing next-generation molecular rendering platforms.
From IUPAC Nomenclature to 2-D Structures With OPSIN
A previous article introduced OPSIN, an Open Source Java library for decoding IUPAC chemical nomenclature. In this tutorial, you'll see how OPSIN can, when interfaced with freely-available chemical informatics software, generate 2-D structure diagrams from IUPAC names.
Prerequisites
This tutorial requires Ruby CDK (RCDK), which in turn requires Ruby, Java, and the Ruby Java Bridge. Tutorials detailing the installation of RCDK on both Windows and Linux platforms are available.
In addition, you'll need a copy of the standalone jarfile opsin-big-0.1.0.jar. Future versions of RCDK will integrate the OPSIN jarfile, making this step unnecessary.
Outlining the Problem and a Solution
We'd like to create a simple Ruby class with a method that accepts an IUPAC chemical name as input and produces a PNG image of the corresponding molecule as output. OPSIN accepts IUPAC names as input, but it only produces Chemical Markup Language (CML) as output. The CML output lacks 2-D coordinates, and OPSIN itself has no 2-D rendering capabilities.
We'll use RCDK to augment OPSIN's capabilities. Thanks to CDK's built-in CML support, RCDK can read CML and generate an AtomContainer representation. RCDK also supports the assignment of 2-D coordinates to an AtomContainer via CDK's StructureDiagramGenerator. To produce the PNG image, we'll use the 2-D rendering capability made possible through Structure-CDK, which is a built-in component of RCDK.
A Simple Ruby Library
Create a working directory and copy opsin-big-0.1.0.jar into it. Next, create a file called depictor.rb containing the following Ruby code:
require 'rubygems'
require_gem 'rcdk'
require 'rcdk'

# Make the OPSIN classes visible to the Java bridge.
Java::Classpath.add('opsin-big-0.1.0.jar')

require 'util'

# A simple IUPAC->2-D structure converter.
class Depictor
  @@StringReader = import 'java.io.StringReader'
  @@NameToStructure = import 'uk.ac.cam.ch.wwmm.opsin.NameToStructure'
  @@CMLReader = import 'org.openscience.cdk.io.CMLReader'
  @@ChemFile = import 'org.openscience.cdk.ChemFile'

  def initialize
    @nts = @@NameToStructure.new   # NameToStructure construction is slow
    @cml_reader = @@CMLReader.new
  end

  # Writes a <tt>width</tt> by <tt>height</tt> PNG to
  # <tt>filename</tt> for the molecule described by
  # <tt>iupac_name</tt>.
  def depict_png(iupac_name, filename, width, height)
    cml = @nts.parseToCML(iupac_name)
    raise("Can't parse name: #{iupac_name}") unless cml
    molfile = cml_to_molfile(cml)
    RCDK::Util::Image.molfile_to_png(molfile, filename, width, height)
  end

  private

  # Converts OPSIN's CML output into a molfile with 2-D coordinates.
  def cml_to_molfile(cml)
    string_reader = @@StringReader.new(cml.toXML)
    @cml_reader.setReader(string_reader)
    chem_file = @cml_reader.read(@@ChemFile.new)
    molecule = chem_file.getChemSequence(0).getChemModel(0).getSetOfMolecules.getMolecule(0)
    molecule = RCDK::Util::XY.coordinate_molecule(molecule)  # assign 2-D coordinates
    RCDK::Util::Lang.get_molfile(molecule)
  end
end
Testing, Testing
A short test will demonstrate the capabilities of the Depictor library. Add the following to a file called test.rb in your working directory (or enter it interactively with irb):
require 'depictor'
depictor = Depictor.new
name = '3,3-dimethyl-7-oxo-6-[(2-phenylacetyl)amino]-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylic acid' #Penicillin G
depictor.depict_png(name, 'out.png', 300, 300)
Running this test produces a 300x300 PNG image of Penicillin G, named out.png, in your working directory:

As you can see, this simple library and test code have:
- correctly parsed the rather complex IUPAC name (3,3-dimethyl-7-oxo-6-[(2-phenylacetyl)amino]-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylic acid) to a valid CML representation
- converted this representation to a CDK AtomContainer
- assigned 2-D coordinates
- rendered a PNG image in color
Notice how the thiaazabicyclo[3.2.0] system, complete with properly placed substituents, was flawlessly identified and parsed.
If you entered the above test code interactively via IRB, you may have noticed a multi-second delay in instantiating Depictor. This latency results from a sluggish NameToStructure constructor in OPSIN. A similar delay also occurs in OPSIN's pure-Java unit tests. Once Depictor is instantiated, however, image generation occurs relatively quickly.
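Because the delay is paid once per Depictor, a batch job should create a single instance and reuse it for every name. The sketch below assumes the Depictor class above. The method name depict_all and the numbered output filenames are hypothetical. It also keeps going when depict_png fails for a name, collecting the failures rather than stopping at the first one.

```ruby
# Depict many IUPAC names with a single Depictor, so the slow OPSIN
# start-up cost is paid only once. Names that fail are collected and
# returned rather than aborting the whole batch.
def depict_all(depictor, names, width = 300, height = 300)
  failures = {}
  names.each_with_index do |name, i|
    begin
      depictor.depict_png(name, "molecule_#{i + 1}.png", width, height)
    rescue StandardError => e
      failures[name] = e.message
    end
  end
  failures
end

# Usage (requires depictor.rb and RCDK):
#   require 'depictor'
#   depictor = Depictor.new  # the slow step, done once
#   failed = depict_all(depictor, ['1,3,7-trimethylpurine-2,6-dione', name])
#   failed.each { |n, message| puts "#{n}: #{message}" }
```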
The unusual orientation of the beta-lactam carbonyl group is determined by CDK's StructureDiagramGenerator. The source of this behavior will be explored in a future article.
More Examples
To illustrate some of the capabilities of the OPSIN-RCDK combination, a few more examples are provided below.
One of OPSIN's more surprising features is how well it handles heterocycles. For example, the IUPAC name for caffeine (1,3,7-trimethylpurine-2,6-dione) is translated to:
As another example, consider the tetrazole (1-[2-hydroxy-3-propyl-4-[3-(2H-tetrazol-5-yl)propoxy]phenyl]ethanone):
Highly substituted benzene rings and carboxylic acids are also translated accurately, as in 3-acetamido-5-(acetyl-methyl-amino)-2,4,6-triiodo-benzoic acid (Metrizoate):
How about a hairy-looking macrocycle name with multiple levels of morpheme nesting (3,6-diamino-N-[[15-amino-11-(2-amino-3,4,5,6-tetrahydropyrimidin-4-yl)-8-[(carbamoylamino)methylidene]-2-(hydroxymethyl)-3,6,9,12,16-pentaoxo-1,4,7,10,13-pentazacyclohexadec-5-yl]methyl]hexanamide)? Not a problem:
Limitations
In my tests of the OPSIN library, one structure appeared to be incorrectly parsed - N-(5-chloro-2-methyl-phenyl)-2-methoxy-N-(2-oxooxazolidin-3-yl)acetamide:
There are actually two problems with the output. First, an oxygen atom and a methyl group overlap near the top of the diagram. This cosmetic issue is related to CDK's StructureDiagramGenerator. Second, OPSIN misplaces the oxazolidine nitrogen atom. The correct 2-D image of this molecule, obtained from PubChem, is shown below:
Conclusions
It's not common to find an early-development Open Source project with the sophistication of OPSIN. The smooth handling of nested morphemes, aromatic heterocycles, macrocycles, and a good fraction of what I threw at it leads me to believe that a well-designed and extensible nomenclature parsing engine lies at OPSIN's core. More on that later, though.
What could you do with a powerful Open Source IUPAC nomenclature parser? The answer to that one question could fill a three-volume series. Suffice it to say that OPSIN, in combination with other Open Source software, offers virtually limitless potential for indexing, collecting, repackaging, reprocessing, and mashing up vast amounts of chemical information. Because of its Open Source license, OPSIN can be extended and otherwise modified to fit your particular needs. Future articles will highlight some of the possibilities.
Looking at InChIs
InChI identifiers can be viewed both as unique molecular keys and as a language encoding molecular structure. With the right software, it is possible to decode any InChI to arrive at a human-readable molecular structure. This tutorial will show how to convert InChI identifiers into 2-D molecular renderings using open source tools.
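To give a feel for that language: an InChI is a sequence of slash-separated layers. It begins with a version prefix and the molecular formula. The remaining layers are each tagged by a leading lowercase letter, such as c for atom connections and h for hydrogen atoms. The Ruby sketch below splits an InChI into those pieces. It is a simplified illustration that interprets nothing inside the layers, and it is not Rino's API. It returns an ordered list because a tag can recur in later sublayers.

```ruby
# Split an InChI string into its prefix, formula, and tagged layers.
# Layers after the formula begin with a lowercase letter tag (c, h, q, ...).
def inchi_layers(inchi)
  prefix, formula, *rest = inchi.split('/')
  layers = rest.map { |layer| [layer[0], layer[1..-1]] }
  { prefix: prefix, formula: formula, layers: layers }
end

fipronil = 'InChI=1/C12H4Cl2F6N4OS/c13-5-1-4(11(15,16)17)2-6(14)8(5)24-10(22)9(7(3-21)23-24)26(25)12(18,19)20/h1-2H,22H2'
parsed = inchi_layers(fipronil)
parsed[:formula]             # => "C12H4Cl2F6N4OS"
parsed[:layers].map(&:first) # => ["c", "h"]
```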
Prerequisites
The InChI to 2-D image conversion process requires two pieces of software:
- Rino decodes InChI identifiers into molfiles. The resulting atomic coordinates are set to zero.
- RCDK assigns coordinates to the molfile produced by Rino, and renders the result.
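Rino's all-zero coordinates are what make the second step necessary. A program that receives molfiles from several sources could check for this case before calling RCDK's coordinate generator. The helper below is hypothetical. It relies on the V2000 molfile layout: three header lines, then a counts line whose first three columns give the atom count, then one line per atom beginning with x, y, and z in fixed 10-character fields. It reports whether every atom sits at x = y = 0.

```ruby
# Return true if every atom in a V2000 molfile has x = y = 0, meaning
# 2-D coordinates still need to be generated. Lines 1-3 are the header,
# line 4 is the counts line (atom count in columns 1-3), and each atom
# line starts with x, y, z in fixed 10-character fields.
def needs_2d_coordinates?(molfile)
  lines = molfile.lines
  atom_count = lines[3][0, 3].to_i
  lines[4, atom_count].all? do |line|
    x = line[0, 10].to_f
    y = line[10, 10].to_f
    x.zero? && y.zero?
  end
end
```

In the pipeline shown in the next section, such a check would sit between Rino's reader and RCDK::Util::XY.coordinate_molfile.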
Bring on the Code
The following Ruby code illustrates how the InChI for the pesticide fipronil (Regent) can be translated into a PNG image:
require 'rubygems'
require_gem 'rino'
require_gem 'rcdk'
require 'util'
inchi = 'InChI=1/C12H4Cl2F6N4OS/c13-5-1-4(11(15,16)17)2-6(14)8(5)24-10(22)9(7(3-21)23-24)26(25)12(18,19)20/h1-2H,22H2' #fipronil
reader = Rino::InChIReader.new
molfile1 = reader.read(inchi) # lacks 2-D atomic coordinates
molfile2 = RCDK::Util::XY.coordinate_molfile(molfile1) # has 2-D atomic coordinates
RCDK::Util::Image.molfile_to_png(molfile2, 'fipronil.png', 350, 300)
Running this code produces the image fipronil.png in your working directory:

Limitations
The technique illustrated here is subject to the same limitations as the underlying software. For Rino, this means that stereochemistry is ignored. For RCDK, this means that implicit hydrogen atoms, isotopes, and charges are omitted, and that the layout of macrocycles and other complex ring systems may look less refined than a chemist would like.
Other Software that Does This
To my knowledge, only one other Open Source package, BKChem, is capable of rendering InChIs as described here. BKChem's underlying InChI translation and depiction software, OASA, can also be accessed online. For comparison, OASA produces the following image for the fipronil InChI:

The PubChem editor can also translate and render InChIs, but no source code appears to be available. PubChem's InChI translation and rendering output for fipronil is:

The Chemistry Development Kit, on which RCDK is based, was recently upgraded to support reading InChI identifiers. For some time, CDK has been able to generate 2-D atomic coordinates.
More information on InChI software can be found at Beda Kosata's InChI.info site.
The Final Word
Within certain limitations, it is quite feasible to programmatically obtain a 2-D molecular image for any InChI identifier. Combining this capability with other chemical informatics software and services offers numerous possibilities to use InChI in innovative ways.


