Visualizing IUPAC Names with ChemNomParse

Posted by Rich Apodaca Mon, 11 Sep 2006 14:29:00 GMT

Nomenclature translation is the process of converting a human-readable chemical name into a machine-readable notational scheme such as a connection table. It plays a key role in linking the older chemical literature to modern information technologies, such as the Internet.

Buried deep within the Chemistry Development Kit (CDK) is a library for nomenclature translation called ChemNomParse. At the heart of ChemNomParse is a remarkable piece of software called the Java Compiler Compiler (JavaCC), a parser generator and lexical analyzer generator for Java. A FAQ on JavaCC is available here.

This tutorial demonstrates how freely-available, open source tools can be used to parse an IUPAC chemical name and generate its corresponding 2-D structure rendering. A closely-related tutorial on generating 2-D structures from SMILES strings may be helpful as background.

Ingredients

This tutorial uses Arton's Ruby Java Bridge, the installation and use of which has been outlined previously. In addition, you'll need to download Structure-CDK v0.1.2, also previously discussed. Be sure to download v0.1.2, as two upgrades have been released since the package was originally described. This tutorial has been tested on Mandriva Linux 2006.

Create a working directory called nom. From the lib directory of the Structure-CDK distribution, copy cdk-20060714.jar and structure-cdk-0.1.2.jar into your nom working directory.

Code

Create a file called nom.rb and copy the following code into it:

ENV['CLASSPATH'] = './cdk-20060714.jar:./structure-cdk-0.1.2.jar'

require 'rubygems'
require 'rjb'

NomParser = Rjb::import 'org.openscience.cdk.iupac.parser.NomParser'
StructureDiagramGenerator = Rjb::import 'org.openscience.cdk.layout.StructureDiagramGenerator'
ImageKit = Rjb::import 'net.sf.structure.cdk.util.ImageKit'

class Depictor

  def initialize
    @sdg = StructureDiagramGenerator.new
  end

  def depict_png(nom, width, height, path_to_png)
    ImageKit::writePNG(nom_to_mol(nom), width, height, path_to_png)
  end

  def depict_svg(nom, width, height, path_to_svg)
    ImageKit::writeSVG(nom_to_mol(nom), width, height, path_to_svg)
  end

private

  def nom_to_mol(nom)
    @sdg.setMolecule(NomParser::generate(nom))
    @sdg.generateCoordinates

    @sdg.getMolecule
  end
end

After you save this file, you'll need to set your LD_LIBRARY_PATH on Unix (or the equivalent on another OS):

$ export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$LD_LIBRARY_PATH

This tells RJB where to find Java's native libraries. Because of RJB's current design, LD_LIBRARY_PATH needs to be set from the command line, rather than from within a Ruby process.
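Because the variable has to be in place before Ruby starts, a script can at least fail early with a readable message. The following is a minimal sketch, not part of RJB: the helper name jvm_library_path_set? is made up for illustration, and the jre/lib/i386 layout is an assumption copied from the export line above.

```ruby
# Hypothetical helper (not part of RJB): returns true when the JRE's native
# library directory is already listed in LD_LIBRARY_PATH.
# The 'jre/lib/i386' layout is an assumption taken from the export line above.
def jvm_library_path_set?(env = ENV, arch = 'i386')
  java_home = env['JAVA_HOME'].to_s
  return false if java_home.empty?

  wanted = File.join(java_home, 'jre', 'lib', arch)
  # LD_LIBRARY_PATH is a colon-separated list of directories.
  env['LD_LIBRARY_PATH'].to_s.split(':').include?(wanted)
end

unless jvm_library_path_set?
  warn 'LD_LIBRARY_PATH does not include the JRE native libraries; ' \
       'export it in the shell before starting Ruby.'
end
```

Placing a check like this ahead of require 'rjb' should turn a confusing load failure into a clear hint, although it cannot repair the environment from inside the running process.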

Using the Depictor class is as simple as creating an instance and invoking depict_png or depict_svg on it:

require 'nom'

depictor = Depictor.new

depictor.depict_png('2-phenylcyclohexan-1-ol', 300, 300, 'output.png')

Executing the above code either through the Ruby interpreter (ruby) or via Interactive Ruby (irb) produces a PNG image of the chiral auxiliary shown below:

Other names correctly recognized by ChemNomParse include:

  • phenylhexyne
  • 2-chloro-3-phenyl-4,4-dimethylhexane
  • 3-phenyl-1-aminopropane
  • 1,2-difluoro-3-hydroxycyclohexene

Limitations

Many chemical names, ranging from the simple to the complicated, were not recognized at all by ChemNomParse. Some examples are:

  • benzene
  • piperidine
  • 1-methoxyhexane
  • 2-methyl-5-prop-1-en-2-yl-cyclohex-2-en-1-one (carvone)

Some names were incorrectly interpreted due to misassigned locants. For example, 2-chloro-3-hydroxybutanoic acid produced the incorrectly assigned structure shown below:

ChemNomParse can accurately recognize chemical names representing simple substitutions on basic hydrocarbon scaffolds. More complicated structures, such as heterocycles, bicyclic systems, and systems involving nested substituents, do not appear to be handled at all. It is not clear to what extent these limitations reflect a small dictionary of morphemes (the basic nomenclature building blocks) versus deeper design issues.
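Lists like the ones above are easy to build by trying many names in a loop. Here is a rough sketch of such a survey. The survey_names helper is hypothetical, and it assumes that a name the parser rejects surfaces in Ruby as an exception. It cannot catch names that parse to the wrong structure, such as the misassigned locants described above.

```ruby
# Hypothetical helper: tries each name with the supplied block and sorts the
# names into those that were processed and those that raised an error.
# Names that parse to the WRONG structure still count as recognized here.
def survey_names(names)
  results = { recognized: [], failed: [] }
  names.each do |name|
    begin
      yield name
      results[:recognized] << name
    rescue StandardError
      results[:failed] << name
    end
  end
  results
end

# Example with the Depictor class from this tutorial (assumes a failed parse
# surfaces in Ruby as an exception):
#   survey_names(names) { |n| depictor.depict_png(n, 300, 300, "#{n}.png") }
```

The images written for the recognized names still need to be checked by eye, since this only separates names that raise an error from those that do not.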

Despite its limitations, ChemNomParse is an interesting piece of open source software for working with chemical nomenclature. From this simple tutorial, it can be seen that nomenclature translation, when combined with other capabilities such as 2-D rendering, offers many exciting possibilities.

Generating and Serving 2-D Molecular SVGs

Posted by Rich Apodaca Sat, 09 Sep 2006 14:45:00 GMT

A previous article showed some examples of 2-D molecular rendering using Scalable Vector Graphics (SVG) embedded in a web page. This article will outline some simple steps for generating these images and publishing them on the Web.

Prerequisites

This tutorial uses Structure-CDK, a CDK add-on library written in Java. You'll need to install Sun's JDK 1.4.2 or later (or an open source alternative). Although not required, Ant makes it easy to use Structure-CDK. You'll want to make sure that your browser is SVG-enabled.

Creating a 2-D Molecular SVG File

Methylenedioxymethamphetamine (MDMA)

An SVG image like the one shown above can be created with this sequence of steps:

  1. Download and unzip the current release of Structure-CDK.
  2. Move into the unzipped Structure-CDK directory and run the Structure Visual Testing Framework:
    $ cd structure-cdk-0.1.2
    $ ant vis
    
  3. From the File menu, choose Open... and use the file dialog to open a molfile. The molfiles directory contains some samples.
  4. Resize the image to taste and choose Save as SVG... from the File menu. This writes the SVG image to a directory and filename of your choice.

Viewing the SVG File

You now have several options for viewing the SVG file. One of the simplest is to open it with the Firefox browser. Another option is to open it with the excellent, free SVG editor Inkscape. From Inkscape, you can edit your image, apply any number of special effects from the mundane to the remarkable, and save the result to disk.

Deploying the SVG File on the Web

After uploading your SVG file to a blog or other site, you may have some additional configuration to do. Because the SVG MIME type is not configured by default on all servers, you may need to do so yourself.

After uploading my first set of SVG files to my server, I tried to view them in Firefox. Instead of seeing the expected image in the browser window, I got a dialog asking if I wanted to open it with Inkscape or save it to disk.

With the help of some documentation, I was able to track the problem down to my server, which was using the MIME type "image/svg-xml" instead of "image/svg+xml". The former is the obsolete SVG MIME type, which Firefox rejects. Internet Explorer equipped with Adobe's SVG plugin, on the other hand, accepts the obsolete MIME type, rendering SVG without presenting a dialog. Web-Sniffer, which decodes header information from HTTP responses, may be useful for debugging your server's MIME type configuration.
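The same check can be scripted. The sketch below uses only Ruby's standard net/http library. The function names are my own, and the classification covers just the two MIME types discussed here.

```ruby
require 'net/http'
require 'uri'

# Classifies a Content-Type header value for an SVG resource.
# Returns :ok, :obsolete, or :unexpected.
def classify_svg_content_type(header_value)
  # Drop any parameters such as "; charset=utf-8" before comparing.
  media_type = header_value.to_s.split(';').first.to_s.strip.downcase
  case media_type
  when 'image/svg+xml' then :ok
  when 'image/svg-xml' then :obsolete
  else :unexpected
  end
end

# Sends a HEAD request and classifies the Content-Type the server reports.
def check_svg_url(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  classify_svg_content_type(response['Content-Type'])
end
```

Calling check_svg_url with the address of an uploaded file should return :ok once the server is configured correctly, and :obsolete in the situation described above.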

Having configured your server's SVG MIME type as "image/svg+xml", pointing your browser to your SVG file's URL will let you view it in its full, W3C-compliant glory.

Embedding the SVG File in HTML

There are a few options for embedding an SVG image in HTML. The most universally-applicable mechanism is the <embed> tag:

<html>
  <head></head>
  <body>

  <!-- document body -->

  <embed src="url-to-svg-file.svg" type="image/svg+xml" width="400" height="400" />

  </body>
</html>

Embedding SVG into HTML carries some limitations. For example, you can't interact with the SVG DOM the way you can if the SVG is inlined, or placed directly into the HTML document itself. But that's a subject for another time.

Creating and deploying 2-D molecular images as SVG documents is a straightforward process, provided that some details are taken care of. Future articles in this series will show how SVG's advanced features make it a compelling choice as a chemical informatics rendering platform.

Note: if you're viewing this article in a feed aggregator, the SVG images may have been stripped out. If so, please see the original article.

Rendering Molecules with SVG on the Web

Posted by Rich Apodaca Thu, 07 Sep 2006 13:56:00 GMT

Scalable Vector Graphics (SVG) is an XML-based language for encoding graphical content. Unlike raster image formats such as PNG, JPEG, and GIF, SVG images can be scaled to any resolution without pixelation. Given that 2-D structure diagrams are essentially line drawings, SVG seems like a natural fit for this type of representation. SVG also boasts several advanced features that make it an especially intriguing choice.
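To give a feel for how little markup a line drawing needs, here is a minimal sketch in Ruby that emits an SVG document for a zigzag carbon chain. The chain_svg helper and the coordinates are invented for illustration and have nothing to do with how any particular toolkit produces its output.

```ruby
# Builds a minimal SVG document that draws a zigzag chain as one polyline.
# points is an array of [x, y] pairs in user units.
def chain_svg(points, width, height)
  coords = points.map { |x, y| "#{x},#{y}" }.join(' ')
  <<~SVG
    <?xml version="1.0" encoding="UTF-8"?>
    <svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}" viewBox="0 0 #{width} #{height}">
      <polyline points="#{coords}" fill="none" stroke="black" stroke-width="2"/>
    </svg>
  SVG
end

# Four vertices, three bonds: a butane skeleton.
puts chain_svg([[20, 60], [60, 40], [100, 60], [140, 40]], 160, 100)
```

Because the drawing is described by coordinates rather than pixels, changing the width and height attributes rescales it without any loss of sharpness.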

In this first article of a series on SVG and chemical informatics, I'll start by showing some embedded SVG images of molecules (also see this demo). Later articles will discuss technical details such as writing, reading, animating, scripting, editing, annotating, and distributing 2-D structures encoded as SVG.

Now, the Bad News

SVG is a work in progress. Browser support and performance can vary widely. If you are using Firefox version 1.5 or better, you should be able to see all of the images in this article without doing anything.

Unfortunately for Internet Explorer users, Microsoft's browser lacks built-in SVG support. Still worse, IE 7 appears ready to continue this perplexing tradition. Fortunately, Adobe offers an SVG plugin for IE.

Although this page was tested in both IE 6 (with Adobe's plugin) and Firefox 1.5 (both Windows and Linux), your particular configuration may vary. Please feel free to post your experiences.

Some Simple SVG Structures

The examples below illustrate how SVG images of molecules can be embedded in an HTML document. These images just scratch the surface of what is possible. If you don't see 2-D structures, your browser may lack SVG support.

Ascorbic Acid

Alprazolam

Furosemide

DDT

Note: if you're viewing this article in a feed aggregator, the SVG images may have been stripped out. If so, please see the original article.

Humanizing Line Notations

Posted by Rich Apodaca Sat, 02 Sep 2006 17:08:00 GMT

Line notations are useful for encoding molecular structure with computers, especially in a network environment. Because line notations are compact and ASCII-based, they can, among other purposes, be used to query popular Web search engines for chemical content on the web. Useful as line notations are for computers, they are not as useful to humans, who would much rather have a 2-D structure diagram to look at.
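One practical wrinkle when sending a line notation to a search engine: characters such as #, =, ( and ) have special meaning in URLs, so the string needs to be percent-encoded first. A minimal sketch follows. The smiles_query_url helper, the q parameter, and the search.example.org address are placeholders, not a real service.

```ruby
require 'uri'

# Percent-encodes a SMILES string so it can travel safely in a URL query.
# base_url and the "q" parameter name are placeholders for illustration.
def smiles_query_url(base_url, smiles)
  "#{base_url}?q=#{URI.encode_www_form_component(smiles)}"
end

# Aspirin as a SMILES string.
puts smiles_query_url('https://search.example.org/search', 'CC(=O)Oc1ccccc1C(=O)O')
```

Without the encoding step, a triple bond written as # would be read as the start of a URL fragment and the rest of the structure would be silently dropped.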

Depict is an example of software that generates 2-D structure renderings from a SMILES string. Behind the scenes, the software parses the SMILES string, creates a connection table, determines 2-D coordinates for its atoms, and produces a raster image of the result. Software accomplishing the same task is also available from OpenEye. In this tutorial, you'll see one way to create free Depict-like functionality from Open Source tools.

The Ingredients

This tutorial uses Arton's Ruby Java Bridge, the installation and use of which has been outlined previously. In addition, you'll need to download Structure-CDK v0.1.2, also previously discussed. Be sure to download v0.1.2, as two upgrades have been released since the package was originally discussed. This tutorial has been tested on Mandriva Linux 2006.

Create a working directory called depict. From the lib directory of the Structure-CDK distribution, copy cdk-20060714.jar and structure-cdk-0.1.2.jar into your depict working directory.

The Code

Now create a file called depict.rb and copy the following code into it:

ENV['CLASSPATH'] = './cdk-20060714.jar:./structure-cdk-0.1.2.jar'

require 'rubygems'
require 'rjb'

SmilesParser = Rjb::import 'org.openscience.cdk.smiles.SmilesParser'
StructureDiagramGenerator = Rjb::import 'org.openscience.cdk.layout.StructureDiagramGenerator'
ImageKit = Rjb::import 'net.sf.structure.cdk.util.ImageKit'

class Depictor

  def initialize
    @smiles_parser = SmilesParser.new
    @sdg = StructureDiagramGenerator.new
  end

  def depict_png(smiles, width, height, path_to_png)
    ImageKit::writePNG(smi_to_mol(smiles), width, height, path_to_png)
  end

  def depict_svg(smiles, width, height, path_to_svg)
    ImageKit::writeSVG(smi_to_mol(smiles), width, height, path_to_svg)
  end

  def smi_to_mol(smiles)
    @sdg.setMolecule(@smiles_parser.parseSmiles(smiles))
    @sdg.generateCoordinates

    @sdg.getMolecule
  end
end

After you save this file, you'll need to set your LD_LIBRARY_PATH on Unix (or the equivalent on another OS):

$ export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$LD_LIBRARY_PATH

This tells RJB where to find Java's native libraries. Because of RJB's current design, LD_LIBRARY_PATH needs to be set from the command line, rather than from within a Ruby process.

Using the Depictor class is simple. For example, to generate SVG and PNG images of desloratadine (Clarinex):

require 'depict'

depictor = Depictor.new

depictor.depict_svg('Clc4ccc3C(=C1CCNCC1)c2ncccc2CCc3c4', 300, 300, 'desloratadine.svg')
depictor.depict_png('Clc4ccc3C(=C1CCNCC1)c2ncccc2CCc3c4', 300, 300, 'desloratadine.png')

The Output

Running the above code, either with the Ruby interpreter (ruby) or with Interactive Ruby (irb), will produce an SVG and a PNG image in your depict directory containing the 2-D structure of the popular antihistamine (see image below). Scalable Vector Graphics (SVG) is a popular, XML-based vector graphics format that can be viewed with the Firefox browser and several other software packages.

The code we've used here takes advantage of convenience methods in the Structure-CDK library. However, it is possible to customize the output in several ways, including line thickness, line spacing, color scheme, and atom label height by using the library's lower-level API.

Being able to render a human-readable structure diagram from a line notation is useful in many situations. As you can see, this complex process can be accomplished quickly using Ruby, Java and open source chemical informatics libraries. Future articles will make use of this capability in building more complex chemical informatics systems.

Drawing 2-D Structures with Structure-CDK

Posted by Rich Apodaca Mon, 28 Aug 2006 14:03:00 GMT

Rendering 2-D molecular structures is a fundamental part of chemical informatics. It's used in building end user systems, and more immediately, it can be critical for creating and debugging developer tools.

The Chemistry Development Kit (CDK) is a highly-functional chemical informatics library written in Java. Although it provides built-in 2-D rendering capabilities through the org.openscience.cdk.renderer package, I wanted something a little easier for me to customize. The result is Structure-CDK, a 37K add-on library for the CDK. This article discusses the main features of Structure-CDK with some screenshots and code.

To begin using Structure-CDK, download the current release. This package contains a complete copy of the most recent CDK release, so there is nothing else to install or download. Structure-CDK was developed with JDK-1.5.0. Because it contains no 1.5-specific features, it may work on earlier Java versions. Ant is useful, but not essential.

The package contains an interactive viewing application, which can be invoked with the "vis" Ant task:

$ ant vis

Two types of molecules can be viewed. The first consists of those defined in org.openscience.cdk.templates.MoleculeFactory, which can be found under the Structure menu. 2-D coordinates are provided by CDK's StructureDiagramGenerator. Additionally, molecules can be opened as molfiles (File->Open), several samples of which are contained in the distribution's molfiles directory. Let's take a look at oseltamivir (Tamiflu).

This view can be changed in a couple of ways. Resizing the window automatically resizes and centers the image, while maintaining proportionality of all measurements. This feature, when used with antialiasing, results in the image staying readable regardless of its size. Additionally, Edit->Preferences produces a dialog for changing the rendering settings.
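The arithmetic behind this kind of proportional resizing is simple. The sketch below is written in Ruby for brevity and is not Structure-CDK's actual code. The fit_to_window name is hypothetical, and it assumes the molecule's bounding box has a positive width and height.

```ruby
# Computes a uniform scale factor and offsets that fit a bounding box of
# model coordinates into a window while preserving the aspect ratio.
# Assumes box_width and box_height are positive.
def fit_to_window(box_width, box_height, win_width, win_height, margin = 0.0)
  usable_w = win_width - 2 * margin
  usable_h = win_height - 2 * margin
  # The smaller of the two ratios keeps the whole box inside the window.
  scale = [usable_w / box_width.to_f, usable_h / box_height.to_f].min
  {
    scale: scale,
    # Split the leftover space evenly so the scaled box is centered.
    offset_x: (win_width - box_width * scale) / 2.0,
    offset_y: (win_height - box_height * scale) / 2.0
  }
end
```

Using a single scale factor for both axes is what keeps bond lengths and angles undistorted. Bond widths and label sizes would be multiplied by the same factor.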

Now let's see some code that will read a molecule from a molfile and write a 2-D PNG image to disk. This can be done via the static convenience methods found in ImageKit:

import java.io.FileReader;

import org.openscience.cdk.io.MDLReader;
import org.openscience.cdk.interfaces.IMolecule;
import org.openscience.cdk.Molecule;

import net.sf.structure.cdk.util.ImageKit;
...

public void writePNG(String pathToMolfile, String pathToPNG) throws Exception
{
  MDLReader mdlReader = new MDLReader(new FileReader(pathToMolfile));
  IMolecule mol = (IMolecule) mdlReader.read(new Molecule());

  ImageKit.writePNG(mol, 300, 300, pathToPNG);
}

The above code fragment creates a 300x300 PNG image from the contents of the molfile specified by pathToMolfile.

Although several rendering features, both aesthetic and functional, are supported, some are missing. Most importantly, atom labels are only rendered without hydrogen atoms and there is no stereochemistry support. Performance has not been optimized at all. Future versions of Structure-CDK will be aimed at addressing these issues.

Given the central nature of 2-D structure rendering, it's nice to have options. Structure-CDK provides a convenient, interactive solution. Future articles will discuss the integration of Structure-CDK into more complex chemical informatics systems.
