Humanizing Line Notations

Posted by Rich Apodaca Sat, 02 Sep 2006 21:08:00 GMT

Line notations are useful for encoding molecular structure with computers, especially in a networked environment. Because line notations are compact and ASCII-based, they can, among other uses, serve as queries for chemical content in popular Web search engines. But however useful line notations are to computers, they are far less useful to humans, who would much rather look at a 2-D structure diagram.
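
One practical detail when putting a line notation into a Web query: SMILES uses characters such as (, ), =, # and + that have special meanings in URLs, so the string has to be percent-encoded first. Here is a minimal sketch using Ruby's standard URI library. The search endpoint is a made-up placeholder, not a real engine's API.

```ruby
require 'uri'

# Hypothetical search endpoint; substitute a real engine's query URL.
SEARCH_BASE = 'https://search.example.com/search'

# Percent-encode the SMILES so '#' is not read as a URL fragment,
# '+' is not read as a space, and '(' ')' '=' survive intact.
def smiles_query_url(smiles)
  "#{SEARCH_BASE}?q=#{URI.encode_www_form_component(smiles)}"
end

puts smiles_query_url('Clc4ccc3C(=C1CCNCC1)c2ncccc2CCc3c4')
# → https://search.example.com/search?q=Clc4ccc3C%28%3DC1CCNCC1%29c2ncccc2CCc3c4
```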

Depict is one example of software that generates 2-D structure renderings from a SMILES string. Behind the scenes, the software parses the SMILES string into a connection table, computes 2-D coordinates for its atoms, and renders the result as a raster image. Software that accomplishes the same task is also available from OpenEye. In this tutorial, you'll see one way to build free, Depict-like functionality from open-source tools.
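
To make the first step of that pipeline concrete, here is a toy SMILES reader in plain Ruby that turns a string into a connection table (a list of atoms plus a list of bonds). It is only a sketch of the idea and is not how CDK's SmilesParser works: it handles organic-subset atoms, branches, explicit bond symbols and single-digit ring closures, and nothing else. The names parse_smiles and ConnectionTable are hypothetical.

```ruby
# Toy SMILES reader: organic-subset atoms, branches, explicit bond symbols
# and single-digit ring closures only. Bracket atoms, '%nn' ring closures,
# '.' disconnections, charges, isotopes and stereochemistry are out of scope.
ORGANIC_ATOMS = %w[Cl Br B C N O P S F I b c n o p s].freeze # two-letter symbols first
BOND_SYMBOLS = { '-' => 1, '=' => 2, '#' => 3, ':' => :aromatic }.freeze

# Connection table: element symbols plus [atom_a, atom_b, order] bonds.
ConnectionTable = Struct.new(:atoms, :bonds)

def parse_smiles(smiles)
  atoms = []
  bonds = []
  branch_stack = [] # atoms to return to when a branch closes
  ring_opens = {}   # ring-closure digit => [atom index, explicit order or nil]
  prev = nil        # atom that the next atom bonds to
  pending = nil     # explicit bond order waiting for the next atom
  i = 0

  # Implicit bonds are aromatic between two lowercase (aromatic) atoms, else single.
  default_order = lambda do |a, b|
    atoms[a] =~ /\A[a-z]/ && atoms[b] =~ /\A[a-z]/ ? :aromatic : 1
  end

  while i < smiles.length
    symbol = ORGANIC_ATOMS.find { |s| smiles[i, s.length] == s }
    if symbol
      atoms << symbol
      current = atoms.size - 1
      bonds << [prev, current, pending || default_order.call(prev, current)] if prev
      prev = current
      pending = nil
      i += symbol.length
      next
    end

    ch = smiles[i]
    case ch
    when '('
      branch_stack.push(prev)
    when ')'
      raise ArgumentError, "unmatched ')' at #{i}" if branch_stack.empty?
      prev = branch_stack.pop
    when *BOND_SYMBOLS.keys
      pending = BOND_SYMBOLS[ch]
    when /\A\d\z/
      if ring_opens.key?(ch)
        atom, order = ring_opens.delete(ch)
        bonds << [atom, prev, pending || order || default_order.call(atom, prev)]
      else
        ring_opens[ch] = [prev, pending]
      end
      pending = nil
    else
      raise ArgumentError, "unsupported SMILES character #{ch.inspect} at #{i}"
    end
    i += 1
  end

  raise ArgumentError, "unclosed ring bond(s): #{ring_opens.keys.join(', ')}" unless ring_opens.empty?
  ConnectionTable.new(atoms, bonds)
end

ct = parse_smiles('CC(=O)O') # acetic acid
p ct.atoms # → ["C", "C", "O", "O"]
p ct.bonds # → [[0, 1, 1], [1, 2, 2], [1, 3, 1]]
```

Generating 2-D coordinates and drawing the picture are much harder problems, which is why the rest of this tutorial leaves them to CDK.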

The Ingredients

This tutorial uses Arton's Ruby Java Bridge (RJB), whose installation and use were outlined previously. You'll also need to download Structure-CDK v0.1.2, also previously discussed. Be sure to download v0.1.2 specifically, as two upgrades have been released since the package was first discussed. This tutorial has been tested on Mandriva Linux 2006.

Create a working directory called depict. From the lib directory of the Structure-CDK distribution, copy cdk-20060714.jar and structure-cdk-0.1.2.jar into your depict working directory.
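
The CLASSPATH line at the top of depict.rb below hard-codes both jar names and the Unix ':' separator. If you'd rather not, a small helper can assemble the classpath from whatever jars sit in the working directory. This is a sketch; classpath_from_jars is a hypothetical name, not part of RJB or Structure-CDK.

```ruby
# Join every .jar in a directory into a CLASSPATH string.
# File::PATH_SEPARATOR is ':' on Unix and ';' on Windows, so the
# result uses the right separator for the current platform.
def classpath_from_jars(dir = '.')
  Dir.glob(File.join(dir, '*.jar')).sort.join(File::PATH_SEPARATOR)
end
```

You would then write ENV['CLASSPATH'] = classpath_from_jars('.') in place of the hard-coded assignment, keeping it above the first Rjb::import so it is in effect when the JVM starts.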

The Code

Now create a file called depict.rb and copy the following code into it:

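# Jars for CDK and Structure-CDK (Unix ':' separator); must be set before the JVM starts
ENV['CLASSPATH'] = './cdk-20060714.jar:./structure-cdk-0.1.2.jar'

require 'rubygems'
require 'rjb' # replaces the old require_gem call, which current RubyGems no longer provides

# Import the Java classes; the first Rjb::import loads the JVM if Rjb::load was not called
SmilesParser = Rjb::import 'org.openscience.cdk.smiles.SmilesParser'
StructureDiagramGenerator = Rjb::import 'org.openscience.cdk.layout.StructureDiagramGenerator'
ImageKit = Rjb::import 'net.sf.structure.cdk.util.ImageKit'

# Renders SMILES strings as 2-D structure diagrams using CDK and Structure-CDK.
class Depictor

  def initialize
    @smiles_parser = SmilesParser.new
    @sdg = StructureDiagramGenerator.new
  end

  # Write a width x height PNG image of the molecule to path_to_png.
  def depict_png(smiles, width, height, path_to_png)
    ImageKit.writePNG(smi_to_mol(smiles), width, height, path_to_png)
  end

  # Write a width x height SVG image of the molecule to path_to_svg.
  def depict_svg(smiles, width, height, path_to_svg)
    ImageKit.writeSVG(smi_to_mol(smiles), width, height, path_to_svg)
  end

  # Parse the SMILES into a molecule (connection table), then compute 2-D atom coordinates.
  def smi_to_mol(smiles)
    @sdg.setMolecule(@smiles_parser.parseSmiles(smiles))
    @sdg.generateCoordinates

    @sdg.getMolecule
  end
end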

After saving this file, set your LD_LIBRARY_PATH on Unix (or the equivalent on another OS):
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$LD_LIBRARY_PATH

This tells RJB where to find Java's native libraries; on a 64-bit JVM the directory is usually amd64 rather than i386. Because of RJB's current design, LD_LIBRARY_PATH must be set in the shell before Ruby starts, rather than from within a running Ruby process.
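
Since forgetting this export tends to surface only as a failure when the JVM loads, you might add a pre-flight check at the top of a script. The sketch below is hypothetical (jvm_lib_dir_on_path? is not part of RJB) and simply looks for an LD_LIBRARY_PATH entry under $JAVA_HOME/jre/lib. It can only warn, not fix the problem, because the dynamic linker generally reads LD_LIBRARY_PATH once, when the process starts.

```ruby
# Hypothetical pre-flight check: true when some LD_LIBRARY_PATH entry lies
# under $JAVA_HOME/jre/lib (for example .../jre/lib/i386 or .../jre/lib/amd64).
def jvm_lib_dir_on_path?(env = ENV)
  java_home = env['JAVA_HOME'].to_s
  return false if java_home.empty?
  jre_lib = File.join(java_home, 'jre', 'lib')
  # LD_LIBRARY_PATH is a Unix variable, so its entries are ':'-separated.
  env['LD_LIBRARY_PATH'].to_s.split(':').any? { |dir| dir.start_with?(jre_lib) }
end

unless jvm_lib_dir_on_path?
  warn 'LD_LIBRARY_PATH lacks $JAVA_HOME/jre/lib/<arch>; export it in the shell, then restart Ruby.'
end
```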

Using the Depictor class is simple. For example, to generate SVG and PNG images of desloratadine (Clarinex):

# './' is needed on Ruby 1.9.2 and later, which no longer search the current directory
require './depict'

depictor = Depictor.new

depictor.depict_svg('Clc4ccc3C(=C1CCNCC1)c2ncccc2CCc3c4', 300, 300, 'desloratadine.svg')
depictor.depict_png('Clc4ccc3C(=C1CCNCC1)c2ncccc2CCc3c4', 300, 300, 'desloratadine.png')

The Output

Running the above code with either the Ruby interpreter (ruby) or Interactive Ruby (irb) produces an SVG and a PNG image in your depict directory, each showing the 2-D structure of the popular antihistamine. Scalable Vector Graphics (SVG) is a popular XML-based vector graphics format that can be viewed in the Firefox browser and several other software packages.
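
Because SVG is plain XML, a structure diagram in that format is just text describing lines and labels. To make that concrete, here is a hand-rolled sketch of the kind of markup involved. It is not how Structure-CDK's ImageKit produces its output; structure_svg is a hypothetical helper, and the coordinates are typed in by hand rather than computed by StructureDiagramGenerator.

```ruby
# Hypothetical helper: atoms are hashes with :symbol, :x and :y (SVG user
# units, y pointing down); bonds are [from, to] index pairs, and any extra
# elements such as a bond order are ignored. Every bond becomes one <line>
# and every non-carbon atom gets a <text> label. No double bonds, scaling
# or label clipping, all of which a real renderer handles.
def structure_svg(atoms, bonds, width, height)
  lines = bonds.map do |from, to|
    a = atoms[from]
    b = atoms[to]
    %(  <line x1="#{a[:x]}" y1="#{a[:y]}" x2="#{b[:x]}" y2="#{b[:y]}" stroke="black"/>)
  end
  labels = atoms.reject { |atom| atom[:symbol] == 'C' }.map { |atom|
    %(  <text x="#{atom[:x]}" y="#{atom[:y]}" text-anchor="middle">#{atom[:symbol]}</text>)
  }
  [%(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">),
   *lines, *labels, '</svg>'].join("\n")
end

# Ethanol drawn as a zigzag with hand-picked coordinates.
ethanol_atoms = [{ symbol: 'C', x: 20, y: 60 },
                 { symbol: 'C', x: 60, y: 40 },
                 { symbol: 'O', x: 100, y: 60 }]
puts structure_svg(ethanol_atoms, [[0, 1], [1, 2]], 120, 100)
```

Saving that string with File.write('ethanol.svg', ...) gives a file Firefox can open directly.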

The code we've used here takes advantage of convenience methods in the Structure-CDK library. However, it is possible to customize the output in several ways, including line thickness, line spacing, color scheme, and atom label height by using the library's lower-level API.

Being able to render a human-readable structure diagram from a line notation is useful in many situations. As you can see, this complex process can be carried out quickly using Ruby, Java, and open-source chemical informatics libraries. Future articles will use this capability to build more complex chemical informatics systems.

A First Look at Modular Chemical Descriptor Language (MCDL)

Posted by Rich Apodaca Sun, 20 Aug 2006 00:04:00 GMT

The Modular Chemical Descriptor Language (MCDL) was developed to address the need for linear representation of structural and other chemical information for chemical databases, E-journals and the Internet.

-Andrei A. Gakh and Michael N. Burnett, J. Chem. Inf. Comput. Sci. 2001, 41, 1494-1499

Molecular line notations reduce a molecular structure to a string of ASCII characters. This is helpful in a variety of situations: as a method of text-based structure input; as a compact representation that can be stored and transmitted over a network; and in some cases as a method for uniquely identifying a molecular structure. The development of line notations is one of the oldest pursuits in chemical informatics.

MCDL has a lot in common with InChI. Both languages are modular in the sense that successive levels of structural detail (constitution, connectivity, and stereochemistry) are represented by individual “modules” (MCDL) or “layers” (InChI). Both languages also have free developer toolkits written in C: the InChI toolkit for InChI and LINDES for MCDL. Interactive structure-drawing tools even exist for both languages (the interactive MCDL tool was recently released).
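
InChI's layered design is easy to see by splitting an identifier on its '/' separators. The sketch below assumes a current standard InChI: after the version comes the formula layer, which has no prefix, and each later layer begins with a one-letter prefix (c for connections, h for hydrogens, q for charge, b/t/m/s for stereochemistry, i for isotopes). inchi_layers is a hypothetical helper that ignores the finer points of sublayers.

```ruby
# Split an InChI string into a hash of layers keyed by their prefix letter.
def inchi_layers(inchi)
  raise ArgumentError, 'not an InChI string' unless inchi.start_with?('InChI=')
  version, formula, *rest = inchi.sub('InChI=', '').split('/')
  layers = { 'version' => version, 'formula' => formula }
  # A repeated prefix (as in a fixed-hydrogen sublayer) simply overwrites
  # the earlier entry in this sketch.
  rest.each { |layer| layers[layer[0]] = layer[1..-1] }
  layers
end

layers = inchi_layers('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3') # ethanol
puts layers['c'] # → 1-2-3
puts layers['h'] # → 3H,2H2,1H3
```

Note that ethanol's connectivity (c1-2-3) and its hydrogen counts (atom 3 has one H, atom 2 has two, atom 1 has three) live in different layers.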

MCDL and InChI also differ in some significant ways. One of the biggest differences is that InChI places hydrogen atoms in a layer separate from their parent atoms, whereas MCDL keeps each hydrogen together with the atom to which it is attached. Another difference lies in canonicalization: InChI uses a relatively complex system not unlike that of canonical SMILES, whereas MCDL uses a simpler system based on the ASCII lexical ordering of atom types. On the non-technical side, InChI carries the endorsement of IUPAC, whereas MCDL is the work of independent developers.
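
To illustrate the two MCDL ideas described here, the sketch below attaches each heavy atom's hydrogen count to its symbol and then sorts the resulting atom-type strings by plain ASCII comparison. It is illustrative only: the helper names are hypothetical, the labels are not real MCDL syntax, and this is not the full MCDL canonicalization algorithm.

```ruby
# Build an atom-type label with hydrogens kept on their parent atom.
def atom_type_label(symbol, h_count)
  case h_count
  when 0 then symbol
  when 1 then "#{symbol}H"
  else "#{symbol}H#{h_count}"
  end
end

# Order atom types by byte-wise (ASCII) string comparison.
def lexically_ordered_types(atoms)
  atoms.map { |symbol, h_count| atom_type_label(symbol, h_count) }.sort
end

# 2-chloroethanol, Cl-CH2-CH2-OH, as [element, attached H count] pairs
chloroethanol = [['Cl', 0], ['C', 2], ['C', 2], ['O', 1]]
p lexically_ordered_types(chloroethanol) # → ["CH2", "CH2", "Cl", "OH"]
```

Two details stand out. "CH2" sorts before "Cl" because 'H' (72) precedes 'l' (108) in ASCII. The two CH2 entries tie, so any real canonicalization scheme needs further rules to tell equivalent-looking atoms apart.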

MCDL and InChI approach the problem of developing an internet-ready line notation from different angles. It will be interesting to see how each evolves.

107 Years of Line-Formula Notations (1861-1968)

Posted by Rich Apodaca Fri, 18 Aug 2006 06:50:00 GMT

L E5 B666 FVTJ A1 E1 OQ

Thus, within the short period of just seven years after the birth of structural chemistry in 1861, virtually all of the main ideas relating to line-formula descriptions were conceived and published. No basically new practices appeared for some 79 years. Then, within an identically brief period of just seven years (1947-1954), virtually all of the fundamental features of structure-delineating chemical notations appeared in the international chemical literature.

-William J. Wiswesser, J. Chem. Doc. 1968, 8, 146-150

Apparently, advances in chemical line notations have a history of occurring in clusters. Perhaps the development of InChI will spawn a renaissance in the development and use of line notations. Is there room (or need) for multiple line notation languages, each filling a particular niche, or can a universal line notation ever be developed? Will currently popular line notations such as SMILES and InChI seem as cumbersome in 30 years as Wiswesser Line Notation does today?
