Security and the Online Chemical Catalog
James Swetnam has an interesting article on security and the online chemical catalog. Apparently, the previous structure query results of a certain major retailer can be viewed by anyone, not just the person who originated them.
Chemical structure queries, like any other database query, can be very valuable information in the hands of a knowledgeable and determined competitor. Depending on the application, safeguarding structure queries can be just as important as safeguarding the structure database itself.
Simple 3D Conformer Generation with Smi23D
Three-dimensional conformer generation is a common problem in cheminformatics. The most convenient and generally useful method for creating chemical structures is the 2D chemical structure editor; applications that require three-dimensional representations need a way to generate reasonable coordinates from 2D user input. Until recently, there were no options for doing so with Open Source software. This article shows how the Open Source package smi23d can be used to convert ordinary SMILES strings into three-dimensional molfile representations.
About smi23d
smi23d uses a two-stage process to generate 3D coordinates: an initial pass with smi2sdf generates rough coordinates, and subsequent refinement by mengine produces the final coordinates. The package was originally written in C by Kevin Gilbert and updated by Rajarshi Guha. As part of what appears to be a growing trend in cheminformatics, smi23d is licensed under the highly permissive Apache License.
On a related note, the source code for a program called Frog is reportedly on its way into the Open Babel project.
Prerequisites
To build smi23d, you'll need to install Scons, a Make-like build utility written in Python. I was able to install the Scons rpm on my Linux system without a problem. smi23d has no other dependencies.
Download smi23d
smi23d can be downloaded with Subversion:
$ svn co https://cicc-grid.svn.sourceforge.net/svnroot/cicc-grid/cicc-grid/smi23d/trunk smi23d
Building smi23d
With the source code in place, compilation is just a matter of running Scons:
$ cd smi23d
$ scons
...
Once the sources are compiled, we'll want to configure our system a bit:
$ cd build
$ ls
mmff94.prm mmxconst.prm
$ cp ../src/smi2sdf/smi2sdf .
$ cp ../src/mengine/mengine .
The two files mmff94.prm and mmxconst.prm are parameter files needed by both smi2sdf and mengine.
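Because both programs depend on those parameter files, it can be worth confirming the build directory is complete before going further. The following is a minimal sketch, not part of smi23d; the function name check_smi23d_build is hypothetical, and it only checks for the four files named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of smi23d): confirm that a directory holds
# the two parameter files and the two executables described in the article.
check_smi23d_build() {
    dir=${1:-.}
    missing=0

    # Parameter files needed by both smi2sdf and mengine.
    for f in mmff94.prm mmxconst.prm; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing parameter file: $f" >&2
            missing=1
        fi
    done

    # Executables copied in from ../src.
    for f in smi2sdf mengine; do
        if [ ! -x "$dir/$f" ]; then
            echo "missing executable: $f" >&2
            missing=1
        fi
    done

    # Exit status 0 only when nothing was missing.
    return "$missing"
}

# Usage, from inside the build directory:
#   check_smi23d_build . && echo "ready for stage one"
```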
With smi2sdf and mengine both in the build directory, we can create a simple test with the SMILES for Ivabradine:
$ vi test.smi
...
$ cat test.smi
CN(CCCN1CCC2=CC(=C(C=C2CC1=O)OC)OC)C[C@H]3CC4=CC(=C(C=C34)OC)OC
With everything ready to go, we can begin Stage one:
$ ./smi2sdf test.smi
Found 1 structures in test.smi
field : MMX
Atom Types: 169
Bonds: 580 Bond3: 0 Bond4: 0 Bond5: 0
Angle: 434 Angle3: 41 Angle4: 60 Angle5: 0
Torsion: 697 Torsion4: 58 Torsion5: 0
Vdw: 172 OOP: 91 Dipole: 474 Charge: 0 Improper: 0
STBN: 26 ANGANG: 0 STRTOR: 0 VDWPR: 4
Input file = test.smi
Output file = output.sdf
Param file = mmxconst.prm
Log file = error.log
Inorganic file = test_inorg.smi
Structure: 0 CN(CCCN1CCC2=CC(=C(C=C2CC1=O)OC)OC)C[C@H]3CC4=CC(=C(C=C34)OC)OC
You can view the result in an application like Jmol:

It's not much to look at, but we're not quite done yet.
Stage two is accomplished by using the output of Stage one as input to mengine:
$ ./mengine -o optimized.sdf output.sdf
field : MMX
Atom Types: 169
Bonds: 580 Bond3: 0 Bond4: 0 Bond5: 0
Angle: 434 Angle3: 41 Angle4: 60 Angle5: 0
Torsion: 697 Torsion4: 58 Torsion5: 0
Vdw: 172 OOP: 91 Dipole: 474 Charge: 0 Improper: 0
STBN: 26 ANGANG: 0 STRTOR: 0 VDWPR: 4
field : MMFF94
Atom Types: 181
Bonds: 448 Bond3: 0 Bond4: 0 Bond5: 0
Angle: 1801 Angle3: 21 Angle4: 61 Angle5: 0
Torsion: 674 Torsion4: 38 Torsion5: 95
Vdw: 182 OOP: 112 Dipole: 0 Charge: 0 Improper: 0
STBN: 286 ANGANG: 0 STRTOR: 0 VDWPR: 0
We now have a file called optimized.sdf. As you can see, it's a pretty good 3D representation of Ivabradine:

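The two stages can be chained in a small shell function. This is a sketch under a few assumptions: it is run from the build directory that holds smi2sdf, mengine and the parameter files, and smi2sdf writes output.sdf as in the stage-one transcript. The function name smi_to_3d is hypothetical.

```shell
#!/bin/sh
# Sketch: run both smi23d stages on one SMILES file.
# Assumes the current directory is the prepared build directory and that
# smi2sdf writes its result to output.sdf, as in the transcript above.
smi_to_3d() {
    smiles_file=$1       # e.g. test.smi
    optimized_file=$2    # e.g. optimized.sdf

    # Stage one: rough coordinates, written to output.sdf.
    ./smi2sdf "$smiles_file" || return 1

    # Stage two: refine the rough coordinates with mengine.
    ./mengine -o "$optimized_file" output.sdf || return 1

    # Succeed only if the refined file exists and is non-empty.
    [ -s "$optimized_file" ]
}

# Usage:
#   smi_to_3d test.smi optimized.sdf
```

The article does not establish whether either program returns a non-zero exit status on failure, so the final file check is probably the more reliable signal.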
Conclusions
In this tutorial, we've seen how the Open Source program smi23d can be used to assign reasonable 3D coordinates to an arbitrary SMILES string. One very practical use of smi23d would be to process the output of 2D chemical structure editors prior to use in a 3D program. Future articles will discuss some of the possibilities.
Image Credit: Mary Mactavish
How Would Your Cheminformatics Tool Do This?

Reference: Yorke, Wan, Xia, and Zheng Tetrahedron Lett.
Run Babel Anywhere Java Runs with JBabel
A recent series of D-F articles has discussed the use of NestedVM to compile cheminformatics programs written in C/C++ to pure Java binaries that can be run on any system with a JVM. More specifically, an earlier attempt to compile OpenBabel's babel program to bytecode was only partially successful. With the help of Geoff Hutchison, the problem has been resolved. This article introduces JBabel, a platform-independent, pure Java implementation of OpenBabel's babel program.
A Little About JBabel
JBabel was compiled from the Open Babel 2.1.1 source release and can be downloaded from SourceForge. The same jarfile was successfully tested on Linux, Windows and Mac OS X. You can verify JBabel works on your platform with the following command:
$ java -jar jbabel-20071209.jar -Hsmi
smi SMILES format
A linear text format which can describe the connectivity and chirality of a molecule
Write Options e.g. -xt
n no molecule name
t molecule name only
r radicals lower case eg ethyl is Cc
This version of JBabel was compiled with support for three formats:
SMILES (smi). Non-canonical SMILES.
MDL (mol). Molfiles and SD Files.
Canonical SMILES (can). Canonical SMILES implementation donated by eMolecules.
I'll discuss exactly how support for these formats was added in a subsequent post. More formats will be added in the future. For now, let's just try JBabel out.
Testing JBabel
One way to use JBabel is interactively from the command line - just leave out an input or output file parameter. For example, if you wanted to get the eMolecules canonical SMILES for sertraline, you might do something like this (be sure to use two returns to begin processing):
$ java -jar jbabel-20071209.jar -ismi -ocan
CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl
CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
1 molecule converted
34 audit log messages
This canonical SMILES can be converted into a molfile with the following:
$ java -jar jbabel-20071209.jar -ismi -omol
CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
OpenBabel12090723182D
22 24 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
...
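The interactive sessions above could probably be scripted by piping the SMILES to JBabel on standard input. The sketch below assumes that piped input behaves like typed input, which has not been verified here. The JBABEL_JAR variable and the canonical_smiles function are hypothetical names; the flags are the ones used above.

```shell
#!/bin/sh
# Sketch: feed one SMILES string to JBabel on standard input.
# JBABEL_JAR is a hypothetical variable; it defaults to the jarfile
# named in the article.
JBABEL_JAR=${JBABEL_JAR:-jbabel-20071209.jar}

canonical_smiles() {
    # The extra newline mimics the second return used interactively.
    printf '%s\n\n' "$1" | java -jar "$JBABEL_JAR" -ismi -ocan
}

# Usage:
#   canonical_smiles 'CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl'
```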
To convert using input and output files, we could use a medium-sized dataset such as the PubChem benzodiazepine dataset prepared for Rubidium:
$ java -jar jbabel-20071209.jar -imol pubchem_benzodiazepine_20071110.sdf -ocan pubchem_benzodiazepine_20071110.smi
==============================
*** Open Babel Warning in ReadMolecule
WARNING: Problems reading a MDL file
Cannot read title line
2117 molecules converted
This test, which parses 2117 records, required four minutes forty-five seconds on my system. For comparison, the natively compiled binary did the same thing in about thirteen seconds. Clearly, the JBabel performance hit is substantial.
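To put a number on that hit, the figures quoted above can be turned into a slowdown factor and a rough throughput. This is only arithmetic on the reported timings (the native time is "about" thirteen seconds, so the results are approximate); the function name slowdown_report is hypothetical.

```shell
#!/bin/sh
# Figures from the test above: 2117 records, 4 min 45 s under JBabel,
# about 13 s for the natively compiled binary.
records=2117
jbabel_seconds=$((4 * 60 + 45))
native_seconds=13

slowdown_report() {
    awk -v n="$records" -v j="$jbabel_seconds" -v c="$native_seconds" 'BEGIN {
        printf "slowdown: %.1fx\n", j / c
        printf "jbabel:   %.1f records/s\n", n / j
        printf "native:   %.1f records/s\n", n / c
    }'
}

slowdown_report
# slowdown: 21.9x
# jbabel:   7.4 records/s
# native:   162.8 records/s
```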
Uses
Although it's very unlikely that JBabel will ever be useful in performance-critical situations, its portability makes it attractive for other uses. Examples include:
application development in heterogeneous computing environments;
use on systems in which native compilation may be difficult, such as those with unusual configurations or operating systems;
cases in which native binaries work poorly or not at all, such as in applets and Java applications;
situations in which performance is a minor consideration, such as in end-user applications that process only a few molecules at a time, or during application prototyping.
Conclusions
This article has described JBabel, the first portable binary version of OpenBabel's babel molecular file format interconversion program. The next article in this series will describe in detail the steps that were used to compile it.
Casual Saturdays: Daybreak

Image Credit: Troy Mason

