Extending InChI Stereochemistry
As covered by Reuters and many other wire services, ArtusLabs and Boston University's CMLD have teamed up to extend InChI's stereochemistry support:
DURHAM, N.C.--(Business Wire)-- ArtusLabs, Inc., a leading provider of life science software tools and data management solutions, has entered into a partnership with Boston University's Center for Chemical Methodology and Library Development (CMLD) to develop a way to standardize and expand the way in which stereochemistry, and ultimately a three-dimensional structures, are represented in the International Chemical Identifier (InChI(TM)).
With the increasing use of molecules containing axial chirality, planar chirality and other forms of non-tetrahedral stereogenicity in chemistry, the move by ArtusLabs and CMLD could be significant.
Put simply, the ability of cheminformatics to represent certain kinds of compounds has fallen way behind the ability of chemistry to make them. While molecules once considered mere oddities 30 years ago continue to pour into corporate compound collections, laboratory notebooks, and product catalogs, cheminformatics has been stuck with a form of molecular representation that hasn't changed significantly in several decades.
InChI isn't alone. The three most widely used molecular representation systems (Molfile, SMILES, and CML) all suffer from fundamental limitations in representing axial chirality, planar chirality, and multicenter bonding.
The kind of work being undertaken by ArtusLabs and CMLD is essential if cheminformatics is to continue to keep pace with new developments in chemistry.
A Simple and Portable Ruby Interface to InChI - Part 2: Silencing Console Output
The previous article in this series described a simple and portable method for interfacing Ruby to the cInChI-1 binary. One disadvantage was noisy console output. This article offers a minor modification to disable it.
The Code
module InChI
  def inchi_for molfile
    output = %x[echo "#{molfile}" | cInChI-1 -STDIO 2>/dev/null]
    output.eql?("") ? "" : output.split(/\n/)[1]
  end
end

Here, we're taking advantage of the shell's ability to redirect an output stream, in this case standard error (where cInChI-1 writes its log), to /dev/null.
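To see the effect of the redirect in isolation, here's a minimal sketch that swaps cInChI-1 for a stand-in shell command. The stand-in and the variable names are assumptions made for illustration; they aren't part of the InChI toolkit. The stand-in writes one line to standard output and one line to standard error:

```ruby
# Hypothetical stand-in for a noisy binary such as cInChI-1:
# one result line on standard output, one log line on standard error.
noisy = %q(sh -c 'echo result; echo "log noise" 1>&2')

# Merging standard error into standard output captures everything.
everything = %x[#{noisy} 2>&1]

# Sending standard error to /dev/null leaves only the result.
result_only = %x[#{noisy} 2>/dev/null]

puts everything.inspect
puts result_only.inspect
```

Command expansion only ever returns standard output, so the redirect changes what reaches the console rather than what inchi_for returns.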
Testing the Code
Saving the above in a file called inchi.rb, we can test it from IRB. To make things interesting, let's pull a molfile from Chempedia:
$ irb
irb(main):001:0> require 'open-uri'
=> true
irb(main):002:0> require 'inchi'
=> true
irb(main):003:0> include InChI
=> Object
irb(main):004:0> open 'http://chempedia.com/compounds/83490.mol' do |f|
irb(main):005:1* puts inchi_for(f.read)
irb(main):006:1> end
InChI=1/C15H15NO3S/c17-14(16-18)11-20(19)15(12-7-3-1-4-8-12)13-9-5-2-6-10-13/h1-10,15,18H,11H2,(H,16,17)
=> nil
We should be able to run this code unmodified on any UNIX-like system in which the cInChI-1 binary is on the path. And of course we could take this one step further by allowing command line options to be passed in as parameters to the inchi_for method.
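As a rough sketch of that idea, the version below accepts a list of option names and turns each into a command-line flag. It is only one possible design: the inchi_command helper is a hypothetical name, and the sketch swaps the shell pipeline for Ruby's standard Open3 library so that the molfile never passes through shell quoting. The option names used in the example (SNon, SRel) are taken from the cInChI-1 usage message.

```ruby
require 'open3'

module InChI
  # Hypothetical helper: build the argument list for cInChI-1.
  # Each option name (e.g. "SNon") becomes a "-SNon" style flag.
  def inchi_command(options = [])
    ['cInChI-1', '-STDIO', *options.map { |opt| "-#{opt}" }]
  end

  # Feed the molfile to cInChI-1 on standard input, discard the log
  # written to standard error, and return the second line of output
  # (or an empty String if there is none or the binary is missing).
  def inchi_for(molfile, options = [])
    output, _status = Open3.capture2(*inchi_command(options),
                                     stdin_data: molfile,
                                     err: File::NULL)
    output.split(/\n/)[1].to_s
  rescue Errno::ENOENT
    ''
  end
end

include InChI
puts inchi_command(%w[SNon]).inspect
```

Calling inchi_for(molfile, %w[SNon]) would then ask cInChI-1 to exclude stereo, assuming the binary is on the path.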
Simplicity has its advantages.
A Simple and Portable Ruby Interface to InChI
Although the InChI software itself is written in C, it can still be used via Ruby. Rino offers one implementation of a Ruby InChI interface that makes use of a C extension. This article describes a more concise and portable solution.
The Code
The following code will accept a String encoding a molfile and return either its InChI, or an empty String if no InChI could be found:
module InChI
  def inchi_for molfile
    output = %x[echo "#{molfile}" | cInChI-1 -STDIO]
    output.eql?("") ? "" : output.split(/\n/)[1]
  end
end

This code takes advantage of Ruby's built-in support for Command Expansion.
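One caveat with interpolating text straight into a shell command is that characters such as double quotes, backticks, or dollar signs in the input will be interpreted by the shell. The sketch below shows the same command-expansion mechanism with the input escaped by Ruby's standard Shellwords library. The pipe_through_shell name is hypothetical, and printf is used in place of echo as an assumption about portability; neither comes from the original code.

```ruby
require 'shellwords'

# Hypothetical helper: pipe +text+ into +command+ through the shell and
# return the command's standard output as a String. The text is escaped
# so the shell treats it as literal data; +command+ is trusted as-is.
def pipe_through_shell(text, command)
  %x[printf '%s\\n' #{Shellwords.escape(text)} | #{command}]
end

# Stand-in commands (tr, cat) take the place of cInChI-1 here.
puts pipe_through_shell('benzene', 'tr a-z A-Z')
puts pipe_through_shell('say "hi" $HOME', 'cat')
```

The same escaping could be applied to the molfile before it is handed to cInChI-1.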
Testing the Code
The code below tests the library:
require 'inchi'
include InChI
molfile =
"http://chempedia.com/compounds/106.mol
  -OEChem-03010811072D

 12 12  0     0  0  0  0  0  0999 V2000
    2.8660    1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660    1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  7  1  0  0  0  0
  2  4  1  0  0  0  0
  2  8  1  0  0  0  0
  3  5  2  0  0  0  0
  3  9  1  0  0  0  0
  4  6  2  0  0  0  0
  4 10  1  0  0  0  0
  5  6  1  0  0  0  0
  5 11  1  0  0  0  0
  6 12  1  0  0  0  0
M  END"
puts "Found InChI: #{inchi_for(molfile)}"

We can run the test by saving it in a file called test.rb and executing it:
$ ruby test.rb
InChI version 1, Software version 1.02-beta August 2007
Log file not specified. Using standard error output.
Input file not specified. Using standard input.
Output file not specified. Using standard output.
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Found InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
Prerequisites
The above approach has only two requirements: it must be run on a UNIX-like system, and a copy of the cInChI-1 binary must be present on your path.
Advantages
The approach described here offers some important advantages over Rino:
It works without modification on both the Matz Ruby Interpreter (C-Ruby) and JRuby.
It neither creates nor uses files.
Disadvantages
This approach creates a lot of noisy log output to the console. There must be a way to suppress it, but so far I haven't found out how.
Conclusions
Ruby's support for Command Expansion has enabled the creation of a concise and portable Ruby interface to the InChI toolkit. Similar principles would apply to any Unix command-line binary, including, for example, Open Babel.
From C Source Code to Platform-Independent Executable Jarfile: Using NestedVM to Build JInChI
A recent series of articles discussed in some detail the process of compiling source code written in C and C++ to pure Java bytecode with NestedVM. But the full conversion process, starting with source and finishing with an executable jarfile, has to my knowledge never been documented. This article uses the InChI toolkit to illustrate the complete process for converting a real-world C source distribution into a platform-independent, executable jarfile that can be run with any modern Java Virtual Machine (JVM).
About InChI
The previous article in this series introduced JInChI, the first and only pure Java implementation of the IUPAC/NIST InChI toolkit. This toolkit is used to convert molecular connection tables encoded in MDL's SD File format into ASCII character strings called 'InChIs' that have a variety of applications in the field of cheminformatics. Although an excellent JNI-InChI interface is available, JNI won't be a viable option in every situation. Our pure Java implementation nicely complements the JNI-InChI library.
In this tutorial, we'll build version 1.0.2b of the InChI toolkit. This version, among other features, supports the generation of InChI Keys.
Prerequisites
This article assumes you've already installed NestedVM on your system. Building NestedVM required the installation of many dependencies and was a fairly lengthy, but straightforward, process on my Linux system.
Step 1: Prepare Your Environment
Before building anything, we'll need to set up our environment. NestedVM makes this simple:
$ cd /your/path/to/nestedvm/
$ source env.sh
Next, let's create a directory to hold the various components we'll need during the build process:
$ cd /your/projects/directory
$ mkdir jinchi
$ cd jinchi
Next, we'll download and unpack the InChI source distribution:
$ wget http://www.iupac.org/inchi/download/inchi102b.zip
$ unzip inchi102b.zip
Step 2: Cross-Compile InChI
We now have everything we need to begin cross-compiling. NestedVM uses a two-part process in which source code is first cross-compiled to a MIPS binary. That MIPS binary is then translated to Java bytecode. We start by invoking make from the distribution's gcc_makefile directory (InChI-1-software-1-02-beta/cInChI/gcc_makefile) with the appropriate cross-compiler flags (which I found by looking through the InChI Makefile):
$ make C_COMPILER=mips-unknown-elf-gcc LINKER=mips-unknown-elf-gcc
This creates a MIPS binary (cInChI-1). Unless you're running on a MIPS machine, this binary won't be executable.
$ ./cInChI-1
bash: ./cInChI-1: cannot execute binary file
We can now translate the MIPS binary into pure Java bytecode:
$ java org.ibex.nestedvm.Compiler -outfile JInChI.class JInChI cInChI-1
This produces a Java class file:
$ ll JInChI.class
-rw-r--r-- 1 rich rich 4372362 Nov 30 08:27 JInChI.class
We can verify that the classfile has been compiled correctly by running it:
$ java JInChI
InChI ver 1, Software version 1.02-beta August 2007.
Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]
Options:
SNon Exclude stereo (Default: Include Absolute stereo)
SRel Relative stereo
-- truncated --
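The two translation steps above (cross-compile to MIPS, then translate to bytecode) could also be scripted. The sketch below is one hedged way to do that from Ruby: the BUILD_STEPS constant and run_steps method are hypothetical names, the commands are copied from this article, and the script defaults to a dry run that only prints them. A real run assumes the NestedVM environment has already been sourced and that the script is started from the gcc_makefile directory.

```ruby
# Commands copied from the steps above, as argument lists.
BUILD_STEPS = [
  %w[make C_COMPILER=mips-unknown-elf-gcc LINKER=mips-unknown-elf-gcc],
  %w[java org.ibex.nestedvm.Compiler -outfile JInChI.class JInChI cInChI-1]
].freeze

# Hypothetical helper: print each step and, unless dry_run is set,
# execute it, stopping at the first failure.
def run_steps(steps, dry_run: true)
  steps.each do |step|
    puts "$ #{step.join(' ')}"
    next if dry_run
    system(*step) or raise "step failed: #{step.first}"
  end
end

run_steps(BUILD_STEPS)
```

Passing dry_run: false would execute the commands instead of just listing them.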
We have now done something truly remarkable: we've taken a standard C source code distribution and converted it into an executable Java class file. It runs, but only because the NestedVM runtime is on our classpath (thanks to the source command we used at the beginning of the process).
What we really want is a self-contained, executable jarfile that can be run, unmodified, on any system with Java installed.
Step 3: Build the JInChI Jarfile
We begin by moving up to the root directory of our jinchi project, creating a new directory to hold our Java-specific files (the JInChI.class file and the NestedVM runtime), and copying them into it:
$ cd ../../..
$ mkdir jinchi-1.0.2b.1
$ mv InChI-1-software-1-02-beta/cInChI/gcc_makefile/JInChI.class jinchi-1.0.2b.1/
$ cp -r /your/path/to/nestedvm/build/org/ jinchi-1.0.2b.1
An executable jarfile generally needs a manifest to point to the main execution class. One way to do that is to first create a manifest:
$ vi jinchi-1.0.2b.1/MANIFEST.MF
It's essential that this file end with a newline.
$ cat jinchi-1.0.2b.1/MANIFEST.MF
Main-Class: JInChI
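If you'd rather not rely on an editor to add the trailing newline, the manifest can be generated by a script. The following is a minimal sketch; write_manifest is a hypothetical helper name, and the example writes to a temporary directory rather than to the jinchi-1.0.2b.1 directory used above.

```ruby
require 'tmpdir'

# Hypothetical helper: write a one-line manifest naming the main class.
# The explicit "\n" guarantees the file ends with a newline.
def write_manifest(path, main_class)
  File.write(path, "Main-Class: #{main_class}\n")
end

Dir.mktmpdir do |dir|
  manifest = File.join(dir, 'MANIFEST.MF')
  write_manifest(manifest, 'JInChI')
  puts File.read(manifest)
end
```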
With everything in place, we can create the jarfile:
$ cd jinchi-1.0.2b.1/
$ ls
JInChI.class MANIFEST.MF org/
$ jar -cfm jinchi-1.0.2b.1.jar MANIFEST.MF *
$ ls
jinchi-1.0.2b.1.jar JInChI.class MANIFEST.MF org/
We've successfully converted standard C source code into a platform-independent executable jarfile. But does it work?
Step 4: Test JInChI
We can confirm that the process has worked by running the jarfile (you should do this in a new shell session to verify that the jarfile is indeed independent of your NestedVM installation).
$ java -jar jinchi-1.0.2b.1.jar
InChI ver 1, Software version 1.02-beta August 2007.
Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]
Options:
SNon Exclude stereo (Default: Include Absolute stereo)
SRel Relative stereo
That's all there is to it! Your shiny new jarfile can be run on any system with a JVM installed. The one created here has been successfully tested on Mac OS X, Linux, and Windows.
If you'd prefer to download the JInChI jarfile, it can be obtained from SourceForge.
Conclusions
This article has illustrated in detail the process of converting a standard C source distribution into a platform-independent executable jarfile. Given the appropriate MIPS cross-compiler (many of which come with the NestedVM distribution), the same process can be repeated with code written in a variety of other languages.
You may be wondering what kind of performance hit you can expect with the approach outlined here. After all, we'd be comparing a native binary to something running on top of two abstraction layers: the NestedVM runtime and a JVM. It's not as bad as you might think, but that's a story for another time.
Image Credit: smithco
JInChI: Run InChI Anywhere Java Runs
Regardless of your views on Java the Programming Language, Java the Platform has a lot going for it. The ability to run the same executable on any system with a Java Virtual Machine (JVM), without recompilation, is a significant advantage in today's heterogeneous computing environment. Combine that with Java the Platform's battle-tested security model, stability and performance, and you have some compelling reasons to actually prefer that code execute on a JVM rather than bare metal.
Cheminformatics has many useful libraries, legacy and otherwise, that don't yet run on a JVM. Many of these can trace their roots back to the 1960s and 1970s and FORTRAN; others were written in C or C++ more recently. What they all have in common is that they're compiled to native binaries rather than Java bytecode.
Wouldn't it be great if this software could be easily compiled to Java bytecode instead?
A recent Depth-First article described how the InChI toolkit, an open source C library distributed by IUPAC, was successfully compiled to a Java classfile with the remarkable NestedVM library. This article describes the creation and use of a new platform-independent jarfile that runs the InChI program.
The procedure was not difficult. The two files previously released (JInChI.class and nestedvm.jar) were combined into a single executable jarfile with a Manifest pointing to the JInChI classfile as the Main class.
The full cInChI jarfile can be downloaded here.
The jinchi.jar file can be tested from the command line:
$ java -jar jinchi.jar
InChI ver 1, Software version 1.01 release 07/21/2006.
Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]
Options:
SNon Exclude stereo (Default: Include Absolute stereo)
SRel Relative stereo
SRac Racemic stereo
[truncated]
If we wanted to process a molfile representing toluene, we'd use something like the following:
$ java -jar jinchi.jar test/toluene.mol
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.log'
Opened input file 'test/toluene.mol'
Opened output file 'test/toluene.mol.txt'
Opened problem file 'test/toluene.mol.prb'
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
This command would produce the following output file, just like the cInChI program:
$ cat test/toluene.mol.txt
* Input_File: "test/toluene.mol"
Structure: 1
InChI=1/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3
AuxInfo=1/0/N:1,2,3,4,5,6,7/E:(3,4)(5,6)/rA:7nCCCCCCC/rB:;d2;s2;s3;d4;s1d5s6;/rC:3.6373,2.8,0;0,.7,0;0,2.1,0;1.2124,0,0;1.2124,2.8,0;2.4249,.7,0;2.4249,2.1,0;
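Because the output lands in files named after the input, a small amount of Ruby is enough to locate and read it. The sketch below is illustrative only: jinchi_output_paths and parse_inchi_output are hypothetical names, the file suffixes are the ones shown in the run log above, and the parser assumes that the InChI and AuxInfo each sit on their own line.

```ruby
# Hypothetical helper: derive the companion file names that the run log
# above reports for a given input file.
def jinchi_output_paths(input_path)
  { output:   "#{input_path}.txt",
    log:      "#{input_path}.log",
    problems: "#{input_path}.prb" }
end

# Hypothetical helper: pull the InChI and AuxInfo lines out of the
# plain-text output, returning nil for anything not found.
def parse_inchi_output(text)
  lines = text.lines.map(&:strip)
  { inchi:    lines.find { |line| line.start_with?('InChI=') },
    aux_info: lines.find { |line| line.start_with?('AuxInfo=') } }
end

# Sample copied from the toluene output shown above.
SAMPLE_OUTPUT = <<~TEXT
  * Input_File: "test/toluene.mol"
  Structure: 1
  InChI=1/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3
  AuxInfo=1/0/N:1,2,3,4,5,6,7/E:(3,4)(5,6)/rA:7nCCCCCCC/rB:;d2;s2;s3;d4;s1d5s6;/rC:3.6373,2.8,0;0,.7,0;0,2.1,0;1.2124,0,0;1.2124,2.8,0;2.4249,.7,0;2.4249,2.1,0;
TEXT

puts jinchi_output_paths('test/toluene.mol')[:output]
puts parse_inchi_output(SAMPLE_OUTPUT)[:inchi]
```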
We can also convert InChIs into molfiles (command line options work the same as in cInChI):
$ java -jar jinchi.jar test/toluene.mol.txt -OutputSDF
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.txt.log'
Opened input file 'test/toluene.mol.txt'
Opened output file 'test/toluene.mol.txt.txt'
Opened problem file 'test/toluene.mol.txt.prb'
Options: Output SDfile only
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
In this case the output is:
$ cat test/toluene.mol.txt.txt
Structure #1
  InChI v1 SDfile Output

  7  7  0  0  0  0  0  0  0  0  1 V2000
    3.6373    2.8000    0.0000 C   0  0  0  0  0  0  0  0  0
    0.0000    0.7000    0.0000 C   0  0  0  0  0  0  0  0  0
    0.0000    2.1000    0.0000 C   0  0  0  0  0  0  0  0  0
    1.2124    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0
    1.2124    2.8000    0.0000 C   0  0  0  0  0  0  0  0  0
    2.4249    0.7000    0.0000 C   0  0  0  0  0  0  0  0  0
    2.4249    2.1000    0.0000 C   0  0  0  0  0  0  0  0  0
  1  7  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
  3  5  1  0  0  0  0
  4  6  2  0  0  0  0
  5  7  2  0  0  0  0
  6  7  1  0  0  0  0
M  END
$$$$
Similar tests worked on both Linux and Windows using the same jarfile.
There are still some issues to be addressed with this approach. For example, various reports indicate that NestedVM code runs about four to ten times slower than native execution. Benchmarking may be useful at this point.
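A first rough benchmark doesn't need much machinery. The sketch below times a command's wall-clock duration using Ruby's standard Benchmark module; average_runtime is a hypothetical helper, and a trivial Ruby invocation serves as a stand-in so the example runs anywhere. To compare the two InChI builds, one would presumably substitute something like ['cInChI-1', 'test/toluene.mol'] and ['java', '-jar', 'jinchi.jar', 'test/toluene.mol'], bearing in mind that the Java timing would include JVM startup.

```ruby
require 'benchmark'
require 'rbconfig'

# Hypothetical helper: average wall-clock seconds needed to run +command+
# (an Array of program and arguments) over +runs+ executions.
# The command's own output is discarded.
def average_runtime(command, runs: 3)
  total = Benchmark.realtime do
    runs.times { system(*command, out: File::NULL, err: File::NULL) }
  end
  total / runs
end

# Stand-in command: start the current Ruby interpreter and exit at once.
stand_in = [RbConfig.ruby, '-e', '']
puts format('%.3f s per run', average_runtime(stand_in))
```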
Another issue is how to go about making a Java InChI library with NestedVM. If you decompile the jinchi.jar file, you'll find that the JInChI.class file is a large and complex beast in which almost all methods are named as hex numbers. It may be possible to create a library by renaming certain methods and breaking the code into smaller classfiles, but the NestedVM documentation seems sparse on this subject.
Despite these difficulties, this article demonstrates the power of NestedVM and describes the first (and currently only) example of a 100% Java InChI implementation.
Image Credit: smithco

