Extending InChI Stereochemistry

Posted by Rich Apodaca Wed, 09 Jul 2008 10:18:00 GMT

As covered by Reuters and many other wire services, ArtusLabs and Boston University's CMLD have teamed up to extend InChI's stereochemistry support:

DURHAM, N.C.--(Business Wire)-- ArtusLabs, Inc., a leading provider of life science software tools and data management solutions, has entered into a partnership with Boston University's Center for Chemical Methodology and Library Development (CMLD) to develop a way to standardize and expand the way in which stereochemistry, and ultimately three-dimensional structures, are represented in the International Chemical Identifier (InChI(TM)).

With the increasing use in chemistry of molecules containing axial chirality, planar chirality, and other forms of non-tetrahedral stereogenicity, the move by ArtusLabs and CMLD could be significant.

Put simply, the ability of cheminformatics to represent certain kinds of compounds has fallen way behind the ability of chemistry to make them. Molecules that would have been considered mere oddities 30 years ago now pour into corporate compound collections, laboratory notebooks, and product catalogs, yet cheminformatics is still stuck with a form of molecular representation that hasn't changed significantly in several decades.

InChI isn't alone. All three of the most widely used molecular representation systems (Molfile, SMILES, and CML) suffer from fundamental limitations in representing axial chirality, planar chirality, and multicenter bonding.

The kind of work being undertaken by ArtusLabs and CMLD is essential if cheminformatics is to continue to keep pace with new developments in chemistry.

A Simple and Portable Ruby Interface to InChI - Part 2: Silencing Console Output

Posted by Rich Apodaca Fri, 30 May 2008 10:04:00 GMT

The previous article in this series described a simple and portable method for interfacing Ruby to the cInChI-1 binary. One disadvantage was noisy console output. This article offers a minor modification to disable it.

The Code

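module InChI
  # Pipe the molfile into cInChI-1 through the shell. The 2>/dev/null
  # redirect discards the log messages cInChI-1 writes to standard error.
  # Note: the molfile is interpolated inside double quotes, so any ", $
  # or ` characters it contains will be interpreted by the shell.
  def inchi_for molfile
    output = %x[echo "#{molfile}" | cInChI-1 -STDIO 2>/dev/null]

    # The InChI is on the second line of output; return "" when the
    # command printed nothing to standard output.
    output.empty? ? "" : output.split(/\n/)[1]
  end
end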

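Here, we're taking advantage of the shell's ability to redirect an individual output stream: 2>/dev/null sends standard error, where cInChI-1 writes its log messages, to /dev/null, while the InChI itself still arrives on standard output.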
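Redirecting to /dev/null throws the log away, and the molfile still travels through echo inside double quotes. As an alternative sketch (not the approach used in this article), Ruby's standard Open3 library can run cInChI-1 without a shell, hand it the molfile on standard input, and keep the log in a separate string in case it's needed. The `command:` keyword is a hypothetical addition that makes the executable swappable; the -STDIO option and the second-line convention are carried over from the code above.

```ruby
require 'open3'

module InChI
  # Alternative sketch: no shell is involved, so quotes or $ signs in the
  # molfile are passed through untouched. The command: keyword is a
  # hypothetical addition; it defaults to the invocation used above.
  def inchi_for molfile, command: ["cInChI-1", "-STDIO"]
    # capture3 writes molfile to the child's stdin and returns its
    # stdout, its stderr (cInChI-1's log), and its exit status.
    output, _log, _status = Open3.capture3(*command, stdin_data: molfile)

    # As above, the InChI is the second line of standard output.
    output.empty? ? "" : output.split(/\n/)[1]
  end
end
```

One behavioral difference worth knowing: because no shell is involved, Open3 raises Errno::ENOENT if cInChI-1 can't be found, instead of quietly returning an empty string.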

Testing the Code

Saving the above in a file called inchi.rb, we can test it from IRB. To make things interesting, let's pull a molfile from Chempedia:

$ irb
irb(main):001:0> require 'open-uri'
=> true
irb(main):002:0> require 'inchi'
=> true
irb(main):003:0> include InChI
=> Object
irb(main):004:0> open 'http://chempedia.com/compounds/83490.mol' do |f|
irb(main):005:1*   puts inchi_for(f.read)
irb(main):006:1> end
InChI=1/C15H15NO3S/c17-14(16-18)11-20(19)15(12-7-3-1-4-8-12)13-9-5-2-6-10-13/h1-10,15,18H,11H2,(H,16,17)
=> nil

We should be able to run this code unmodified on any UNIX-like system in which the cInChI-1 binary is on the path. And of course we could take this one step further by allowing command line options to be passed in as parameters to the inchi_for method.
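Here is one way that extension might look: a minimal sketch, assuming options are given by the names shown in cInChI-1's usage listing (such as SNon or SRel), with or without a leading dash. The helper name `inchi_args` is hypothetical. This version also passes the molfile through Shellwords.escape rather than double quotes, so shell metacharacters in it are not interpreted.

```ruby
require 'shellwords'

module InChI
  # Hypothetical helper: build the cInChI-1 argument list, adding a
  # leading dash to any option name that lacks one.
  def inchi_args options = []
    flags = options.map { |opt| opt.start_with?("-") ? opt : "-#{opt}" }
    ["cInChI-1", "-STDIO", *flags]
  end

  # Like inchi_for above, but accepting optional cInChI-1 options and
  # escaping the molfile with Shellwords before it reaches the shell.
  def inchi_for molfile, options = []
    command = Shellwords.join(inchi_args(options))
    output = %x[echo #{Shellwords.escape(molfile)} | #{command} 2>/dev/null]

    output.empty? ? "" : output.split(/\n/)[1]
  end
end

# Example: inchi_for(molfile, ["SNon"]) asks cInChI-1 to exclude stereo.
```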

Simplicity has its advantages.

A Simple and Portable Ruby Interface to InChI

Posted by Rich Apodaca Thu, 29 May 2008 12:12:00 GMT

Although the InChI software itself is written in C, it can still be used via Ruby. Rino offers one implementation of a Ruby InChI interface that makes use of a C extension. This article describes a more concise and portable solution.

The Code

The following code will accept a String encoding a molfile and return either its InChI, or an empty String if no InChI could be found:

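module InChI
  # Pipe the molfile into cInChI-1 through the shell using command
  # expansion; %x[...] returns whatever the command prints to stdout.
  def inchi_for molfile
    output = %x[echo "#{molfile}" | cInChI-1 -STDIO]

    # The InChI is on the second line of output; return "" when the
    # command printed nothing to standard output.
    output.empty? ? "" : output.split(/\n/)[1]
  end
end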

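This code takes advantage of Ruby's built-in support for command expansion: the %x[...] literal (equivalent to backticks) runs its contents as a shell command and returns the command's standard output as a String.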
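To see what command expansion does in isolation, and why the molfile's contents matter once they're interpolated into the command, here's a small standalone sketch. It uses only the standard library; Shellwords.escape is one way (not the one used in this article) to hand arbitrary text to the shell literally.

```ruby
require 'shellwords'

# %x[...] runs the command (through /bin/sh when it contains shell syntax)
# and returns everything the command wrote to standard output.
greeting = %x[echo hello]
puts greeting.inspect   # prints "hello\n"
puts $?.success?        # $? holds the exit status of that command

# Interpolated text is parsed by the shell: inside double quotes, $5 is
# expanded and embedded quotes end the quoted string early.
text = %q(price: $5 "each")
raw  = %x[printf '%s' "#{text}"]
safe = %x[printf '%s' #{Shellwords.escape(text)}]
puts raw == text        # false
puts safe == text       # true
```

Typical molfiles rarely contain such characters, but an SD file's $$$$ record separator is one realistic case where the double-quoted form would change the input.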

Testing the Code

The code below tests the library:

require './inchi'
include InChI

molfile =
"http://chempedia.com/compounds/106.mol
  -OEChem-03010811072D

 12 12  0     0  0  0  0  0  0999 V2000
    2.8660    1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660    1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  7  1  0  0  0  0
  2  4  1  0  0  0  0
  2  8  1  0  0  0  0
  3  5  2  0  0  0  0
  3  9  1  0  0  0  0
  4  6  2  0  0  0  0
  4 10  1  0  0  0  0
  5  6  1  0  0  0  0
  5 11  1  0  0  0  0
  6 12  1  0  0  0  0
M  END"

puts "Found InChI: #{inchi_for(molfile)}"

We can run the test by saving it in a file called test.rb and executing it:

$ ruby test.rb
InChI version 1, Software version 1.02-beta August 2007
Log file not specified. Using standard error output.
Input file not specified. Using standard input.
Output file not specified. Using standard output.
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Found InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H

Prerequisites

The above approach only requires a UNIX-like system with the cInChI-1 binary from the InChI distribution on your PATH.
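A quick way to confirm that prerequisite from Ruby is to search the directories listed in PATH. This is a minimal sketch; the method name `executable_on_path` is made up for this example.

```ruby
# Return the full path of +name+ if an executable file with that name
# exists in one of the PATH directories, otherwise nil.
def executable_on_path name
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if (path = executable_on_path("cInChI-1"))
  puts "Using #{path}"
else
  warn "cInChI-1 not found on PATH"
end
```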

Advantages

The approach described here offers some important advantages over Rino:

Disadvantages

This approach creates a lot of noisy log output to the console. There must be a way to suppress it, but so far I haven't found out how.

Conclusions

Using Ruby's support for command expansion has enabled the creation of a concise and portable Ruby interface to the InChI toolkit. Similar principles would apply to any Unix command-line binary, including, for example, Open Babel.

From C Source Code to Platform-Independent Executable Jarfile: Using NestedVM to Build JInChI

Posted by Rich Apodaca Mon, 03 Dec 2007 08:42:00 GMT

A recent series of articles discussed in some detail the process of compiling source code written in C and C++ to pure Java bytecode with NestedVM. But the full conversion process, starting with source and finishing with an executable jarfile, has to my knowledge never been documented. This article uses the InChI toolkit to illustrate the complete process for converting a real-world C source distribution into a platform-independent, executable jarfile that can be run with any modern Java Virtual Machine (JVM).

About InChI

The previous article in this series introduced JInChI, the first and only pure Java implementation of the IUPAC/NIST InChI toolkit. This toolkit is used to convert molecular connection tables encoded in MDL's SD File format into ASCII character strings called 'InChIs' that have a variety of applications in the field of cheminformatics. Although an excellent JNI-InChI interface is available, JNI won't be a viable option in every situation. Our pure Java implementation nicely complements the JNI-InChI library.

In this tutorial, we'll build version 1.0.2b (Software version 1.02-beta) of the InChI toolkit. Among other features, this version supports the generation of InChIKeys.

Prerequisites

This article assumes you've already installed NestedVM on your system. Building NestedVM required the installation of many dependencies and was a fairly lengthy, but straightforward, process on my Linux system.

Step 1: Prepare Your Environment

Before building anything, we'll need to set up our environment. NestedVM makes this simple:

$ cd /your/path/to/nestedvm/
$ source env.sh

Next, let's create a directory to hold the various components we'll need during the build process:

$ cd /your/projects/directory
$ mkdir jinchi
$ cd jinchi

Next, we'll download and unpack the InChI source distribution:

$ wget http://www.iupac.org/inchi/download/inchi102b.zip
$ unzip inchi102b.zip

Step 2: Cross-Compile InChI

We now have everything we need to begin cross-compiling. NestedVM uses a two-part process in which source code is first cross-compiled to a MIPS binary. That MIPS binary is then translated to Java bytecode. From the gcc_makefile directory of the unpacked distribution, we invoke make, overriding the compiler and linker variables (which I found by looking through the InChI Makefile) so that they point to NestedVM's MIPS cross-compiler:

$ cd InChI-1-software-1-02-beta/cInChI/gcc_makefile
$ make C_COMPILER=mips-unknown-elf-gcc LINKER=mips-unknown-elf-gcc

This creates a MIPS binary (cInChI-1). Unless you're running on a MIPS machine, this binary won't be executable.

$ ./cInChI-1
bash: ./cInChI-1: cannot execute binary file

We can now translate the MIPS binary into pure Java bytecode:

$ java org.ibex.nestedvm.Compiler -outfile JInChI.class JInChI cInChI-1

This produces a Java class file:

$ ll JInChI.class
-rw-r--r-- 1 rich rich 4372362 Nov 30 08:27 JInChI.class

We can verify that the classfile has been compiled correctly by running it:

$ java JInChI
InChI ver 1, Software version 1.02-beta August 2007.

Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]

Options:
  SNon        Exclude stereo (Default: Include Absolute stereo)
  SRel        Relative stereo

-- truncated --

We have now done something truly remarkable: we've taken a standard C source code distribution and converted it into an executable Java class file. It runs, but only because the NestedVM runtime is on our classpath (thanks to the source command we used at the beginning of the process).

What we really want is a self-contained, executable jarfile that can be run, unmodified, on any system with Java installed.

Step 3: Build the JInChI Jarfile

We begin by moving up to the root directory of our jinchi project, creating a new directory to hold our Java-specific files (the JInChI.class file and the NestedVM runtime), and copying them into it:

$ cd ../../..
$ mkdir jinchi-1.0.2b.1
$ mv InChI-1-software-1-02-beta/cInChI/gcc_makefile/JInChI.class jinchi-1.0.2b.1/
$ cp -r /your/path/to/nestedvm/build/org/ jinchi-1.0.2b.1

An executable jarfile generally needs a manifest to point to the main execution class. One way to do that is to first create a manifest:

$ vi jinchi-1.0.2b.1/MANIFEST.MF

It's essential that this file end with a newline.

$ cat jinchi-1.0.2b.1/MANIFEST.MF
Main-Class: JInChI
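If you'd rather not depend on your editor to add the final newline the manifest needs, the file can be generated by a script instead. Here's a minimal Ruby sketch; both method names are hypothetical, and the path in the example comment is the one used above.

```ruby
# Write a one-line manifest naming the main class. The trailing "\n"
# matters: the last manifest line is not parsed properly without it.
def write_manifest path, main_class
  File.write(path, "Main-Class: #{main_class}\n")
end

# Check an existing manifest for the trailing newline.
def manifest_ends_with_newline? path
  File.read(path).end_with?("\n")
end

# Example (run from the jinchi project directory):
#   write_manifest("jinchi-1.0.2b.1/MANIFEST.MF", "JInChI")
```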

With everything in place, we can create the jarfile:

$ cd jinchi-1.0.2b.1/
$ ls
JInChI.class  MANIFEST.MF  org/
$ jar -cfm jinchi-1.0.2b.1.jar MANIFEST.MF *
$ ls
jinchi-1.0.2b.1.jar  JInChI.class  MANIFEST.MF  org/

We've successfully converted standard C source code into a platform independent executable jarfile. But does it work?

Step 4: Test JInChI

We can confirm that the process has worked by running the jarfile (you should do this in a new shell session to verify that the jarfile is indeed independent of your NestedVM installation).

$ java -jar jinchi-1.0.2b.1.jar
InChI ver 1, Software version 1.02-beta August 2007.

Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]

Options:
  SNon        Exclude stereo (Default: Include Absolute stereo)
  SRel        Relative stereo

That's all there is to it! Your shiny new jarfile can be run on any system with a JVM installed. The one created here has been successfully tested on Mac OS X, Linux, and Windows.

If you'd prefer to download the JInChI jarfile, it can be obtained from SourceForge.

Conclusions

This article has illustrated in detail the process of converting a standard C source distribution into a platform-independent executable jarfile. Given the appropriate MIPS cross-compiler (many of which come with the NestedVM distribution), the same process can be repeated with code written in a variety of other languages.

You may be wondering what kind of performance hit you can expect with the approach outlined here. After all, we'd be comparing a native binary to something running on top of two abstraction layers: the NestedVM runtime and a JVM. It's not as bad as you might think, but that's a story for another time.

Image Credit: smithco

JInChI: Run InChI Anywhere Java Runs

Posted by Rich Apodaca Wed, 31 Oct 2007 10:59:00 GMT

Regardless of your views on Java the Programming Language, Java the Platform has a lot going for it. The ability to run the same executable on any system with a Java Virtual Machine (JVM), without recompilation, is a significant advantage in today's heterogeneous computing environment. Combine that with Java the Platform's battle-tested security model, stability and performance, and you have some compelling reasons to actually prefer that code execute on a JVM rather than bare metal.

Cheminformatics has many useful libraries, legacy and otherwise, that don't yet run on a JVM. Many of these trace their roots back to FORTRAN code written in the 1960s and 1970s; others were written more recently in C or C++. What they all have in common is that they're compiled to native binaries rather than Java bytecode.

Wouldn't it be great if this software could be easily compiled to Java bytecode instead?

A recent Depth-First article described how the InChI toolkit, an open source C library distributed by IUPAC, was successfully compiled to a Java classfile with the remarkable NestedVM library. This article describes the creation and use of a new platform-independent jarfile that runs the InChI program.

The procedure was not difficult. The two files previously released (JInChI.class and nestedvm.jar) were combined into a single executable jarfile with a manifest naming JInChI as the main class.

The full cInChI jarfile, jinchi.jar, is available for download.

The jinchi.jar file can be tested from the command line:

$ java -jar jinchi.jar
InChI ver 1, Software version 1.01 release 07/21/2006.

Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]

Options:
  SNon        Exclude stereo (Default: Include Absolute stereo)
  SRel        Relative stereo
  SRac        Racemic stereo

[truncated]

If we wanted to process a molfile representing toluene, we'd use something like the following:

$ java -jar jinchi.jar test/toluene.mol
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.log'
Opened input file 'test/toluene.mol'
Opened output file 'test/toluene.mol.txt'
Opened problem file 'test/toluene.mol.prb'
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00

This command would produce the following output file, just like the cInChI program:

$ cat test/toluene.mol.txt
* Input_File: "test/toluene.mol"
Structure: 1
InChI=1/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3
AuxInfo=1/0/N:1,2,3,4,5,6,7/E:(3,4)(5,6)/rA:7nCCCCCCC/rB:;d2;s2;s3;d4;s1d5s6;/rC:3.6373,2.8,0;0,.7,0;0,2.1,0;1.2124,0,0;1.2124,2.8,0;2.4249,.7,0;2.4249,2.1,0;
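When processing many structures, it can be handy to pull the InChIs out of an output file like this one programmatically. The following Ruby sketch assumes the layout shown above (a "Structure: N" line followed by an "InChI=" line); the method name is hypothetical.

```ruby
# Map each structure number to its InChI string, reading text in the
# format of test/toluene.mol.txt above. Header and AuxInfo lines are skipped.
def inchis_from text
  results = {}
  current = nil

  text.each_line do |line|
    line = line.chomp
    if (match = line.match(/\AStructure: (\d+)/))
      current = match[1].to_i
    elsif current && line.start_with?("InChI=")
      results[current] = line
    end
  end

  results
end

# Example:
#   inchis_from(File.read("test/toluene.mol.txt"))
```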

We can also convert InChIs into molfiles (command line options work the same as in cInChI):

$ java -jar jinchi.jar test/toluene.mol.txt -OutputSDF
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.txt.log'
Opened input file 'test/toluene.mol.txt'
Opened output file 'test/toluene.mol.txt.txt'
Opened problem file 'test/toluene.mol.txt.prb'
Options: Output SDfile only
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00

In this case the output is:

$ cat test/toluene.mol.txt.txt
Structure #1
  InChI v1 SDfile Output

  7  7  0  0  0  0  0  0  0  0  1 V2000
    3.6373    2.8000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.7000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    2.1000    0.0000 C   0  0  0     0  0  0  0  0  0
    1.2124    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    1.2124    2.8000    0.0000 C   0  0  0     0  0  0  0  0  0
    2.4249    0.7000    0.0000 C   0  0  0     0  0  0  0  0  0
    2.4249    2.1000    0.0000 C   0  0  0     0  0  0  0  0  0
  1  7  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
  3  5  1  0  0  0  0
  4  6  2  0  0  0  0
  5  7  2  0  0  0  0
  6  7  1  0  0  0  0
M  END
$$$$

Similar tests worked on both Linux and Windows using the same jarfile.

There are still some issues to be addressed with this approach. For example, various reports indicate that NestedVM code runs about four to ten times slower than native execution. Benchmarking may be useful at this point.
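A minimal timing harness for that comparison might look like the Ruby sketch below. It measures only wall-clock time for whole command invocations (so JVM startup is included in the jarfile's numbers), and the method name and command lines are placeholders to adapt; no results are implied.

```ruby
# Run each command +runs+ times and return the average wall-clock seconds
# per run, keyed by label. Output is discarded so only timing is recorded.
def time_commands commands, runs = 5
  commands.map do |label, argv|
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    runs.times { system(*argv, out: File::NULL, err: File::NULL) }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    [label, elapsed / runs]
  end.to_h
end

# Example (paths are placeholders):
#   time_commands("native cInChI-1" => ["cInChI-1", "test/toluene.mol"],
#                 "jinchi.jar"      => ["java", "-jar", "jinchi.jar", "test/toluene.mol"])
```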

Another issue is how to go about making a Java InChI library with NestedVM. If you decompile the jinchi.jar file, you'll find that the JInChI.class file is a large and complex beast in which almost all methods are named as hex numbers. It may be possible to create a library by renaming certain methods and breaking the code into smaller classfiles, but the NestedVM documentation seems sparse on this subject.

Despite these difficulties, this article demonstrates the power of NestedVM and describes the first (and currently only) example of a 100% Java InChI implementation.

Image Credit: smithco
