From C Source Code to Platform-Independent Executable Jarfile: Using NestedVM to Build JInChI

Posted by Rich Apodaca Mon, 03 Dec 2007 13:42:00 GMT

A recent series of articles discussed in some detail the process of compiling source code written in C and C++ to pure Java bytecode with NestedVM. But the full conversion process, starting with source and finishing with an executable jarfile, has to my knowledge never been documented. This article uses the InChI toolkit to illustrate the complete process for converting a real-world C source distribution into a platform-independent, executable jarfile that can be run with any modern Java Virtual Machine (JVM).

About InChI

The previous article in this series introduced JInChI, the first and only pure Java implementation of the IUPAC/NIST InChI toolkit. This toolkit is used to convert molecular connection tables encoded in MDL's SD File format into ASCII character strings called 'InChIs' that have a variety of applications in the field of cheminformatics. Although an excellent JNI-InChI interface is available, JNI won't be a viable option in every situation. Our pure Java implementation nicely complements the JNI-InChI library.

In this tutorial, we'll build version 1.0.2b of the InChI toolkit. This version, among other features, supports the generation of InChI Keys.

Prerequisites

This article assumes you've already installed NestedVM on your system. Building NestedVM required the installation of many dependencies and was a fairly lengthy, but straightforward, process on my Linux system.

Step 1: Prepare Your Environment

Before building anything, we'll need to set up our environment. NestedVM makes this simple:

$ cd /your/path/to/nestedvm/
$ source env.sh
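
Before going further, it can help to confirm that the environment really is set up. The following is a minimal sketch, assuming that env.sh places the cross-compiler used in Step 2 (mips-unknown-elf-gcc) and java on your PATH; the helper name require_tools is made up for this illustration and is not part of NestedVM.

```shell
# Hypothetical helper (not part of NestedVM): report any command that
# cannot be found on the PATH. Returns 0 only if every tool is found.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, to be run after 'source env.sh' (tool names are assumptions):
# require_tools mips-unknown-elf-gcc java && echo "environment looks ready"
```

If a tool is reported as missing, re-sourcing env.sh in the current shell is the first thing to try.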

Next, let's create a directory to hold the various components we'll need during the build process:

$ cd /your/projects/directory
$ mkdir jinchi
$ cd jinchi

Next, we'll download and unpack the InChI source distribution:

$ wget http://www.iupac.org/inchi/download/inchi102b.zip
$ unzip inchi102b.zip

Step 2: Cross-Compile InChI

We now have everything we need to begin cross-compiling. NestedVM uses a two-part process in which source code is first cross-compiled to a MIPS binary. That MIPS binary is then translated to Java bytecode. We start by invoking make with the appropriate cross-compiler flags (which I found by looking through the InChI Makefile):

$ cd InChI-1-software-1-02-beta/cInChI/gcc_makefile
$ make C_COMPILER=mips-unknown-elf-gcc LINKER=mips-unknown-elf-gcc

This creates a MIPS binary (cInChI-1). Unless you're running on a MIPS machine, this binary won't be executable.

$ ./cInChI-1
bash: ./cInChI-1: cannot execute binary file
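
If you'd like more reassurance than a failed execution, the ELF header itself records the target architecture. Here is a rough sketch that reads the e_machine field (bytes 18 and 19 of the header) using only od, tr and cut. The helper name elf_machine is hypothetical; the value 8 for MIPS (EM_MIPS) comes from the ELF specification.

```shell
# Hypothetical helper: print the ELF e_machine value of a file as a
# decimal number, or fail if the file does not start with an ELF header.
elf_machine() {
  # First 20 bytes as one hex string: magic (bytes 0-3), class (4),
  # byte order (5), padding, e_type (16-17), e_machine (18-19).
  hdr=$(od -An -v -tx1 -N 20 "$1" | tr -d ' \t\n')
  [ "${#hdr}" -eq 40 ] && [ "$(printf '%s\n' "$hdr" | cut -c1-8)" = "7f454c46" ] || {
    echo "$1: not an ELF file" >&2
    return 1
  }
  order=$(printf '%s\n' "$hdr" | cut -c11-12)  # 01 = little-endian, 02 = big-endian
  b18=$(printf '%s\n' "$hdr" | cut -c37-38)
  b19=$(printf '%s\n' "$hdr" | cut -c39-40)
  if [ "$order" = "02" ]; then
    echo "$((0x$b18$b19))"
  else
    echo "$((0x$b19$b18))"
  fi
}

# Example: elf_machine cInChI-1
```

A binary built for MIPS should report 8; a native x86-64 Linux binary would report 62 instead.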

We can now translate the MIPS binary into pure Java bytecode:

$ java org.ibex.nestedvm.Compiler -outfile JInChI.class JInChI cInChI-1

This produces a Java class file:

$ ll JInChI.class
-rw-r--r-- 1 rich rich 4372362 Nov 30 08:27 JInChI.class

We can verify that the classfile has been compiled correctly by running it:

$ java JInChI
InChI ver 1, Software version 1.02-beta August 2007.

Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]

Options:
  SNon        Exclude stereo (Default: Include Absolute stereo)
  SRel        Relative stereo

-- truncated --

We have now done something truly remarkable: we've taken a standard C source code distribution and converted it into an executable Java class file. It runs, but only because the NestedVM runtime is on our classpath (thanks to the source command we used at the beginning of the process).

What we really want is a self-contained, executable jarfile that can be run, unmodified, on any system with Java installed.

Step 3: Build the JInChI Jarfile

We begin by moving up to the root directory of our jinchi project, creating a new directory to hold our Java-specific files (the JInChI.class file and the NestedVM runtime), and copying them into it:

$ cd ../../..
$ mkdir jinchi-1.0.2b.1
$ mv InChI-1-software-1-02-beta/cInChI/gcc_makefile/JInChI.class jinchi-1.0.2b.1/
$ cp -r /your/path/to/nestedvm/build/org/ jinchi-1.0.2b.1

An executable jarfile generally needs a manifest to point to the main execution class. One way to do that is to first create a manifest:

$ vi jinchi-1.0.2b.1/MANIFEST.MF

It's essential that this file end with a newline.

$ cat jinchi-1.0.2b.1/MANIFEST.MF
Main-Class: JInChI
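
If you'd rather not rely on your editor to add that final newline, the manifest can be written and checked from the shell. This is only a sketch: the helper names write_manifest and ends_with_newline are invented for this example, and the check relies on the fact that command substitution strips a trailing newline.

```shell
# Hypothetical helper: write a one-entry manifest to the file named by $1,
# with $2 as the main class. The \n in the printf format guarantees that
# the last line is newline-terminated.
write_manifest() {
  printf 'Main-Class: %s\n' "$2" > "$1"
}

# Succeeds only if the file is non-empty and its last byte is a newline.
# Command substitution drops a trailing newline, so an empty result
# means the final byte was one.
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Example:
# write_manifest jinchi-1.0.2b.1/MANIFEST.MF JInChI
# ends_with_newline jinchi-1.0.2b.1/MANIFEST.MF && echo "manifest OK"
```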

With everything in place, we can create the jarfile:

$ cd jinchi-1.0.2b.1/
$ ls
JInChI.class  MANIFEST.MF  org/
$ jar -cfm jinchi-1.0.2b.1.jar MANIFEST.MF *
$ ls
jinchi-1.0.2b.1.jar  JInChI.class  MANIFEST.MF  org/
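
It's worth a quick look inside the jarfile before testing it. The jar tool's t option prints a listing, and a small filter can confirm that the pieces we care about made it in. In this sketch the helper name check_listing is hypothetical; the entry META-INF/MANIFEST.MF is where the jar tool stores the manifest passed with the m option.

```shell
# Hypothetical helper: read a jar listing on stdin and confirm that each
# entry given as an argument appears in it as a complete line.
check_listing() {
  listing=$(cat)
  for entry in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qxF -- "$entry"; then
      echo "missing from jar: $entry" >&2
      return 1
    fi
  done
}

# Example:
# jar tf jinchi-1.0.2b.1.jar | check_listing JInChI.class META-INF/MANIFEST.MF
```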

We've successfully converted standard C source code into a platform-independent executable jarfile. But does it work?

Step 4: Test JInChI

We can confirm that the process has worked by running the jarfile (you should do this in a new shell session to verify that the jarfile is indeed independent of your NestedVM installation).

$ java -jar jinchi-1.0.2b.1.jar
InChI ver 1, Software version 1.02-beta August 2007.

Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]

Options:
  SNon        Exclude stereo (Default: Include Absolute stereo)
  SRel        Relative stereo

That's all there is to it! Your shiny new jarfile can be run on any system with a JVM installed. The one created here has been successfully tested on Mac OS X, Linux, and Windows.

If you'd prefer to download the JInChI jarfile, it can be obtained from SourceForge.
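
As an alternative to opening a new shell session for the test above, the jarfile can be run with the classpath variable removed from its environment. This sketch assumes that env.sh exposes the NestedVM runtime through the CLASSPATH environment variable, which I haven't verified for every NestedVM version; the helper name without_classpath is made up.

```shell
# Hypothetical helper: run a command in a subshell with CLASSPATH unset,
# leaving the calling shell's environment untouched.
without_classpath() {
  ( unset CLASSPATH; "$@" )
}

# Example:
# without_classpath java -jar jinchi-1.0.2b.1.jar
```

If the usage message still appears, the jarfile is finding the NestedVM runtime inside itself rather than on the classpath.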

Conclusions

This article has illustrated in detail the process of converting a standard C source distribution into a platform-independent executable jarfile. Given the appropriate MIPS cross-compiler (many of which come with the NestedVM distribution), the same process can be repeated with code written in a variety of other languages.

You may be wondering what kind of performance hit you can expect with the approach outlined here. After all, we'd be comparing a native binary to something running on top of two abstraction layers: the NestedVM runtime and a JVM. It's not as bad as you might think, but that's a story for another time.

Image Credit: smithco

JInChI: Run InChI Anywhere Java Runs

Posted by Rich Apodaca Wed, 31 Oct 2007 14:59:00 GMT

Regardless of your views on Java the Programming Language, Java the Platform has a lot going for it. The ability to run the same executable on any system with a Java Virtual Machine (JVM), without recompilation, is a significant advantage in today's heterogeneous computing environment. Combine that with Java the Platform's battle-tested security model, stability and performance, and you have some compelling reasons to actually prefer that code execute on a JVM rather than bare metal.

Cheminformatics has many useful libraries, legacy and otherwise, that don't yet run on a JVM. Many of these can trace their roots back to the 1960s and 1970s and FORTRAN; others were written in C or C++ more recently. What they all have in common is that they're compiled to native binaries rather than Java bytecode.

Wouldn't it be great if this software could be easily compiled to Java bytecode instead?

A recent Depth-First article described how the InChI toolkit, an open source C library distributed by IUPAC, was successfully compiled to a Java classfile with the remarkable NestedVM library. This article describes the creation and use of a new platform-independent jarfile that runs the InChI program.

The procedure was not difficult. The two files previously released (JInChI.class and nestedvm.jar) were combined into a single executable jarfile with a manifest pointing to the JInChI classfile as the main class.

The full cInChI jarfile can be downloaded here.

The jinchi.jar file can be tested from the command line:

$ java -jar jinchi.jar
InChI ver 1, Software version 1.01 release 07/21/2006.

Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]

Options:
  SNon        Exclude stereo (Default: Include Absolute stereo)
  SRel        Relative stereo
  SRac        Racemic stereo

[truncated]

If we wanted to process a molfile representing toluene, we'd use something like the following:

$ java -jar jinchi.jar test/toluene.mol
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.log'
Opened input file 'test/toluene.mol'
Opened output file 'test/toluene.mol.txt'
Opened problem file 'test/toluene.mol.prb'
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00

This command would produce the following output file, just like the cInChI program:

$ cat test/toluene.mol.txt
* Input_File: "test/toluene.mol"
Structure: 1
InChI=1/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3
AuxInfo=1/0/N:1,2,3,4,5,6,7/E:(3,4)(5,6)/rA:7nCCCCCCC/rB:;d2;s2;s3;d4;s1d5s6;/rC:3.6373,2.8,0;0,.7,0;0,2.1,0;1.2124,0,0;1.2124,2.8,0;2.4249,.7,0;2.4249,2.1,0;
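
In a script, usually only the identifier itself is wanted. Since each InChI sits on its own line beginning with InChI=, a one-line filter is enough. The helper name extract_inchi below is hypothetical.

```shell
# Hypothetical helper: print only the InChI identifier lines from a
# plain-text output file, skipping the header and AuxInfo lines.
extract_inchi() {
  grep '^InChI=' "$1"
}

# Example:
# extract_inchi test/toluene.mol.txt
```

For the toluene output file shown above, this prints the single line InChI=1/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3.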

We can also convert InChIs into molfiles (command line options work the same as in cInChI):

$ java -jar jinchi.jar test/toluene.mol.txt -OutputSDF
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.txt.log'
Opened input file 'test/toluene.mol.txt'
Opened output file 'test/toluene.mol.txt.txt'
Opened problem file 'test/toluene.mol.txt.prb'
Options: Output SDfile only
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00

In this case the output is:

$ cat test/toluene.mol.txt.txt
Structure #1
  InChI v1 SDfile Output

  7  7  0  0  0  0  0  0  0  0  1 V2000
    3.6373    2.8000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.7000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    2.1000    0.0000 C   0  0  0     0  0  0  0  0  0
    1.2124    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    1.2124    2.8000    0.0000 C   0  0  0     0  0  0  0  0  0
    2.4249    0.7000    0.0000 C   0  0  0     0  0  0  0  0  0
    2.4249    2.1000    0.0000 C   0  0  0     0  0  0  0  0  0
  1  7  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
  3  5  1  0  0  0  0
  4  6  2  0  0  0  0
  5  7  2  0  0  0  0
  6  7  1  0  0  0  0
M  END
$$$$
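
A quick way to sanity-check a round trip like this is to read the atom and bond counts back out of the molfile. In the V2000 format the counts line is the fourth line, and the two counts occupy its first two three-character fields. This is a sketch for single-structure files only; the helper name mol_counts is hypothetical.

```shell
# Hypothetical helper: print "atoms bonds" from the counts line (line 4)
# of a V2000 molfile. Both values are fixed-width, three-character fields.
mol_counts() {
  counts=$(sed -n '4p' "$1")
  atoms=$(printf '%s\n' "$counts" | cut -c1-3 | tr -d ' ')
  bonds=$(printf '%s\n' "$counts" | cut -c4-6 | tr -d ' ')
  echo "$atoms $bonds"
}

# Example:
# mol_counts test/toluene.mol.txt.txt
```

For the toluene file above this gives 7 7: seven heavy atoms and seven bonds, as expected for a methyl group attached to a six-membered ring.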

Similar tests worked on both Linux and Windows using the same jarfile.

There are still some issues to be addressed with this approach. For example, various reports indicate that NestedVM code runs about four to ten times slower than native execution. Benchmarking may be useful at this point.

Another issue is how to go about making a Java InChI library with NestedVM. If you decompile the jinchi.jar file, you'll find that the JInChI.class file is a large and complex beast in which almost all methods are named as hex numbers. It may be possible to create a library by renaming certain methods and breaking the code into smaller classfiles, but the NestedVM documentation seems sparse on this subject.

Despite these difficulties, this article demonstrates the power of NestedVM and describes the first (and currently only) example of a 100% Java InChI implementation.

Image Credit: smithco

Compiling the InChI Toolkit to Pure Java Bytecode with NestedVM

Posted by Rich Apodaca Mon, 29 Oct 2007 15:13:00 GMT

Some time ago, a Depth-First article discussed some methods for compiling C to Java bytecode. Many factors make this approach attractive compared to the JNI approach. Some of them include security, portability, and use within applets. Unfortunately, none of the approaches discussed in the earlier article seemed particularly general.

Many cheminformatics libraries are written in C and C++; being able to reliably and automatically port them to Java could potentially save a great deal of effort.

One of the more important cheminformatics C libraries written in recent years is the InChI toolkit. With no pure Java port of this library, JNI is the only way to use InChI with Java. In some situations, this approach is either overly complicated or simply unacceptable.

All of this leaves us with the question: how can the InChI toolkit be converted into a pure Java library without writing any new code?

A partial answer to this question came from Evan Jones, who suggested I look at NestedVM. From the website:

NestedVM provides binary translation for Java Bytecode. This is done by having GCC compile to a MIPS binary which is then translated to a Java class file. Hence any application written in C, C++, Fortran, or any other language supported by GCC can be run in 100% pure Java with no source changes.

And it worked!

NestedVM was successfully used to compile the InChI-API distribution to a Java class file that executed on nothing more than a standard JVM -- and with no JNI code. The InChI classfile and NestedVM runtime jarfile can be downloaded from SourceForge. Future articles in this series will describe the compilation, installation, and use of NestedVM, as well as the Java class file that it produced.

Image Credit: pmorgan

Easily Convert IUPAC Nomenclature to SMILES, InChI, or Molfile with Rubidium

Posted by Rich Apodaca Fri, 19 Oct 2007 14:05:00 GMT

A recent article introduced Rubidium, a cheminformatics toolkit written in Ruby. One of Ruby's strengths is the speed with which it enables disparate pieces of code to be glued together - even if they're written in different programming languages. In this article, we'll see how Rubidium can be extended to provide support for converting IUPAC nomenclature into SMILES, InChI, or Molfile formats.

About Rubidium

Rubidium is a cheminformatics toolkit written in Ruby. Rubidium is currently configured to run on JRuby, although future versions may also work with Matz' Ruby Implementation (MRI) via Ruby Java Bridge.

Rubidium will eventually be packaged as a RubyGem and hosted on RubyForge. For now, the toolkit consists of a running library that will be updated and documented on this blog.

The Library

The library extends the CDK module presented in the previous article in this series. The main change is the addition of an IUPACReader class, based on Peter Corbett's excellent OPSIN library:

class IUPACReader
  import 'java.io.StringReader'
  import 'uk.ac.cam.ch.wwmm.opsin.NameToStructure'
  import 'org.openscience.cdk.io.CMLReader'
  import 'org.openscience.cdk.ChemFile'

  def initialize
    @iupac_reader = NameToStructure.new
    @cml_reader = CMLReader.new
  end

  def read name
    cml = @iupac_reader.parse_to_cml(name)

    raise "Could not parse '#{name}'." unless cml

    @cml_reader.set_reader StringReader.new(cml.to_xml)

    chem_file = @cml_reader.read ChemFile.new

    chem_file.chem_sequence(0).chem_model(0).molecule_set.molecule(0)
  end
end

Using this additional functionality requires nothing more than copying the OPSIN jarfile into the lib directory of your JRuby installation. You'll also need to place the CDK jarfile in this directory if you haven't done so already.
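
The copy step can be wrapped in a small check so that a mistyped path fails loudly instead of silently leaving JRuby without the library. This is only a sketch: the helper name install_jar is hypothetical, and the paths in the example are placeholders rather than real locations.

```shell
# Hypothetical helper: copy a jarfile ($1) into the lib directory of a
# JRuby installation ($2), refusing to continue if either path looks wrong.
install_jar() {
  jar_file=$1
  jruby_home=$2
  [ -f "$jar_file" ] || { echo "no such jarfile: $jar_file" >&2; return 1; }
  [ -d "$jruby_home/lib" ] || { echo "no lib directory under: $jruby_home" >&2; return 1; }
  cp "$jar_file" "$jruby_home/lib/"
}

# Example (placeholder paths):
# install_jar /your/path/to/opsin.jar /your/path/to/jruby
```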

The complete Rubidium library can be downloaded here.

A Test

We can test Rubidium's IUPAC nomenclature parsing abilities with jirb. For example, to convert from name to SMILES:

$ jirb
irb(main):001:0> require 'cdk'
=> true
irb(main):002:0> c=CDK::Conversion.new
=> #<CDK::Conversion:0x46ca65 ... >
irb(main):003:0> c.set_formats 'iupac', 'smi'
=> "smi"
irb(main):004:0> c.convert '1,4-dichlorobenzene'
=> "C=1C=C(C=CC=1Cl)Cl"

To convert from name to InChI (in the same jirb session):

irb(main):005:0> c.set_out_format 'inchi'
=> "inchi"
irb(main):006:0> c.convert '1,4-dichlorobenzene'
=> "InChI=1/C6H4Cl2/c7-5-1-2-6(8)4-3-5/h1-4H"

And to convert from name to Molfile (also in the same jirb session):

irb(main):007:0> c.set_out_format 'mol'
=> "mol"
irb(main):008:0> c.convert '1,4-dichlorobenzene'
=> "\n  CDK    10/19/07,7:59\n\n  8  8  0  0  0  0  0  0  0  0999 V2000\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n  1  2  2  0  0  0  0 \n  2  3  1  0  0  0  0 \n  3  4  2  0  0  0  0 \n  4  5  1  0  0  0  0 \n  5  6  2  0  0  0  0 \n  6  1  1  0  0  0  0 \n  7  1  1  0  0  0  0 \n  8  4  1  0  0  0  0 \nM  END\n"

Conclusions

By re-using a simple conversion API together with another Java library, we've given Rubidium the ability to translate IUPAC nomenclature into other molecular languages. The additional code was both easy to write and easy to test. Future articles will discuss the packaging, distribution, and further elaboration of Rubidium.

JRuby for Cheminformatics: Reading and Writing InChIs Via the Java Native Interface

Posted by Rich Apodaca Wed, 10 Oct 2007 12:21:00 GMT

The increased use of the InChI identifier is making the reading and writing of InChIs a standard cheminformatics capability. Recent articles have discussed the advantages of JRuby for cheminformatics. One disadvantage of JRuby is that code written in C can't be directly used. This presents a potential problem for libraries, such as the InChI toolkit, that are written in C. Fortunately, the solution is simple. Today's tutorial will demonstrate how InChIs can be both read and written using the C-InChI toolkit via JRuby and the excellent JNI-InChI library.

About JNI-InChI

The JNI-InChI library, written by Jim Downing and Sam Adams, wraps the C InChI toolkit in a Java Native Interface. This low-level toolkit is suitable for building more complex software, but lacks many features present in the C InChI toolkit. For example, JNI-InChI doesn't directly interconvert SMILES or molfile with InChI. For that you'd need to build a support library. If you're building a toolkit from scratch, this lightweight approach can be a significant advantage.

The JNI-InChI binary distribution jarfile includes the compiled native InChI library. In this sense it's virtually indistinguishable from any other Java library. This simplified packaging makes it exceptionally easy to use JNI-InChI from JRuby, as we'll see below.

Installation

JRuby can be installed as described previously. To install the JNI-InChI library for JRuby, simply copy the current release jarfile into the lib directory of your JRuby installation. That's all there is to it.

A Simple Library

We can now write a simple library to read InChIs via JRuby:

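require 'java'

# JNI-InChI classes used to turn an InChI string into a structure.
include_class 'net.sf.jniinchi.JniInchiInput'
include_class 'net.sf.jniinchi.JniInchiInputInchi'
include_class 'net.sf.jniinchi.JniInchiWrapper'

module IUPAC
  # Parses an InChI string and returns the structure produced by JNI-InChI.
  def read_inchi inchi
    input = JniInchiInputInchi.new inchi

    JniInchiWrapper.getStructureFromInchi input
  end
end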

Testing the Library

By saving the above library to a file called iupac.rb, we can parse InChIs via JRuby:

$ jirb
irb(main):001:0> require 'iupac'
=> true
irb(main):002:0> include IUPAC
=> Object
irb(main):003:0> output = read_inchi 'InChI=1/C14H10/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13/h1-10H'
=> #
irb(main):004:0> output.num_atoms
=> 14
irb(main):005:0> output.num_bonds
=> 16

Writing InChIs

Because JNI-InChI is a low-level toolkit, writing InChIs is feasible, but not trivial. We must first construct a representation, and then get the InChI for it. For example, we could get the InChI for methane as follows:

$ jirb
irb(main):001:0> require 'java'
=> true
irb(main):002:0> include_class 'net.sf.jniinchi.JniInchiInput'
=> ["net.sf.jniinchi.JniInchiInput"]
irb(main):003:0> include_class 'net.sf.jniinchi.JniInchiAtom'
=> ["net.sf.jniinchi.JniInchiAtom"]
irb(main):004:0> include_class 'net.sf.jniinchi.JniInchiWrapper'
=> ["net.sf.jniinchi.JniInchiWrapper"]
irb(main):005:0> input = JniInchiInput.new
=> #
irb(main):006:0> a1 = input.add_atom JniInchiAtom.new(0,0,0, "C")
=> #
irb(main):007:0> a1.set_implicit_h(4)
=> nil
irb(main):008:0> output = JniInchiWrapper.get_inchi input
=> #
irb(main):009:0> output.get_inchi
=> "InChI=1/CH4/h1H4"

Fortunately, we don't have to work that hard. The Chemistry Development Kit, through JNI-InChI, supports reading and writing of InChIs via a variety of molecular languages, including SMILES and molfile. More on that later, though.

Conclusions

Provided that a Java Native Interface exists for a C library, it can be used from JRuby. Future articles will discuss the use of other cheminformatics libraries written in either C or C++ from JRuby, and their integration with pure Java and Ruby libraries.
