Run Babel Anywhere Java Runs with JBabel
A recent series of Depth-First articles has discussed the use of NestedVM to compile cheminformatics programs written in C/C++ to pure Java binaries that can be run on any system with a JVM. More specifically, an attempt to compile OpenBabel's babel program to bytecode was only partially successful. With the help of Geoff Hutchison, the problem has been resolved. This article introduces JBabel, a platform-independent, pure Java implementation of OpenBabel's babel program.
A Little About JBabel
JBabel was compiled from the Open Babel 2.1.1 source release and can be downloaded from SourceForge. The same jarfile was successfully tested on Linux, Windows and Mac OS X. You can verify JBabel works on your platform with the following command:
$ java -jar jbabel-20071209.jar -Hsmi
smi SMILES format
A linear text format which can describe the connectivity
and chirality of a molecule
Write Options e.g. -xt
n no molecule name
t molecule name only
r radicals lower case eg ethyl is Cc
This version of JBabel was compiled with support for three formats:
SMILES (smi). Non-canonical SMILES.
MDL (mol). Molfiles and SD Files.
Canonical SMILES (can). Canonical SMILES implementation donated by eMolecules.
I'll discuss exactly how support for these formats was added in a subsequent post. More formats will be added in the future. For now, let's just try JBabel out.
Testing JBabel
One way to use JBabel is interactively from the command line - just leave out an input or output file parameter. For example, if you wanted to get the eMolecules canonical SMILES for sertraline, you might do something like this (be sure to use two returns to begin processing):
$ java -jar jbabel-20071209.jar -ismi -ocan
CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl

CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
1 molecule converted
34 audit log messages
This canonical SMILES can be converted into a molfile with the following:
$ java -jar jbabel-20071209.jar -ismi -omol
CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
OpenBabel12090723182D
22 24 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
...
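The interactive conversions above can probably be scripted as well. The following is a minimal sketch, not part of JBabel: it assumes that JBabel reads piped standard input the same way it reads typed input, and the JBABEL_JAR variable and jbabel_convert function are names invented for this illustration.

```shell
#!/bin/sh
# Assumed location of the jarfile; override by setting JBABEL_JAR.
JBABEL_JAR=${JBABEL_JAR:-jbabel-20071209.jar}

# jbabel_convert IN_FORMAT OUT_FORMAT TEXT
# Hypothetical helper: send TEXT to JBabel on standard input.
jbabel_convert() {
    # The extra blank line mirrors the "two returns" needed interactively.
    printf '%s\n\n' "$3" | java -jar "$JBABEL_JAR" "-i$1" "-o$2"
}

# Example (only runs when the jarfile is actually present):
if [ -f "$JBABEL_JAR" ]; then
    jbabel_convert smi mol 'CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12'
fi
```

If piped input turns out not to behave like interactive input, writing the SMILES to a temporary file and passing it as the input file should be a safe fallback.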
To convert using input and output files, we could use a medium-sized dataset such as the PubChem benzodiazepine dataset prepared for Rubidium:
$ java -jar jbabel-20071209.jar -imol pubchem_benzodiazepine_20071110.sdf -ocan pubchem_benzodiazepine_20071110.smi
==============================
*** Open Babel Warning in ReadMolecule
WARNING: Problems reading a MDL file
Cannot read title line
2117 molecules converted
This test, which parses 2117 records, required four minutes forty-five seconds on my system. For comparison, the natively compiled binary did the same thing in about thirteen seconds. Clearly, the JBabel performance hit is substantial.
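To put those timings in perspective, the arithmetic can be worked through in a few lines of shell. This sketch uses only the figures quoted above (4 minutes 45 seconds, about 13 seconds, 2117 records); the function names are made up for the illustration.

```shell
#!/bin/sh
# slowdown SLOW_SECONDS FAST_SECONDS: print the ratio with one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# ms_per_record SECONDS RECORDS: print milliseconds per record, one decimal place.
ms_per_record() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 1000 * s / n }'
}

jbabel_seconds=$((4 * 60 + 45))   # 4 min 45 s = 285 s
native_seconds=13                 # "about thirteen seconds"
records=2117

echo "slowdown:          $(slowdown "$jbabel_seconds" "$native_seconds")x"
echo "JBabel per record: $(ms_per_record "$jbabel_seconds" "$records") ms"
echo "native per record: $(ms_per_record "$native_seconds" "$records") ms"
```

On these numbers JBabel is roughly 22 times slower than the native binary, or about 135 ms per record against about 6 ms.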
Uses
Although it's very unlikely that JBabel will ever be useful in performance-critical situations, its portability makes it attractive for other uses. Examples include:
application development in heterogeneous computing environments;
use on systems in which native compilation may be difficult, such as those with unusual configurations or operating systems;
cases in which native binaries work poorly or not at all, such as in applets and Java applications;
situations in which performance is a minor consideration, such as in end-user applications that process only a few molecules at a time, or during application prototyping.
Conclusions
This article has described JBabel, the first portable binary version of OpenBabel's babel molecular file format interconversion program. The next article in this series will describe in detail the steps that were used to compile it.
From C Source Code to Platform-Independent Executable Jarfile: Using NestedVM to Build JInChI
A recent series of articles discussed in some detail the process of compiling source code written in C and C++ to pure Java bytecode with NestedVM. But the full conversion process, starting with source and finishing with an executable jarfile, has to my knowledge never been documented. This article uses the InChI toolkit to illustrate the complete process for converting a real-world C source distribution into a platform-independent, executable jarfile that can be run with any modern Java Virtual Machine (JVM).
About InChI
The previous article in this series introduced JInChI, the first and only pure Java implementation of the IUPAC/NIST InChI toolkit. This toolkit is used to convert molecular connection tables encoded in MDL's SD File format into ASCII character strings called 'InChIs' that have a variety of applications in the field of cheminformatics. Although an excellent JNI-InChI interface is available, JNI won't be a viable option in every situation. Our pure Java implementation nicely complements the JNI-InChI library.
In this tutorial, we'll build version 1.0.2b of the InChI toolkit. This version, among other features, supports the generation of InChI Keys.
Prerequisites
This article assumes you've already installed NestedVM on your system. Building NestedVM required the installation of many dependencies and was a fairly lengthy, but straightforward, process on my Linux system.
Step 1: Prepare Your Environment
Before building anything, we'll need to set up our environment. NestedVM makes this simple:
$ cd /your/path/to/nestedvm/
$ source env.sh
Next, let's create a directory to hold the various components we'll need during the build process:
$ cd /your/projects/directory
$ mkdir jinchi
$ cd jinchi
Next, we'll download and unpack the InChI source distribution:
$ wget http://www.iupac.org/inchi/download/inchi102b.zip
$ unzip inchi102b.zip
Step 2: Cross-Compile InChI
We now have everything we need to begin cross-compiling. NestedVM uses a two-part process in which source code is first cross-compiled to a MIPS binary, and that MIPS binary is then translated to Java bytecode. We start by invoking make from the distribution's cInChI/gcc_makefile directory (the location the paths in Step 3 refer to), setting the compiler and linker variables that I found by looking through the InChI Makefile:
$ make C_COMPILER=mips-unknown-elf-gcc LINKER=mips-unknown-elf-gcc
This creates a MIPS binary (cInChI-1). Unless you're running on a MIPS machine, this binary won't be executable.
$ ./cInChI-1
bash: ./cInChI-1: cannot execute binary file
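If you want more confirmation than a failed launch, the ELF header itself records the target architecture. The sketch below is my own addition rather than part of the NestedVM toolchain: it reads the e_machine field with od and prints its numeric value, where 8 is the code the ELF specification assigns to MIPS. The elf_machine function name is hypothetical.

```shell
#!/bin/sh
# elf_machine FILE: print the ELF e_machine value of FILE (8 means MIPS).
elf_machine() {
    file=$1
    # Bytes 0-3 hold the ELF magic number: 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$file" | tr -d '[:space:]')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file" >&2
        return 1
    fi
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    order=$(od -An -tu1 -j5 -N1 "$file" | tr -d '[:space:]')
    # Bytes 18-19 hold e_machine, stored in that byte order.
    set -- $(od -An -tu1 -j18 -N2 "$file")
    if [ "$order" = "2" ]; then
        echo $(( $1 * 256 + $2 ))
    else
        echo $(( $2 * 256 + $1 ))
    fi
}

# Example (only runs when the freshly built binary is present):
if [ -f cInChI-1 ]; then
    if [ "$(elf_machine cInChI-1)" = "8" ]; then
        echo "cInChI-1 is a MIPS binary"
    fi
fi
```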
We can now translate the MIPS binary into pure Java bytecode:
$ java org.ibex.nestedvm.Compiler -outfile JInChI.class JInChI cInChI-1
This produces a Java class file:
$ ll JInChI.class
-rw-r--r-- 1 rich rich 4372362 Nov 30 08:27 JInChI.class
We can verify that the classfile has been compiled correctly by running it:
$ java JInChI
InChI ver 1, Software version 1.02-beta August 2007.
Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]
Options:
SNon Exclude stereo (Default: Include Absolute stereo)
SRel Relative stereo
-- truncated --
We have now done something truly remarkable: we've taken a standard C source code distribution and converted it into an executable Java class file. It runs, but only because the NestedVM runtime is on our classpath (thanks to the source command we used at the beginning of the process).
What we really want is a self-contained, executable jarfile that can be run, unmodified, on any system with Java installed.
Step 3: Build the JInChI Jarfile
We begin by moving up to the root directory of our jinchi project, creating a new directory to hold our Java-specific files (the JInChI.class file and the NestedVM runtime), and copying them into it:
$ cd ../../..
$ mkdir jinchi-1.0.2b.1
$ mv InChI-1-software-1-02-beta/cInChI/gcc_makefile/JInChI.class jinchi-1.0.2b.1/
$ cp -r /your/path/to/nestedvm/build/org/ jinchi-1.0.2b.1
An executable jarfile generally needs a manifest to point to the main execution class. One way to do that is to first create a manifest:
$ vi jinchi-1.0.2b.1/MANIFEST.MF
It's essential that this file end with a newline.
$ cat jinchi-1.0.2b.1/MANIFEST.MF
Main-Class: JInChI
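Because the trailing newline is easy to lose in an editor, it may be safer to generate the manifest with printf and check its last byte. This is a small sketch of that idea; write_manifest and ends_with_newline are hypothetical helper names, and the demonstration writes to a temporary file rather than to the project directory.

```shell
#!/bin/sh
# write_manifest FILE MAIN_CLASS: create a one-line manifest ending in a newline.
write_manifest() {
    printf 'Main-Class: %s\n' "$2" > "$1"
}

# ends_with_newline FILE: succeed only if the last byte of FILE is a newline.
ends_with_newline() {
    last=$(tail -c 1 "$1" | od -An -tx1 | tr -d '[:space:]')
    [ "$last" = "0a" ]
}

# Demonstration on a throwaway file.
manifest=${TMPDIR:-/tmp}/MANIFEST.MF.$$
write_manifest "$manifest" JInChI
if ends_with_newline "$manifest"; then
    echo "manifest ends with a newline"
fi
rm -f "$manifest"
```

The same check can be pointed at a hand-edited jinchi-1.0.2b.1/MANIFEST.MF before running jar.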
With everything in place, we can create the jarfile:
$ cd jinchi-1.0.2b.1/
$ ls
JInChI.class MANIFEST.MF org/
$ jar -cfm jinchi-1.0.2b.1.jar MANIFEST.MF *
$ ls
jinchi-1.0.2b.1.jar JInChI.class MANIFEST.MF org/
We've successfully converted standard C source code into a platform-independent executable jarfile. But does it work?
Step 4: Test JInChI
We can confirm that the process has worked by running the jarfile (you should do this in a new shell session to verify that the jarfile is indeed independent of your NestedVM installation).
$ java -jar jinchi-1.0.2b.1.jar
InChI ver 1, Software version 1.02-beta August 2007.
Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]
Options:
SNon Exclude stereo (Default: Include Absolute stereo)
SRel Relative stereo
That's all there is to it! Your shiny new jarfile can be run on any system with a JVM installed. The one created here has been successfully tested on Mac OS X, Linux, and Windows.
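If opening a new shell session is inconvenient, a subshell with the class path cleared gives a similar check. This sketch assumes that env.sh exposes the NestedVM runtime through the CLASSPATH environment variable, which I have not verified; without_classpath is a hypothetical helper name.

```shell
#!/bin/sh
# without_classpath CMD [ARGS...]: run CMD in a subshell with CLASSPATH unset,
# leaving the calling shell's environment untouched.
without_classpath() {
    (
        unset CLASSPATH
        "$@"
    )
}

# Example (only runs when the jarfile built above is present):
if [ -f jinchi-1.0.2b.1.jar ]; then
    without_classpath java -jar jinchi-1.0.2b.1.jar
fi
```

The java launcher ignores other class path settings when -jar is used, so the check is mostly a precaution, but it costs nothing.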
If you'd prefer to download the JInChI jarfile, it can be obtained from SourceForge.
Conclusions
This article has illustrated in detail the process of converting a standard C source distribution into a platform-independent executable jarfile. Given the appropriate MIPS cross-compiler (many of which come with the NestedVM distribution), the same process can be repeated with code written in a variety of other languages.
You may be wondering what kind of performance hit you can expect with the approach outlined here. After all, we'd be comparing a native binary to something running on top of two abstraction layers: the NestedVM runtime and a JVM. It's not as bad as you might think, but that's a story for another time.
Image Credit: smithco
Compiling Open Babel to Pure Java Bytecode with NestedVM: Building A Runnable Classfile that Almost Works
Previously, I described an unsuccessful first attempt to compile the popular cheminformatics C/C++ library Open Babel to pure Java bytecode using NestedVM. This article follows that topic one step further, and shows how to obtain a runnable Java classfile. Although major functionality is missing, the principle of compiling arbitrary C/C++ code to both Java source code and Java bytecode is illustrated.
Getting Started
This article assumes that you've installed NestedVM and downloaded Open Babel on your system. You'll then need to set up your environment (from the nestedvm installation directory):
$ source env.sh
Run the Configure Script
The configure script we used last time didn't attempt to statically compile the binary utilities in the tools directory. This time, we'll add flags to allow this:
$ ./configure --disable-dynamic-modules --enable-static=yes --enable-shared=no --enable-inchi --host=mips-unknown-elf
$ make
Note: leaving out the static compile directives does not produce a fully-functioning classfile either.
Next, we'll attempt to directly create the babel binary in Java classfile format, as we did last time:
$ cd tools
$ java org.ibex.nestedvm.Compiler -outfile Babel.class Babel babel
Exception in thread "main" java.lang.IllegalStateException: unresolved phantom target
at org.ibex.classgen.MethodGen.resolveTarget(MethodGen.java:555)
at org.ibex.classgen.MethodGen._generateCode(MethodGen.java:664)
at org.ibex.classgen.MethodGen.generateCode(MethodGen.java:618)
at org.ibex.classgen.MethodGen.dump(MethodGen.java:888)
at org.ibex.classgen.ClassFile._dump(ClassFile.java:193)
at org.ibex.classgen.ClassFile.dump(ClassFile.java:160)
at org.ibex.nestedvm.ClassFileCompiler.__go(ClassFileCompiler.java:380)
at org.ibex.nestedvm.ClassFileCompiler._go(ClassFileCompiler.java:72)
at org.ibex.nestedvm.Compiler.go(Compiler.java:259)
at org.ibex.nestedvm.Compiler.main(Compiler.java:183)
We're getting the same error as before. Although an announcement of a bugfix was posted to the NestedVM list, the new version of NestedVM produced the same error for me.
As a workaround, we can compile to Java source code first:
$ java org.ibex.nestedvm.Compiler -outformat java -outfile Babel.java Babel babel
We now have a Java source file encoding the babel program. Does it compile?
$ javac Babel.java
The system is out of resources.
Consult the following stack trace for details.
java.lang.OutOfMemoryError: Java heap space
at com.sun.tools.javac.util.Position$LineMapImpl.build(Position.java:139)
at com.sun.tools.javac.util.Position.makeLineMap(Position.java:63)
at com.sun.tools.javac.parser.Scanner.getLineMap(Scanner.java:1105)
at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:512)
at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:550)
at com.sun.tools.javac.main.JavaCompiler.parseFiles(JavaCompiler.java:801)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:727)
at com.sun.tools.javac.main.Main.compile(Main.java:353)
at com.sun.tools.javac.main.Main.compile(Main.java:279)
at com.sun.tools.javac.main.Main.compile(Main.java:270)
at com.sun.tools.javac.Main.compile(Main.java:69)
at com.sun.tools.javac.Main.main(Main.java:54)
Not exactly. But this is a massive source file, so we'll need to increase the Java compiler's memory allowance:
$ javac Babel.java -J-Xms256m -J-Xmx256m
Note: Babel.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
This seems to have worked. Can we run the classfile?
$ java Babel -H
Open Babel converts chemical structures from one file format to another
Usage: Babel <input spec> <output spec> [Options]
Each spec can be a file whose extension decides the format.
Optionally the format can be specified by preceding the file by
-i<format-type> e.g. -icml, for input and -o<format-type> for output
--truncated--
Success! But before we get too excited, let's make sure Open Babel's file formats are recognized by testing for "SMILES":
$ java Babel -Hsmi
Format type: smi was not recognized
As you can see, we have successfully converted the babel program to an executable classfile, but this classfile is missing most of the features of the native binary.
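The same probe can be repeated for several formats at once. The following sketch relies only on the "was not recognized" message shown above; format_supported is a hypothetical helper, and the list of formats in the example is arbitrary.

```shell
#!/bin/sh
# format_supported FORMAT CMD [ARGS...]
# Runs CMD with -H<FORMAT> appended and inspects the help text.
# Returns 0 if the format is described, 1 if it is reported as not
# recognized, and 2 if CMD could not be started at all.
format_supported() {
    fmt=$1
    shift
    status=0
    out=$("$@" "-H$fmt" 2>&1) || status=$?
    case $out in
        *"was not recognized"*) return 1 ;;
    esac
    if [ "$status" -eq 126 ] || [ "$status" -eq 127 ]; then
        return 2
    fi
    return 0
}

# Example (only runs when Babel.class is in the current directory):
if [ -f Babel.class ]; then
    for fmt in smi mol can; do
        if format_supported "$fmt" java Babel; then
            echo "$fmt: supported"
        else
            echo "$fmt: missing"
        fi
    done
fi
```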
This may seem hopeless, but consider that natively compiling Open Babel using the above configure flags also produces a binary that doesn't know about SMILES or any other format.
So, it's very likely that if we can produce a native, statically compiled, self-contained babel executable, then we will have solved the problem of running Open Babel entirely on a JVM.
This doesn't seem like a difficult problem, but apparently it is.
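For reference, the workaround used in this article can be collected into one script. This is only a sketch: translate_binary is a hypothetical function name, the commands and the 256 MB heap setting are the ones shown earlier, and it assumes the NestedVM compiler exits with a non-zero status when it throws an exception.

```shell
#!/bin/sh
# translate_binary CLASS BINARY
# Try the direct MIPS-to-classfile translation first; if that fails, fall
# back to generating Java source and compiling it with a larger javac heap.
# Prints "direct" or "via-source" to say which route succeeded.
translate_binary() {
    class=$1
    binary=$2
    if java org.ibex.nestedvm.Compiler -outfile "$class.class" "$class" "$binary"; then
        echo "direct"
        return 0
    fi
    # A failed direct translation can leave an incomplete classfile behind.
    rm -f "$class.class"
    java org.ibex.nestedvm.Compiler -outformat java -outfile "$class.java" \
        "$class" "$binary" || return 1
    javac -J-Xms256m -J-Xmx256m "$class.java" || return 1
    echo "via-source"
}

# Example (only runs when the cross-compiled babel binary is present):
if [ -f babel ]; then
    translate_binary Babel babel
fi
```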
Compiling Open Babel to Pure Java Bytecode with NestedVM: An Unsuccessful First Attempt
Wouldn't it be great to be able to compile code written in languages like FORTRAN, C, and C++ to Java bytecode? NestedVM - almost magically - can do just that. This article documents a failed first attempt to compile the popular cheminformatics toolkit Open Babel, which is written in C and C++, to pure Java bytecode with NestedVM.
A previous article described the successful compilation of the InChI toolkit, a C library, to a platform-independent executable jarfile.
The Problem
Open Babel is one of cheminformatics' most widely-used open source packages. It interconverts dozens of molecular languages, performs a host of cheminformatics analyses, and serves as a platform for many programs and Web services.
As useful as Open Babel is, it doesn't run directly on a Java Virtual Machine (JVM). Although an Open Babel JNI interface does exist, using it introduces a platform dependency, which in many cases is not acceptable. JNI is a great solution in some cases, but when maintaining a single version of a program is important, or when applets need to be used, or when code needs to work with unusual system configurations, it's a poor choice.
Our goal is to compile Open Babel's "babel" command-line utility into pure Java bytecode that can be run on any recent JVM without using JNI.
Overview of NestedVM
In a nutshell, NestedVM converts MIPS binaries to Java class files. In theory, this allows software written in any language that can be compiled to a MIPS binary to be run on a JVM.
To do this, NestedVM distributes two categories of tools: (1) a complete MIPS cross-compiler toolchain; and (2) a MIPS binary to Java bytecode compiler and accessories.
Building NestedVM
The preferred method to install NestedVM is to compile it from source found in the project repository. There are a number of prerequisites your system must meet in order to be able to do so. For now, this article assumes your system has all of them. Some of the following steps can be found in these instructions as well.
To obtain the source code from the NestedVM darcs repository:
$ darcs get --repo-name=nestedvm http://nestedvm.ibex.org
Then change into the nestedvm directory and build the main code:
$ cd nestedvm
$ make
On my machine, this step takes 10-15 minutes.
To make sure your build works, run the tests:
$ make test
...
1.574000e+00 -4.315000e+01l -43 -4.315000e+01 4.315000e+01 Hello, World 7F fabs(-2.24) = 2.34 Destructor!
NestedVM doesn't build the g++ compiler by default - it's something that needs to be done manually. Fortunately, it's not difficult to do:
$ make cxxtest
...
java -cp build tests.CXXTest
Test's constructor
Name: 0x50b40 Name: PKc Is pointer: 1
Name: 0x50b3c Name: i Is pointer: 0
Hello, World from Test
Now throwing an exception
sayhi threw: const char *:Hello, Exception Handling!
Test's destructor
Finally, with all tools built, we need to set up our environment:
$ make env.sh
$ source env.sh
We're now ready to cross-compile Open Babel.
Cross-Compiling Open Babel
For this tutorial, we'll use the Open Babel 2.1.1 source distribution. Unpack the tarball and change into the directory.
Next, we'll need to set up our cross-compiler environment. Fortunately, NestedVM has made this easy. If you check your environment variables, you'll find that CXX and CC have both been set. All that remains is to notify the configure script that we'll be cross-compiling:
$ ./configure --host=mips-unknown-elf
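Before running configure, it may be worth confirming that the two variables really are set in the current shell. The following is a minimal sketch; check_cross_env is a hypothetical helper, and it only reports whatever values it finds rather than assuming what env.sh puts there.

```shell
#!/bin/sh
# check_cross_env: print CC and CXX, and fail if either is unset or empty.
check_cross_env() {
    missing=0
    for var in CC CXX; do
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            echo "$var=$value"
        else
            echo "$var is not set; did you source env.sh?" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_cross_env || echo "cross-compiler environment incomplete" >&2
```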
Then we build the MIPS binaries:
$ make
Peeking into the tools directory, we can see all of the Open Babel command line tools have been built, including babel.
Unless you're running a MIPS machine, though, this binary won't be executable.
So far, it looks like everything worked. Although it didn't work the first time I tried it, the NestedVM team were most helpful.
Building the Java Class File
We're now ready for the final stage in the process, converting the MIPS binary to a Java class file. Again, NestedVM makes this simple:
$ cd tools
$ java org.ibex.nestedvm.Compiler -outfile Babel.class Babel babel
Exception in thread "main" java.lang.IllegalStateException: unresolved phantom target
at org.ibex.classgen.MethodGen.resolveTarget(MethodGen.java:555)
at org.ibex.classgen.MethodGen._generateCode(MethodGen.java:664)
at org.ibex.classgen.MethodGen.generateCode(MethodGen.java:618)
at org.ibex.classgen.MethodGen.dump(MethodGen.java:888)
at org.ibex.classgen.ClassFile._dump(ClassFile.java:193)
at org.ibex.classgen.ClassFile.dump(ClassFile.java:160)
at org.ibex.nestedvm.ClassFileCompiler.__go(ClassFileCompiler.java:380)
at org.ibex.nestedvm.ClassFileCompiler._go(ClassFileCompiler.java:72)
at org.ibex.nestedvm.Compiler.go(Compiler.java:259)
Unfortunately, NestedVM has blown up with an exception. Although our target class file, Babel.class, is now in our working directory, it is not complete and won't run.
What Went Wrong?
After I brought this problem to the NestedVM mailing list, it appears that this is a NestedVM bug.
However, the way babel works is to load its various language modules dynamically. It may be possible to fix the problem by producing a version of babel containing all of its modules in a single binary.
Although there is a major issue to be resolved, this tutorial illustrates the full process of compiling C++ code to Java bytecode using NestedVM.
JInChI: Run InChI Anywhere Java Runs
Regardless of your views on Java the Programming Language, Java the Platform has a lot going for it. The ability to run the same executable on any system with a Java Virtual Machine (JVM), without recompilation, is a significant advantage in today's heterogeneous computing environment. Combine that with Java the Platform's battle-tested security model, stability and performance, and you have some compelling reasons to actually prefer that code execute on a JVM rather than bare metal.
Cheminformatics has many useful libraries, legacy and otherwise, that don't yet run on a JVM. Many of these can trace their roots back to the 1960s and 1970s and FORTRAN; others were written in C or C++ more recently. What they all have in common is that they're compiled to native binaries rather than Java bytecode.
Wouldn't it be great if this software could be easily compiled to Java bytecode instead?
A recent Depth-First article described how the InChI toolkit, an open source C library distributed by IUPAC, was successfully compiled to a Java classfile with the remarkable NestedVM library. This article describes the creation and use of a new platform-independent jarfile that runs the InChI program.
The procedure was not difficult. The two files previously released (JInChI.class and nestedvm.jar) were combined into a single executable jarfile with a Manifest pointing to the JInChI classfile as the Main class.
The full cInChI jarfile can be downloaded here.
The jinchi.jar file can be tested from the command line:
$ java -jar jinchi.jar
InChI ver 1, Software version 1.01 release 07/21/2006.
Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]
Options:
SNon Exclude stereo (Default: Include Absolute stereo)
SRel Relative stereo
SRac Racemic stereo
[truncated]
If we wanted to process a molfile representing toluene, we'd use something like the following:
$ java -jar jinchi.jar test/toluene.mol
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.log'
Opened input file 'test/toluene.mol'
Opened output file 'test/toluene.mol.txt'
Opened problem file 'test/toluene.mol.prb'
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
This command would produce the following output file, just like the cInChI program:
$ cat test/toluene.mol.txt
* Input_File: "test/toluene.mol"
Structure: 1
InChI=1/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3
AuxInfo=1/0/N:1,2,3,4,5,6,7/E:(3,4)(5,6)/rA:7nCCCCCCC/rB:;d2;s2;s3;d4;s1d5s6;/rC:3.6373,2.8,0;0,.7,0;0,2.1,0;1.2124,0,0;1.2124,2.8,0;2.4249,.7,0;2.4249,2.1,0;
We can also convert InChIs into molfiles (command line options work the same as in cInChI):
$ java -jar jinchi.jar test/toluene.mol.txt -OutputSDF
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.txt.log'
Opened input file 'test/toluene.mol.txt'
Opened output file 'test/toluene.mol.txt.txt'
Opened problem file 'test/toluene.mol.txt.prb'
Options: Output SDfile only
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
In this case the output is:
$ cat test/toluene.mol.txt.txt
Structure #1
InChI v1 SDfile Output
7 7 0 0 0 0 0 0 0 0 1 V2000
3.6373 2.8000 0.0000 C 0 0 0 0 0 0 0 0 0
0.0000 0.7000 0.0000 C 0 0 0 0 0 0 0 0 0
0.0000 2.1000 0.0000 C 0 0 0 0 0 0 0 0 0
1.2124 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0
1.2124 2.8000 0.0000 C 0 0 0 0 0 0 0 0 0
2.4249 0.7000 0.0000 C 0 0 0 0 0 0 0 0 0
2.4249 2.1000 0.0000 C 0 0 0 0 0 0 0 0 0
1 7 1 0 0 0 0
2 3 2 0 0 0 0
2 4 1 0 0 0 0
3 5 1 0 0 0 0
4 6 2 0 0 0 0
5 7 2 0 0 0 0
6 7 1 0 0 0 0
M END
$$$$
Similar tests worked on both Linux and Windows using the same jarfile.
There are still some issues to be addressed with this approach. For example, various reports indicate that NestedVM code runs about four to ten times slower than native execution. Benchmarking may be useful at this point.
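A rough first benchmark needs nothing more than a wall-clock timer around each command. The sketch below is one way to do it, under stated assumptions: elapsed_seconds is a hypothetical helper, it relies on date +%s (widely available but not guaranteed by POSIX), and it resolves only whole seconds, so it is useful only for inputs much larger than a single molfile.

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]: run CMD, discard its output, and print the
# wall-clock time it took in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Example (only runs when the jarfile and test file are present):
if [ -f jinchi.jar ] && [ -f test/toluene.mol ]; then
    echo "JInChI: $(elapsed_seconds java -jar jinchi.jar test/toluene.mol) s"
fi
```

Timing the native cInChI-1 binary on the same input with the same helper would give the ratio to compare against the reported figures.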
Another issue is how to go about making a Java InChI library with NestedVM. If you decompile the jinchi.jar file, you'll find that the JInChI.class file is a large and complex beast in which almost all methods are named as hex numbers. It may be possible to create a library by renaming certain methods and breaking the code into smaller classfiles, but the NestedVM documentation seems sparse on this subject.
Despite these difficulties, this article demonstrates the power of NestedVM and describes the first (and currently only) example of a 100% Java InChI implementation.
Image Credit: smithco

