JInChI: Run InChI Anywhere Java Runs
Regardless of your views on Java the Programming Language, Java the Platform has a lot going for it. The ability to run the same executable on any system with a Java Virtual Machine (JVM), without recompilation, is a significant advantage in today's heterogeneous computing environment. Combine that with Java the Platform's battle-tested security model, stability and performance, and you have some compelling reasons to actually prefer that code execute on a JVM rather than bare metal.
Cheminformatics has many useful libraries, legacy and otherwise, that don't yet run on a JVM. Many of these can trace their roots back to the 1960s and 1970s and FORTRAN; others were written in C or C++ more recently. What they all have in common is that they're compiled to native binaries rather than Java bytecode.
Wouldn't it be great if this software could be easily compiled to Java bytecode instead?
A recent Depth-First article described how the InChI toolkit, an open source C library distributed by IUPAC, was successfully compiled to a Java classfile with the remarkable NestedVM library. This article describes the creation and use of a new platform-independent jarfile that runs the InChI program.
The procedure was not difficult. The two files previously released (JInChI.class and nestedvm.jar) were combined into a single executable jarfile with a Manifest pointing to the JInChI classfile as the Main class.
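For readers who would rather script this step, here is a minimal sketch of one way to assemble such a jarfile with the standard java.util.jar API. It is not necessarily how jinchi.jar was built; the same result can be had with the JDK's jar tool. The class name ExecutableJarBuilder and its methods are hypothetical, and the sketch assumes that JInChI.class lives in the default package and that the runtime jar is unsigned.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper: merges one classfile and one runtime jar into an executable jar.
final class ExecutableJarBuilder {

    /** Builds a manifest whose Main-Class attribute names the entry point. */
    static Manifest manifestFor(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, or the main attributes are not written out.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        return manifest;
    }

    /** Writes the classfile (assumed default package) and all runtime jar entries to output. */
    static void build(Path classFile, Path runtimeJar, String mainClass, Path output)
            throws IOException {
        Set<String> seen = new HashSet<>();
        // The new manifest is written by the JarOutputStream constructor,
        // so any manifest found in the runtime jar must be skipped.
        seen.add("META-INF/MANIFEST.MF");
        try (JarOutputStream out =
                 new JarOutputStream(Files.newOutputStream(output), manifestFor(mainClass))) {
            String name = classFile.getFileName().toString();
            out.putNextEntry(new JarEntry(name));
            Files.copy(classFile, out);
            out.closeEntry();
            seen.add(name);

            try (JarInputStream in = new JarInputStream(Files.newInputStream(runtimeJar))) {
                JarEntry entry;
                while ((entry = in.getNextJarEntry()) != null) {
                    if (!seen.add(entry.getName())) {
                        continue; // skip duplicate entry names
                    }
                    out.putNextEntry(new JarEntry(entry.getName()));
                    in.transferTo(out); // copies only the current entry's bytes
                    out.closeEntry();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: ExecutableJarBuilder <classfile> <runtime.jar> <main-class> <output.jar>");
            return;
        }
        build(Path.of(args[0]), Path.of(args[1]), args[2], Path.of(args[3]));
    }
}
```

Under those assumptions, passing JInChI.class, nestedvm.jar, JInChI, and jinchi.jar as the four arguments should yield a jarfile that java -jar can launch.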
The full jinchi.jar file can be downloaded here.
The jinchi.jar file can be tested from the command line:
$ java -jar jinchi.jar
InChI ver 1, Software version 1.01 release 07/21/2006.
Usage:
cInChI-1 inputFile [outputFile [logFile [problemFile]]] [-option[ -option...]]
Options:
SNon Exclude stereo (Default: Include Absolute stereo)
SRel Relative stereo
SRac Racemic stereo
[truncated]
If we wanted to process a molfile representing toluene, we'd use something like the following:
$ java -jar jinchi.jar test/toluene.mol
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.log'
Opened input file 'test/toluene.mol'
Opened output file 'test/toluene.mol.txt'
Opened problem file 'test/toluene.mol.prb'
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
This command would produce the following output file, just like the cInChI program:
$ cat test/toluene.mol.txt
* Input_File: "test/toluene.mol"
Structure: 1
InChI=1/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3
AuxInfo=1/0/N:1,2,3,4,5,6,7/E:(3,4)(5,6)/rA:7nCCCCCCC/rB:;d2;s2;s3;d4;s1d5s6;/rC:3.6373,2.8,0;0,.7,0;0,2.1,0;1.2124,0,0;1.2124,2.8,0;2.4249,.7,0;2.4249,2.1,0;
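Because the output is plain text, pulling the identifier out of it from Java takes only a few lines. The following is a rough sketch rather than part of JInChI: the class InChIOutputParser is hypothetical, it assumes that the InChI= and AuxInfo= records each begin their own line in the output file, and it keeps only the first structure it finds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: extracts the first InChI and AuxInfo records from plain-text output.
final class InChIOutputParser {

    /** Returns a map with the keys "InChI" and "AuxInfo" for whichever records are present. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("InChI=")) {
                result.putIfAbsent("InChI", trimmed);   // keep the first structure only
            } else if (trimmed.startsWith("AuxInfo=")) {
                result.putIfAbsent("AuxInfo", trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: InChIOutputParser <output-file>");
            return;
        }
        Map<String, String> records = parse(Files.readAllLines(Path.of(args[0])));
        System.out.println(records.getOrDefault("InChI", "(no InChI record found)"));
    }
}
```

Run against test/toluene.mol.txt, this should print the InChI= line shown above, provided the file layout matches the stated assumption.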
We can also convert InChIs into molfiles (command line options work the same as in cInChI):
$ java -jar jinchi.jar test/toluene.mol.txt -OutputSDF
InChI version 1, Software version 1.01 release 07/21/2006
Opened log file 'test/toluene.mol.txt.log'
Opened input file 'test/toluene.mol.txt'
Opened output file 'test/toluene.mol.txt.txt'
Opened problem file 'test/toluene.mol.txt.prb'
Options: Output SDfile only
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
In this case the output is:
$ cat test/toluene.mol.txt.txt
Structure #1
InChI v1 SDfile Output
7 7 0 0 0 0 0 0 0 0 1 V2000
3.6373 2.8000 0.0000 C 0 0 0 0 0 0 0 0 0
0.0000 0.7000 0.0000 C 0 0 0 0 0 0 0 0 0
0.0000 2.1000 0.0000 C 0 0 0 0 0 0 0 0 0
1.2124 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0
1.2124 2.8000 0.0000 C 0 0 0 0 0 0 0 0 0
2.4249 0.7000 0.0000 C 0 0 0 0 0 0 0 0 0
2.4249 2.1000 0.0000 C 0 0 0 0 0 0 0 0 0
1 7 1 0 0 0 0
2 3 2 0 0 0 0
2 4 1 0 0 0 0
3 5 1 0 0 0 0
4 6 2 0 0 0 0
5 7 2 0 0 0 0
6 7 1 0 0 0 0
M END
$$$$
Similar tests worked on both Linux and Windows using the same jarfile.
There are still some issues to be addressed with this approach. For example, various reports indicate that NestedVM code runs about four to ten times slower than native execution. Benchmarking may be useful at this point.
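As a starting point for such a benchmark, the sketch below times an arbitrary command line over several runs and reports the median wall-clock time. The class CommandTimer is hypothetical and deliberately crude: each sample includes process startup, and for the jarfile that means JVM startup as well, so it measures end-to-end cost rather than NestedVM's interpretation overhead alone.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: measures wall-clock time of an external command.
final class CommandTimer {

    /** Median of the samples; for an even count, the mean of the two middle values. */
    static long medianNanos(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Runs the command the given number of times and returns each elapsed time in nanoseconds. */
    static long[] timeCommand(List<String> command, int runs)
            throws IOException, InterruptedException {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);
            // Discard output so the child never blocks on a full pipe.
            pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
            long start = System.nanoTime();
            Process process = pb.start();
            process.waitFor();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: CommandTimer <command> [args...]");
            return;
        }
        long[] samples = timeCommand(Arrays.asList(args), 5);
        System.out.printf("median: %.1f ms over %d runs%n",
                medianNanos(samples) / 1e6, samples.length);
    }
}
```

Timing java -jar jinchi.jar test/toluene.mol against the native cInChI executable on the same input would give a first, rough comparison. A single small molecule mostly measures startup cost, so a larger multi-structure input file is likely to be more informative.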
Another issue is how to go about making a Java InChI library with NestedVM. If you decompile the jinchi.jar file, you'll find that the JInChI.class file is a large and complex beast in which almost all methods are named as hex numbers. It may be possible to create a library by renaming certain methods and breaking the code into smaller classfiles, but the NestedVM documentation seems sparse on this subject.
Despite these difficulties, this article demonstrates the power of NestedVM and describes the first (and currently only) example of a 100% Java InChI implementation.
Image Credit: smithco
Compiling the InChI Toolkit to Pure Java Bytecode with NestedVM
Some time ago, a Depth-First article discussed some methods for compiling C to Java bytecode. Compared with JNI, this approach is attractive for several reasons, including security, portability, and use within applets. Unfortunately, none of the approaches discussed in the earlier article seemed particularly general.
Many cheminformatics libraries are written in C and C++; being able to reliably and automatically port them to Java could potentially save a great deal of effort.
One of the more important cheminformatics C libraries written in recent years is the InChI toolkit. With no pure Java port of this library, JNI is the only way to use InChI with Java. In some situations, this approach is either overly complicated or simply unacceptable.
All of this leaves us with the question: how can the InChI toolkit be converted into a pure Java library without writing any new code?
A partial answer to this question came from Evan Jones, who suggested I look at NestedVM. From the website:
NestedVM provides binary translation for Java Bytecode. This is done by having GCC compile to a MIPS binary which is then translated to a Java class file. Hence any application written in C, C++, Fortran, or any other language supported by GCC can be run in 100% pure Java with no source changes.
And it worked!
NestedVM was successfully used to compile the InChI-API distribution to a Java class file that executed on nothing more than a standard JVM -- and with no JNI code. The InChI classfile and nestedvm runtime jarfile can be downloaded from SourceForge. Future articles in this series will describe the compilation, installation, and use of NestedVM, as well as the Java class file that it produced.
Image Credit: pmorgan
Building Rubidium: Creating a RubyForge Project Space
Recent articles have discussed Rubidium, the cheminformatics toolkit for Ruby. In this article, the first in a series, I'll go beyond the Ruby code to discuss the technical aspects of taking an Open Source idea from concept to release.
Finding a Home
Before setting up your Open Source project, you'll need to decide on how to host it. Project hosting can be as simple or elaborate as you wish, but the basic services include: a website; a mailing list; a discussion forum; a source code repository (typically CVS or Subversion); a bug tracking system; and a file release system.
The multitude of choices can be broken down into two basic options: host the project yourself or use a free hosting service. Fortunately, Ruby-based projects enjoy two excellent free hosting options: SourceForge and RubyForge. Although SourceForge could certainly be used for a Ruby project, RubyForge is a more popular option. One of the reasons is that any RubyGem your project releases automatically becomes installable through the RubyGems package management system with a simple one-line incantation:
$ sudo gem install <yourprojectname>
Another reason to use RubyForge is discoverability. RubyForge only hosts projects related in some way to Ruby. So, your project will stand out a lot more in its category than with a much larger site like SourceForge.
Given RubyForge's advantages, and my own interest in minimizing the work needed to maintain an Open Source project, Rubidium will be hosted on RubyForge.
Requesting a Project Space
Having decided on RubyForge as Rubidium's host, all that's left is to ask for free services. You'll need to register for a user account if you haven't done so already. Then, simply apply for project space. After about three business days, you should be notified whether your project was accepted.
Several days ago, I completed this process for Rubidium. Its new home on RubyForge will be:
http://rubyforge.org/projects/rbtk
The Rubidium home page can be found at:
There's nothing useful there yet, a situation that will hopefully be fixed in a few weeks.
Next Steps
With powerful free services now available for the Rubidium project, we'll want to start taking advantage of them. The next articles in this series will discuss some ways of doing so.



