Open Babel 2.2.0

Posted by Rich Apodaca Fri, 04 Jul 2008 15:29:00 GMT

Open Babel 2.2.0 has been released. This version introduces a variety of new features and improvements. It also includes the Ruby Open Babel interface that allows scripting through the popular Ruby language; Ruby Open Babel can be installed both quickly and easily. Further details are available from the release notes.

Future articles will highlight some of the new Open Babel features using Ruby.

Run Babel Anywhere Java Runs with JBabel

Posted by Rich Apodaca Mon, 10 Dec 2007 13:50:00 GMT

A recent series of D-F articles has discussed the use of NestedVM to compile cheminformatics programs written in C/C++ to pure Java binaries that can be run on any system with a JVM. In particular, an attempt to compile OpenBabel's babel program to bytecode was only partially successful. With the help of Geoff Hutchison, the problem has now been resolved. This article introduces JBabel, a platform-independent, pure Java implementation of OpenBabel's babel program.

A Little About JBabel

JBabel was compiled from the Open Babel 2.1.1 source release and can be downloaded from SourceForge. The same jarfile was successfully tested on Linux, Windows and Mac OS X. You can verify JBabel works on your platform with the following command:

$ java -jar jbabel-20071209.jar -Hsmi
smi  SMILES format
A linear text format which can describe the connectivity
and chirality of a molecule
Write Options e.g. -xt
  n no molecule name
  t molecule name only
  r radicals lower case eg ethyl is Cc

This version of JBabel was compiled with support for three formats:

  • SMILES (smi). Non-canonical SMILES.

  • MDL (mol). Molfiles and SD Files.

  • Canonical SMILES (can). Canonical SMILES implementation donated by eMolecules.

I'll discuss exactly how support for these formats was added in a subsequent post. More formats will be added in the future. For now, let's just try JBabel out.

Testing JBabel

One way to use JBabel is interactively from the command line - just leave out an input or output file parameter. For example, if you wanted to get the eMolecules canonical SMILES for sertraline, you might do something like this (be sure to use two returns to begin processing):

$ java -jar jbabel-20071209.jar -ismi -ocan
CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl

CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
1 molecule converted
34 audit log messages
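For scripted use, the same conversion might be driven through a pipe rather than typed by hand. The following is only a sketch: it assumes JBabel treats piped standard input the same way it treats keyboard input, and the function names and the JBABEL_JAR variable are hypothetical helpers, not part of JBabel.

```shell
# Hypothetical helper names; only the java command line comes from the article.
JBABEL_JAR=${JBABEL_JAR:-jbabel-20071209.jar}

# Emit a SMILES string followed by a blank line, mimicking the two
# returns needed to begin processing in interactive mode.
smiles_payload() {
    printf '%s\n\n' "$1"
}

# Convert one SMILES string to canonical SMILES.
# Assumption: JBabel accepts piped input like typed input.
to_canonical() {
    smiles_payload "$1" | java -jar "$JBABEL_JAR" -ismi -ocan
}

# Usage (requires Java and the JBabel jar in the current directory):
#   to_canonical 'CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl'
```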

This canonical SMILES can be converted into a molfile with the following:

$ java -jar jbabel-20071209.jar -ismi -omol
CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12


 OpenBabel12090723182D

 22 24  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0

...

To convert using input and output files, we could use a medium-sized dataset such as the PubChem benzodiazepine dataset prepared for Rubidium:

$ java -jar jbabel-20071209.jar -imol pubchem_benzodiazepine_20071110.sdf -ocan pubchem_benzodiazepine_20071110.smi
==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problems reading a MDL file
Cannot read title line

2117 molecules converted
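
The "molecules converted" figure can be compared against the number of records in the input. Each record in an SD file ends with a line of four dollar signs, so counting those lines gives a quick estimate. The function name below is a hypothetical helper, and the sketch assumes a well-formed SD file.

```shell
# Count records in an SD file by counting its "$$$$" delimiter lines.
# count_sd_records is an illustrative name, not an Open Babel tool.
count_sd_records() {
    grep -c '^\$\$\$\$' "$1"
}

# Usage:
#   count_sd_records pubchem_benzodiazepine_20071110.sdf
```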

This test, which parses 2117 records, required four minutes forty-five seconds on my system. For comparison, the natively compiled binary did the same thing in about thirteen seconds. Clearly, the JBabel performance hit is substantial.
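
To put numbers on that performance hit, the arithmetic can be worked through with the timings quoted above. This is a rough sketch using integer shell arithmetic; the variable names are illustrative, and "about thirteen seconds" is treated as exactly 13.

```shell
# Timings reported in the article, in seconds.
jbabel_seconds=$((4 * 60 + 45))   # four minutes forty-five seconds
native_seconds=13                 # "about thirteen seconds"
records=2117

# Integer arithmetic only: scale by 10 to keep one decimal place.
slowdown_tenths=$((jbabel_seconds * 10 / native_seconds))
ms_per_record_jbabel=$((jbabel_seconds * 1000 / records))
ms_per_record_native=$((native_seconds * 1000 / records))

echo "JBabel total: ${jbabel_seconds} s"
echo "Slowdown: about $((slowdown_tenths / 10)).$((slowdown_tenths % 10))x"
echo "Per record: ~${ms_per_record_jbabel} ms (JBabel) vs ~${ms_per_record_native} ms (native)"
```

On these figures JBabel is a little over twenty times slower than the native binary for this dataset.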

Uses

Although it's very unlikely that JBabel will ever be useful in performance-critical situations, its portability makes it attractive for other uses. Examples include:

  • application development in heterogeneous computing environments;

  • use on systems in which native compilation may be difficult, such as those with unusual configurations or operating systems;

  • cases in which native binaries work poorly or not at all, such as in applets and Java applications;

  • situations in which performance is a minor consideration, such as in end-user applications that process only a few molecules at a time, or during application prototyping.

Conclusions

This article has described JBabel, the first portable binary version of OpenBabel's babel molecular file format interconversion program. The next article in this series will describe in detail the steps that were used to compile it.

Compiling Open Babel to Pure Java Bytecode with NestedVM: Building A Runnable Classfile that Almost Works

Posted by Rich Apodaca Mon, 26 Nov 2007 15:10:00 GMT

Previously, I described an unsuccessful first attempt to compile the popular cheminformatics C/C++ library Open Babel to pure Java bytecode using NestedVM. This article follows that topic one step further, and shows how to obtain a runnable Java classfile. Although major functionality is missing, the principle of compiling arbitrary C/C++ code to both Java source code and Java bytecode is illustrated.

Getting Started

This article assumes that you've installed NestedVM and downloaded Open Babel on your system. You'll then need to set up your environment (from the nestedvm installation directory):

$ source env.sh

Run the Configure Script

The configure script we used last time didn't attempt to statically compile the binary utilities in the tools directory. This time, we'll add flags to allow this:

$ ./configure --disable-dynamic-modules --enable-static=yes --enable-shared=no --enable-inchi --host=mips-unknown-elf
$ make

Note: leaving out the static compile directives does not produce a fully-functioning classfile either.

Next, we'll attempt to directly create the babel binary in Java classfile format, as we did last time:

$ cd tools
$ java org.ibex.nestedvm.Compiler -outfile Babel.class Babel babel
Exception in thread "main" java.lang.IllegalStateException: unresolved phantom target
        at org.ibex.classgen.MethodGen.resolveTarget(MethodGen.java:555)
        at org.ibex.classgen.MethodGen._generateCode(MethodGen.java:664)
        at org.ibex.classgen.MethodGen.generateCode(MethodGen.java:618)
        at org.ibex.classgen.MethodGen.dump(MethodGen.java:888)
        at org.ibex.classgen.ClassFile._dump(ClassFile.java:193)
        at org.ibex.classgen.ClassFile.dump(ClassFile.java:160)
        at org.ibex.nestedvm.ClassFileCompiler.__go(ClassFileCompiler.java:380)
        at org.ibex.nestedvm.ClassFileCompiler._go(ClassFileCompiler.java:72)
        at org.ibex.nestedvm.Compiler.go(Compiler.java:259)
        at org.ibex.nestedvm.Compiler.main(Compiler.java:183)

We're getting the same error as before. Although an announcement of a bugfix was posted to the NestedVM list, in my hands the new version of NestedVM caused the same error.

As a workaround, we can compile to Java source code first:

$ java org.ibex.nestedvm.Compiler -outformat java -outfile Babel.java Babel babel

We now have a Java source file encoding the babel program. Does it compile?

$ javac Babel.java
The system is out of resources.
Consult the following stack trace for details.
java.lang.OutOfMemoryError: Java heap space
        at com.sun.tools.javac.util.Position$LineMapImpl.build(Position.java:139)
        at com.sun.tools.javac.util.Position.makeLineMap(Position.java:63)
        at com.sun.tools.javac.parser.Scanner.getLineMap(Scanner.java:1105)
        at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:512)
        at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:550)
        at com.sun.tools.javac.main.JavaCompiler.parseFiles(JavaCompiler.java:801)
        at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:727)
        at com.sun.tools.javac.main.Main.compile(Main.java:353)
        at com.sun.tools.javac.main.Main.compile(Main.java:279)
        at com.sun.tools.javac.main.Main.compile(Main.java:270)
        at com.sun.tools.javac.Main.compile(Main.java:69)
        at com.sun.tools.javac.Main.main(Main.java:54)

Not exactly. But this is a massive source file, so we'll need to increase the Java compiler's memory allowance:

$ javac Babel.java -J-Xms256m -J-Xmx256m
Note: Babel.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
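
If 256 MB turns out not to be enough on another system, the heap could be raised step by step. The following is a sketch only: the function names and the list of heap sizes are hypothetical choices, while the -J prefix is javac's standard way of passing an option to the JVM that runs the compiler.

```shell
# Build javac's -J heap options for a given size in megabytes.
# -J passes the option that follows it to the JVM running javac.
javac_heap_flags() {
    echo "-J-Xms${1}m -J-Xmx${1}m"
}

# Try successively larger heaps until javac succeeds.
# The sizes listed are arbitrary illustrative choices.
compile_with_growing_heap() {
    source_file=$1
    for mb in 256 512 1024; do
        # Word splitting of the flags is intentional here.
        if javac $(javac_heap_flags "$mb") "$source_file"; then
            echo "compiled with ${mb} MB heap"
            return 0
        fi
    done
    return 1
}

# Usage:
#   compile_with_growing_heap Babel.java
```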

This seems to have worked. Can we run the classfile?

$ java Babel -H
Open Babel converts chemical structures from one file format to another

Usage: Babel <input spec> <output spec> [Options]

Each spec can be a file whose extension decides the format.
Optionally the format can be specified by preceding the file by
-i<format-type> e.g. -icml, for input and -o for output

--truncated--

Success! But before we get too excited, let's make sure Open Babel's file formats are recognized by testing for "SMILES":

$ java Babel -Hsmi
Format type: smi was not recognized

As you can see, we have successfully converted the babel program to an executable classfile, but this classfile is missing most of the features of the native binary.
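
The same check can be repeated for several formats at once. The sketch below relies only on the "was not recognized" message shown in the output above; the function names are hypothetical, and the babel command to test is passed in as an argument.

```shell
# Read babel help output on stdin; succeed only if the format was recognized.
# The "was not recognized" text is taken from the output shown in the article.
format_recognized() {
    ! grep -q 'was not recognized'
}

# Report which of the given format codes a babel command recognizes.
# Usage: check_formats "java Babel" smi mol can
check_formats() {
    babel_cmd=$1
    shift
    for fmt in "$@"; do
        # $babel_cmd is left unquoted so "java Babel" splits into two words.
        if $babel_cmd "-H$fmt" 2>&1 | format_recognized; then
            echo "$fmt: recognized"
        else
            echo "$fmt: missing"
        fi
    done
}
```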

This may seem hopeless, but consider that natively compiling Open Babel using the above configure flags also produces a binary that doesn't know about SMILES or any other format.

So, it's very likely that if we can produce a native, statically compiled, self-contained babel executable, then we will have solved the problem of running Open Babel entirely on a JVM.

This doesn't seem like a difficult problem, but apparently it is.

Compiling Open Babel to Pure Java Bytecode with NestedVM: An Unsuccessful First Attempt

Posted by Rich Apodaca Mon, 19 Nov 2007 15:42:00 GMT

Wouldn't it be great to be able to compile code written in languages like FORTRAN, C, and C++ to Java bytecode? NestedVM - almost magically - can do just that. This article documents a failed first attempt to compile the popular cheminformatics toolkit Open Babel, which is written in C and C++, to pure Java bytecode with NestedVM.

A previous article described the successful compilation of the InChI toolkit, a C library, to a platform-independent executable jarfile.

The Problem

Open Babel is one of cheminformatics' most widely used open source packages. It interconverts dozens of molecular languages, performs a host of cheminformatics analyses, and serves as a platform for many programs and Web services.

As useful as Open Babel is, it doesn't run directly on a Java Virtual Machine (JVM). Although an Open Babel JNI interface does exist, using it introduces a platform dependency, which in many cases is not acceptable. JNI is a great solution in some cases, but when maintaining a single version of a program is important, or when applets need to be used, or when code needs to work with unusual system configurations, it's a poor choice.

Our goal is to compile Open Babel's "babel" command-line utility into pure Java bytecode that can be run on any recent JVM without using JNI.

Overview of NestedVM

In a nutshell, NestedVM converts MIPS binaries to Java class files. In theory, this allows software written in any language that can be compiled to a MIPS binary to be run on a JVM.

To do this, NestedVM distributes two categories of tools: (1) a complete MIPS cross-compiler toolchain; and (2) a MIPS binary to Java bytecode compiler and accessories.

Building NestedVM

The preferred method to install NestedVM is to compile it from source found in the project repository. There are a number of prerequisites your system must meet in order to be able to do so. For now, this article assumes your system has all of them. Some of the following steps can be found in these instructions as well.

To obtain the source code from the NestedVM darcs repository:

$ darcs get --repo-name=nestedvm http://nestedvm.ibex.org

Then change into the nestedvm directory and build the main code:

$ cd nestedvm
$ make

On my machine, this step takes 10-15 minutes.

To make sure your build works, run the tests:

$ make test
...
1.574000e+00
-4.315000e+01l
-43
-4.315000e+01
4.315000e+01
Hello, World
7F
fabs(-2.24) = 2.34
Destructor!

NestedVM doesn't build the g++ compiler by default - it's something that needs to be done manually. Fortunately, it's not difficult to do:

$ make cxxtest
...
java -cp build tests.CXXTest
Test's constructor
Name: 0x50b40
Name: PKc
Is pointer: 1
Name: 0x50b3c
Name: i
Is pointer: 0
Hello, World from Test
Now throwing an exception
sayhi threw: const char *:Hello, Exception Handling!
Test's destructor

Finally, with all tools built, we need to set up our environment:

$ make env.sh
$ source env.sh

We're now ready to cross-compile Open Babel.

Cross-Compiling Open Babel

For this tutorial, we'll use the Open Babel 2.1.1 source distribution. Unpack the tarball and change into the directory.

Next, we'll need to set up our cross-compiler environment. Fortunately, NestedVM has made this easy. If you check your environment variables, you'll find that CXX and CC have both been set. All that remains is to notify the configure script that we'll be cross-compiling:

$ ./configure --host=mips-unknown-elf
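
Before running configure, it may be worth confirming that the compiler variables really are present in the environment. This is a minimal sketch; require_env is a hypothetical helper, and it only checks that each variable is exported and non-empty, not that it points at a working MIPS compiler.

```shell
# Fail early if any named environment variable is unset, unexported or empty.
# require_env is an illustrative name, not part of NestedVM.
require_env() {
    for name in "$@"; do
        if ! env | grep -q "^${name}=."; then
            echo "missing: $name" >&2
            return 1
        fi
    done
}

# Usage, from the Open Babel source directory after sourcing env.sh:
#   require_env CC CXX && ./configure --host=mips-unknown-elf
```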

Then we build the MIPS binaries:

$ make

Peeking into the tools directory, we can see all of the Open Babel command line tools have been built, including babel.

Unless you're running a MIPS machine, though, this binary won't be executable.
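
One way to confirm that the result really is a MIPS binary is to read the machine field of its ELF header, where the value 8 denotes MIPS. The sketch below assumes the cross-compiler emits an ordinary ELF file; elf_machine is a hypothetical helper built on the standard od utility.

```shell
# Print the ELF e_machine value of a file (8 means MIPS).
# Returns non-zero if the file is too short or is not ELF.
elf_machine() {
    elf_file=$1
    # Read the first 20 bytes as unsigned decimals into $1..$20.
    set -- $(od -An -tu1 -N20 "$elf_file")
    [ $# -ge 20 ] || return 1
    # Bytes 0-3 hold the magic number: 0x7f 'E' 'L' 'F'.
    [ "$1 $2 $3 $4" = "127 69 76 70" ] || return 1
    # Byte 5 gives the byte order: 2 is big-endian, 1 is little-endian.
    # Bytes 18-19 hold e_machine in that byte order.
    if [ "$6" = "2" ]; then
        echo $(( ${19} * 256 + ${20} ))
    else
        echo $(( ${20} * 256 + ${19} ))
    fi
}

# Usage:
#   elf_machine tools/babel
```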

So far, it looks like everything worked. Although it didn't work the first time I tried it, the NestedVM team were most helpful.

Building the Java Class File

We're now ready for the final stage in the process, converting the MIPS binary to a Java class file. Again, NestedVM makes this simple:

$ cd tools
$ java org.ibex.nestedvm.Compiler -outfile Babel.class Babel babel
Exception in thread "main" java.lang.IllegalStateException: unresolved phantom target
        at org.ibex.classgen.MethodGen.resolveTarget(MethodGen.java:555)
        at org.ibex.classgen.MethodGen._generateCode(MethodGen.java:664)
        at org.ibex.classgen.MethodGen.generateCode(MethodGen.java:618)
        at org.ibex.classgen.MethodGen.dump(MethodGen.java:888)
        at org.ibex.classgen.ClassFile._dump(ClassFile.java:193)
        at org.ibex.classgen.ClassFile.dump(ClassFile.java:160)
        at org.ibex.nestedvm.ClassFileCompiler.__go(ClassFileCompiler.java:380)
        at org.ibex.nestedvm.ClassFileCompiler._go(ClassFileCompiler.java:72)
        at org.ibex.nestedvm.Compiler.go(Compiler.java:259)

Unfortunately, NestedVM has blown up with an exception. Although our target class file, Babel.class, is now in our working directory, it is not complete and won't run.

What Went Wrong?

After I brought this problem to the NestedVM mailing list, it appears that this is a NestedVM bug.

However, the way babel works is to load its various language modules dynamically. It may be possible to fix the problem by producing a version of babel containing all of its modules in a single binary.

Although there is a major issue to be resolved, this tutorial illustrates the full process of compiling C++ code to Java bytecode using NestedVM.

Roll Your Own Chemical Database With Free Components

Posted by Rich Apodaca Fri, 13 Apr 2007 14:27:00 GMT

Are you thinking of building a free chemical database but would rather not rent and maintain a bunch of proprietary software components? Norbert Haider has thought a lot about this problem and offers some helpful resources to get you started.

Haider's system can be deployed on commodity hardware running open source operating systems. In other words, the cost of setting up a system like the one he describes is practically zero.

Creating and open sourcing your own custom components is one way to go. Building on top of existing open source tools like CDK, Open Babel, Octet and JOELib is another.

Haider's work raises an interesting question. Has anyone assembled a complete, ready-to-install, general-purpose chemical database package built from open source components? If for no other reason, such an exercise would give an excellent idea of what the dogfood tastes like.
