Taking a SWIG of InChI
The IUPAC InChI developer toolkit is written in C. It is currently the only Open Source software capable of generating InChI identifiers. Software that needs to write InChIs must use the C toolkit in one form or another. This poses a problem for the large amount of chemical informatics software being written in other languages. In this article, I'll explain how the Open Source tool SWIG can solve this problem in a semi-automated way. The same concepts can, in principle, be used to link any library written in C/C++ with another language.
Prerequisites
This tutorial uses Ruby as the language that InChI will be linked with. You'll therefore need both Ruby and the Ruby development libraries installed. You'll also need SWIG and possibly the SWIG development libraries.
Use the Source, Luke
After downloading and unpacking InChI-1-API v1.0.1, collect all header (*.h) and source (*.c) files into a directory called inchi. These files can be found in the following two directories:
- InChI-1-API/cInChI/common
- InChI-1-API/cInChI/main
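If you'd rather not copy the files by hand, a few lines of Ruby can do it. This is a minimal sketch of my own, not part of the original procedure. It assumes you run it from the directory that contains the unpacked InChI-1-API tree, and collect_sources is a hypothetical helper name.

```ruby
# collect_sources.rb: sketch of the "collect all *.h and *.c files" step.
# Assumption: run from the directory containing the unpacked InChI-1-API tree.
require 'fileutils'

# The two directories listed above.
SOURCE_DIRS = ['InChI-1-API/cInChI/common', 'InChI-1-API/cInChI/main'].freeze

# Copies every header and C source file from +source_dirs+ into +dest+
# (creating it if needed) and returns the sorted names of the copied files.
def collect_sources(source_dirs = SOURCE_DIRS, dest = 'inchi')
  FileUtils.mkdir_p(dest)
  files = source_dirs.flat_map { |dir| Dir.glob(File.join(dir, '*.{h,c}')) }
  files.each { |file| FileUtils.cp(file, dest) }
  files.map { |file| File.basename(file) }.sort
end
```

Calling collect_sources with no arguments should fill inchi with the headers and sources from both directories and return their names.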
Find the Main Method
This tutorial will create an interface to the InChI main() function, which is found on line 149 of the file ichimain.c. For reasons I won't get into here, rename this function run and change the type of its second argument to char **. Also, add a prototype for run directly above line 149:
int run( int argc, char **argv ); // new line added
int run( int argc, char **argv ) // formerly line 149
Create the Interface File
The focal point of SWIG is the interface file. This file specifies the C functions you want to call from Ruby, along with the type conversions SWIG needs in order to call them. Create a file called libinchi.i containing the following:
[Listing: libinchi.i, 35 lines]
The interface file has three main parts. The first part (line 2) names the module. The second part (lines 7-30) makes the necessary Ruby/C datatype conversions, turning the Ruby Array passed to run into the argc/argv pair the C function expects. The last part (line 35) tells SWIG which InChI functions we want to access from Ruby.
Take a SWIG
At this point, SWIG has everything it needs to autogenerate our glue code. This can be done by:
$ swig -ruby libinchi.i
This command should have created a new source file, libinchi_wrap.c, that contains all of the C glue code for our library. We'll have a look at the most important part of this file shortly.
Create a Makefile
We'll need a makefile with which to compile our library. Fortunately, Ruby makes this very easy. Create a file called extconf.rb containing the following Ruby code:
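require 'mkmf'
# Write a Makefile that compiles the C sources in this directory into the libinchi extension
create_makefile('libinchi')
Then run it to generate the Makefile:
$ ruby extconf.rb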
Build the Library
Our library can now be built with:
$ make
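Whenever libinchi.i changes, all three steps (swig, ruby extconf.rb, make) have to be repeated. A short Ruby script can chain them. This is a sketch of my own rather than part of the tutorial. It assumes swig and make are on your PATH and that libinchi.i and extconf.rb are in the current directory. BUILD_STEPS and build are hypothetical names.

```ruby
# build.rb: hypothetical driver for the three build steps above.
# Assumes swig and make are on the PATH and the inputs are in the current directory.
require 'rbconfig'

BUILD_STEPS = [
  ['swig', '-ruby', 'libinchi.i'], # generate libinchi_wrap.c
  [RbConfig.ruby, 'extconf.rb'],   # generate the Makefile with the running Ruby
  ['make']                         # compile the extension
].freeze

# Runs each step in order and aborts at the first failure.
# With dry_run: true nothing is executed; the command lines are only returned.
def build(dry_run: false)
  BUILD_STEPS.map do |cmd|
    unless dry_run || system(*cmd)
      abort("build step failed: #{cmd.join(' ')}")
    end
    cmd.join(' ')
  end
end
```

Calling build runs the steps and stops at the first one that fails. build(dry_run: true) just returns the command lines, which is a quick way to check what would be executed.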
Use InChI from Ruby
We are now done with the basics. You can verify that the process worked through Interactive Ruby (irb):
$ irb
irb(main):001:0> require 'libinchi'
=> true
The return value of true shows that Ruby loaded and recognized the binary library we just built (libinchi.so). We are now able to use this library as if it were written in Ruby.
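In a script rather than irb, a missing or broken build surfaces as a LoadError. Because LoadError is not a StandardError, a bare rescue won't catch it, so it has to be named explicitly. The helper below (library_available? is a hypothetical name) turns the check into a true/false answer:

```ruby
# Hypothetical helper: reports whether a library can be loaded without
# letting the LoadError stop the calling program.
def library_available?(name)
  require name
  true
rescue LoadError # not a StandardError, so it must be rescued by name
  false
end

puts(library_available?('libinchi') ? 'libinchi loaded' : 'libinchi could not be loaded')
```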
Use the Library
To test the library, copy a molfile called test.mol into your inchi directory. Now run this code:
require 'libinchi'
Libinchi.run(['', 'test.mol'])
You should see a lot of output from the InChI library. If you look at the contents of the inchi directory, you'll find a new file, test.mol.txt, containing the InChI identifier of the molecule in your molfile. The run also creates a log file (test.mol.log) and a problem file (test.mol.prb).
You may be wondering why the first element in the Array passed to Libinchi.run is empty. By convention, the first element of a C program's argv array holds the name of the program itself. The InChI main function (now run) takes this into account and reads its input from the elements that follow, so the Array simply leaves its first element blank.
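If you call the library often, the empty first element and the trip through test.mol.txt can be hidden behind a small helper. The module below is a hypothetical sketch: InChIHelper and its methods are my names, not part of SWIG or InChI. It assumes, as described above, that the result is written to the molfile name plus .txt, and that each identifier sits on its own line beginning with InChI=.

```ruby
# inchi_helper.rb: hypothetical convenience layer over Libinchi.run.
# Assumptions: results go to "<molfile>.txt", and each identifier is on
# its own line beginning with "InChI=".
module InChIHelper
  # Builds the C-style argument list; element 0 stands in for the
  # program name, which InChI skips.
  def self.argv_for(molfile, *options)
    ['', molfile, *options].map(&:to_s)
  end

  # Returns every line of +text+ that holds an InChI identifier.
  def self.extract_inchis(text)
    text.each_line.map(&:strip).select { |line| line.start_with?('InChI=') }
  end

  # Runs InChI on +molfile+ and returns the identifiers it wrote.
  # +runner+ defaults to the SWIG-generated Libinchi.run; it is a
  # parameter only so the helper can be exercised without the extension.
  def self.inchis_for(molfile, *options, runner: ->(argv) { Libinchi.run(argv) })
    runner.call(argv_for(molfile, *options))
    extract_inchis(File.read("#{molfile}.txt"))
  end
end
```

With the extension built, require 'libinchi' followed by InChIHelper.inchis_for('test.mol') should return the identifiers as an Array of Strings.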
Customize the Library
Have a look at the libinchi_wrap.c file that SWIG created. At the bottom of this file should be a function called Init_libinchi:
SWIGEXPORT(void) Init_libinchi(void) {
    int i;
    SWIG_InitRuntime();
    mLibinchi = rb_define_module("Libinchi");
    for (i = 0; swig_types_initial[i]; i++) {
        swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
        SWIG_define_class(swig_types[i]);
    }
    rb_define_module_function(mLibinchi, "run", _wrap_run, -1);
}
This is the function Ruby calls when it loads the extension, and it is where C functions are mapped onto Ruby modules, classes, and methods. In this case, the C run function is mapped to the run method of a module called Libinchi.
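Ruby's own module_function gives the same shape in pure Ruby, which may make the C calls easier to read. The module below is for illustration only (LibinchiLookalike is a made-up name): rb_define_module corresponds to the module definition, and rb_define_module_function to a module function. The final argument of -1 tells Ruby that the method takes a variable number of arguments, which is why Libinchi.method(:run).arity reports -1 for the real extension.

```ruby
# Illustration only: a pure-Ruby module shaped like the one Init_libinchi
# builds. `module` plays the role of rb_define_module, and module_function
# the role of rb_define_module_function.
module LibinchiLookalike
  module_function

  # Stand-in for _wrap_run. A splat parameter gives arity -1, matching the
  # -1 passed to rb_define_module_function. The real wrapper converts the
  # Ruby Array into argc/argv and calls the C run() function; this one just
  # returns what it received.
  def run(*args)
    args
  end
end
```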
Let's say that you'd prefer a module name of InChI with a method called write_inchi. The following changes to Init_libinchi will accomplish this:
SWIGEXPORT(void) Init_libinchi(void) {
    int i;
    SWIG_InitRuntime();
    mLibinchi = rb_define_module("InChI"); /* changed from "Libinchi" */
    for (i = 0; swig_types_initial[i]; i++) {
        swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
        SWIG_define_class(swig_types[i]);
    }
    rb_define_module_function(mLibinchi, "write_inchi", _wrap_run, -1); /* changed from "run" */
}
Run make again. Now the following can be used to write the InChI information for test.mol:
require 'libinchi'
InChI.write_inchi(['', 'test.mol'])
Summing Up
SWIG simplifies the job of connecting high-level languages like Ruby to C/C++ libraries. Although not illustrated in the simple example above, SWIG offers several advanced tools for creating rich library interfaces. Given the large amount of chemical informatics software written in C/C++, and the increasing interest by developers in scripting languages such as Ruby, the SWIG approach is likely to be broadly useful in several areas of chemical informatics integration.
The C InChI toolkit appears in a few other Open Source projects including Open Babel, the Chemistry Development Kit via the JNI InChI Wrapper, and Rino. To my knowledge, none use SWIG. This will soon change as the approach described here becomes incorporated into Rino.
On a more general note, the availability of the InChI source code under an Open Source license is essential to developing and distributing the kind of integration library discussed here. We can only hope that others working in chemical informatics see the wisdom in a system that creates healthy software ecosystems wherever it takes hold.
From SMILES to InChI: Rino, CDK, and Ruby Java Bridge
Integrating Ruby and Java is fast and easy with Ruby Java Bridge (RJB), which was discussed previously. In this article, I'll show how RJB can be used to solve a practical chemical informatics problem: converting SMILES strings into InChI identifiers.
Prerequisites
This tutorial is aimed at Linux users. The same steps should also work on Windows and Mac OS X, although I haven't tested them there. You'll need to install a few software packages if you haven't done so already: Ruby, RubyGems, RJB, CDK, and Rino. After installing RubyGems, RJB and Rino can both be installed from the command line (as root):
# gem install rjb
# gem install rino
Next, create a working directory, smi2inchi, and move a copy of the full CDK jar file, cdk-20060714.jar, into it. That's it for libraries, so let's move on to the translator itself.
The Translator
The Translator class consists of a small piece of Ruby code gluing CDK's SmilesParser and MDLWriter to the Ruby InChI library Rino. Rino is a thin Ruby wrapper around the IUPAC InChI library, which is in turn written in C. Save the following code as smi2inchi.rb in your working directory:
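# RJB reads CLASSPATH when it starts the JVM, so set it before the first import
ENV['CLASSPATH'] = './cdk-20060714.jar'
require 'rubygems'
require 'rjb'   # Ruby Java Bridge
require 'rino'  # Ruby wrapper around the C InChI library
# Import the Java classes the translator needs
StringWriter = Rjb::import 'java.io.StringWriter'
SmilesParser = Rjb::import 'org.openscience.cdk.smiles.SmilesParser'
MDLWriter = Rjb::import 'org.openscience.cdk.io.MDLWriter'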
# Converts a SMILES string into an InChI identifier using
# the CDK Library (Java) and the Rino Library (Ruby/C).
class Translator
  def initialize
    @smiles_parser = SmilesParser.new
    @mdl_writer = MDLWriter.new
    @mol2inchi = Rino::MolfileReader.new
  end
  # Returns an InChI identifier from the specified SMILES string.
  # Uses the CDK classes SmilesParser and MDLWriter to generate
  # a molfile from a SMILES string. Then this molfile is
  # parsed by Rino::MolfileReader.
  def translate(smiles)
    mol = @smiles_parser.parseSmiles(smiles)
    sw = StringWriter.new
    @mdl_writer.setWriter(sw)
    @mdl_writer.write(mol)
    @mol2inchi.read(sw.toString)
  end
end
To try it out, save the following script as test.rb in the same directory:
require_relative 'smi2inchi'
translator = Translator.new
inchi = translator.translate 'c1ccccc1'
p inchi # => "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
Then run it:
$ ruby test.rb
Alternatively, the same code can be entered interactively in Interactive Ruby (irb):
$ irb
irb(main):001:0>
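Translating one SMILES string at a time is fine for testing. For a whole file of structures, a small driver can reuse a single Translator. The sketch below is hypothetical (translate_smiles_file is my name). It assumes a simple SMILES file layout, one SMILES per line optionally followed by whitespace and a name, and it works with any object that responds to translate, such as the Translator above.

```ruby
# batch_smi2inchi.rb: hypothetical batch driver for a Translator-like object.
# Assumed input: one SMILES per line, optionally followed by whitespace and
# a name; blank lines are skipped. Returns [name, inchi] pairs, using the
# SMILES itself as the name when none is given.
def translate_smiles_file(io, translator)
  io.each_line.each_with_object([]) do |line, results|
    smiles, name = line.strip.split(/\s+/, 2)
    next if smiles.nil? # blank line
    results << [name || smiles, translator.translate(smiles)]
  end
end
```

For example, passing an open file and Translator.new should return one [name, InChI] pair per structure. Reusing one Translator means the CDK parser and writer are created only once.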
With just a few lines of Ruby, we've solved a real problem. This example integrates software from three different programming languages: Ruby, C, and Java. Given the variety of chemical informatics software written in these languages, Ruby Java Bridge offers numerous integration possibilities.
Scripting Java Libraries with Ruby Java Bridge
Although JRuby solves many Java/Ruby integration issues, in some cases it's not the right solution. One situation is when you want your Ruby code to use extensions written in C. The JRuby documentation makes very clear that this will never be supported. Another situation is if your code needs full access to Ruby on Rails, or if your hosting service makes it difficult to configure JRuby on Rails. In these cases, JRuby's currently limited Rails support makes it a suboptimal choice.
Ruby Java Bridge (RJB) is designed to solve these problems by letting Ruby developers manipulate Java libraries from Ruby. This gives you the ability to access C Ruby extensions and Java libraries in the same Ruby program. It also makes Rails integration a snap. Articles to follow will explore these two points. For now, let's see how to get RJB working.
Installing Ruby Java Bridge is very simple. With root access:
gem install rjb
This installs the Ruby Java Bridge gem. That's all there is to it.
Instantiating and using Java classes consists of the familiar process of first importing the class followed by creating a new instance:
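require 'rubygems'
require 'rjb'
# Import the Java class into Ruby
string_class = Rjb::import 'java.lang.String'
# Call the String(String) constructor, selected by its type signature
hello_string = string_class.new_with_sig('Ljava.lang.String;', 'hello')
p hello_string.toString # -> "hello"
Because an argument is passed to the constructor, and java.lang.String has several single-argument constructors, a special form, new_with_sig, is used. Its first argument is a type signature identifying the constructor to call. The "L" at the start of the signature 'Ljava.lang.String;' indicates that the argument "hello" is a non-primitive datatype (i.e. a class or interface), and the trailing semicolon ends the class name.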
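The signature string follows the JNI type-descriptor convention: one letter per primitive type (I for int, Z for boolean, and so on), "[" for arrays, and L...; for classes and interfaces. A hypothetical helper can build these strings from ordinary Java type names. It keeps the dotted class names used in the RJB example above; the JNI specification itself writes them with slashes (Ljava/lang/String;).

```ruby
# Hypothetical helper for building new_with_sig signature strings.
# Primitive types map to single-letter JNI descriptors.
JNI_PRIMITIVES = {
  'boolean' => 'Z', 'byte' => 'B', 'char' => 'C', 'short' => 'S',
  'int' => 'I', 'long' => 'J', 'float' => 'F', 'double' => 'D'
}.freeze

# Descriptor for one Java type name, e.g. 'int', 'java.lang.String', 'byte[]'.
def jni_descriptor(type)
  if type.end_with?('[]')
    '[' + jni_descriptor(type[0...-2]) # array: '[' + element descriptor
  elsif JNI_PRIMITIVES.key?(type)
    JNI_PRIMITIVES[type]
  else
    "L#{type};" # class or interface, dotted form as in the RJB example
  end
end

# Concatenated descriptors for a constructor's parameter list.
def jni_signature(*types)
  types.map { |type| jni_descriptor(type) }.join
end
```

With it, the constructor call above could be written as string_class.new_with_sig(jni_signature('java.lang.String'), 'hello').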
In the situations described above, Ruby Java Bridge offers important advantages over JRuby. Subsequent articles will explore how these advantages can be used to quickly develop applications integrating chemical informatics libraries written in multiple languages.
Scripting CDK with JRuby
A previous article discussed the use of the Java chemical informatics library Octet from JRuby. This article will discuss the use of another Java chemical informatics library, CDK, from JRuby. A small Ruby class will be developed that generates a molfile with completely assigned 2-D coordinates from a SMILES string.

