Taking a SWIG of InChI
The IUPAC InChI developer toolkit is written in C. It is currently the only Open Source software capable of generating InChI identifiers. Software that needs to write InChIs must use the C toolkit in one form or another. This poses a problem for the large amount of chemical informatics software being written in other languages. In this article, I'll explain how the Open Source tool SWIG can solve this problem in a semi-automated way. The same concepts can, in principle, be used to link any library written in C/C++ with another language.
Prerequisites
This tutorial uses Ruby as the language that InChI will be linked with, so you'll need both Ruby and the Ruby development headers installed. You'll also need SWIG (and possibly your system's SWIG development package), along with a C compiler and make.
Use the Source, Luke
After downloading and unpacking InChI-1-API v1.0.1, collect all header (.h) and source (.c) files into a directory called inchi. These files can be found in the following two directories:
InChI-1-API/cInChI/common
InChI-1-API/cInChI/main
Find the Main Method
This tutorial will create an interface to the InChI main() function, which is found on line 149 of the file ichimain.c. For reasons I won't get into here, rename this function run and change the type of its second argument to char **. Also, add a prototype for run directly above line 149:
int run( int argc, char **argv ); // new line added
int run( int argc, char **argv ) // formerly line 149
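Assuming the original declaration spells the second parameter as char *argv[], changing it to char ** does not alter the function's type as far as the C compiler is concerned: an array declarator in a parameter list is adjusted to a pointer. The sketch below, whose function names are hypothetical, writes the same function with each spelling and stores both in one function-pointer type without a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: counts entries up to argc or the NULL terminator,
 * whichever comes first, using the array spelling of the parameter. */
int count_args_array(int argc, char *argv[])
{
    int n = 0;
    assert(argv != NULL);
    while (n < argc && argv[n] != NULL)
        n++;
    return n;
}

/* Same function with the pointer spelling; argv is passed straight
 * through to the array version with no cast. */
int count_args_pointer(int argc, char **argv)
{
    return count_args_array(argc, argv);
}

/* One function-pointer type accepts both functions without a cast,
 * because char *argv[] in a parameter list means char **argv. */
typedef int (*entry_point)(int argc, char **argv);
entry_point entry_points[] = { count_args_array, count_args_pointer };
```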
Create the Interface File
The focal point of SWIG is the interface file. This file declares the C functions you want to call and supplies any extra instructions SWIG needs to wrap them. Create a file called libinchi.i in the inchi directory containing the following:
/* The name of this module. */
%module libinchi

/*
 * Copied verbatim into libinchi_wrap.c so that the
 * generated wrapper code sees a prototype for run().
 */
%{
extern int run(int argc, char **argv);
%}

/*
 * Converts a Ruby Array of Strings into the
 * (int argc, char **argv) pair expected by run().
 */
%typemap(in) (int argc, char **argv) {
  int size, i;
  Check_Type($input, T_ARRAY);
  /* Get the length of the array */
  size = (int) RARRAY_LEN($input);
  $1 = ($1_ltype) size;
  $2 = (char **) malloc((size + 1) * sizeof(char *));
  for (i = 0; i < size; i++) {
    /* Convert each Ruby String to a char* */
    VALUE item = rb_ary_entry($input, i);
    $2[i] = StringValueCStr(item);
  }
  $2[size] = NULL; /* End of list */
}

/*
 * Frees the char ** array allocated by the typemap above
 * once the function call returns.
 */
%typemap(freearg) (int argc, char **argv) {
  free($2);
}

/*
 * Function declaration from ichimain.c.
 */
extern int run(int argc, char **argv);
The %module directive names the module. The %typemap blocks make the necessary Ruby/C datatype conversions: the in typemap turns a Ruby Array of Strings into argc plus a NULL-terminated argv array, and the freearg typemap frees that array once run returns. The final extern declaration tells SWIG which InChI function we want to be able to access from Ruby.
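To see what the two typemaps do without Ruby in the picture, here is a minimal plain-C sketch of the same technique. The helper names make_argv and free_argv are hypothetical, and an ordinary C array of strings stands in for the Ruby Array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counterpart of the in typemap: copies `size` string
 * pointers into a newly allocated array and appends a NULL terminator,
 * so the result can be passed as (argc, argv). Returns NULL if malloc fails. */
char **make_argv(const char *const *items, int size)
{
    char **argv;
    int i;

    assert(size >= 0);
    assert(size == 0 || items != NULL);
    argv = malloc((size_t)(size + 1) * sizeof *argv);
    if (argv == NULL)
        return NULL;
    for (i = 0; i < size; i++)
        argv[i] = (char *) items[i]; /* borrow the strings; they are not copied */
    argv[size] = NULL;               /* end-of-list marker */
    return argv;
}

/* Hypothetical counterpart of the freearg typemap: frees only the
 * pointer array, never the strings it points to. */
void free_argv(char **argv)
{
    free(argv);
}
```

As in the typemap, only the pointer array is allocated and freed; the strings themselves are borrowed (in the real wrapper they belong to the Ruby String objects).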
Take a SWIG
At this point, SWIG has everything it needs to autogenerate our glue code. This can be done by:
swig -ruby libinchi.i
This command creates a new source file, libinchi_wrap.c, containing all of the C glue code for our library. We'll look at the most important part of this file later on.
Create a Makefile
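We'll need a makefile with which to compile our library. Fortunately, Ruby makes this very easy: its mkmf library generates a makefile that compiles every C source file in the current directory, which conveniently picks up all of the InChI sources collected earlier. Create a file called extconf.rb containing the following Ruby code: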
require 'mkmf'
create_makefile('libinchi')
A makefile can now be generated by:
ruby extconf.rb
Build the Library
Our library can now be built with:
make
Use InChI from Ruby
We are now done with the basics. You can verify that the process worked through Interactive Ruby (irb):
irb
irb(main):001:0> require 'libinchi'
=> true
The return value of true shows that Ruby loaded and recognized the binary library we just built (libinchi.so). We are now able to use this library as if it were written in Ruby.
Use the Library
To test the library, copy a molfile called test.mol into your inchi directory. Then, from that directory, run this code:
require 'libinchi'
Libinchi.run(['', 'test.mol'])
You should get a lot of output from the InChI library. If you look in the inchi directory, you'll find a new file, test.mol.txt, containing the InChI identifier of the molecule in your molfile. The library also creates a log file (test.mol.log) and a problem file (test.mol.prb).
You may be wondering why the first element in the Array passed to Libinchi.run is empty. By convention, the first element of a C program's argument array (argv[0]) holds the name of the program itself. InChI's main function (now run) follows this convention, so the Array simply leaves its first element blank.
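The convention can be illustrated with a small C function that, like any main-style entry point, starts reading at argv[1]. The name find_input_file and the rule "first argument not starting with '-'" are hypothetical stand-ins, not InChI's actual option parsing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the first argument after argv[0] that does
 * not start with '-', or NULL if there is none. argv[0] is always skipped
 * because it is reserved for the program name. */
char *find_input_file(int argc, char **argv)
{
    int i;

    assert(argv != NULL);
    for (i = 1; i < argc; i++) {
        if (argv[i][0] != '-')
            return argv[i];
    }
    return NULL;
}
```

With a helper like this, omitting the placeholder would put test.mol in the program-name slot, where it is skipped; that is why the Ruby Array starts with an empty string.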
Customize the Library
Take a look at the libinchi_wrap.c file that SWIG created. Near the bottom of this file you should find a function called Init_libinchi (the exact generated code varies between SWIG versions):
SWIGEXPORT(void) Init_libinchi(void) {
  int i;
  SWIG_InitRuntime();
  mLibinchi = rb_define_module("Libinchi");
  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }
  rb_define_module_function(mLibinchi, "run", _wrap_run, -1);
}
Ruby calls this function when the library is loaded, and it is where C functions get mapped to Ruby modules, classes, and methods. In this case, the C run function is mapped to a method called run in a module called Libinchi.
Let's say that you'd prefer a module name of InChI with a method called write_inchi. The following changes to Init_libinchi will accomplish this:
SWIGEXPORT(void) Init_libinchi(void) {
  int i;
  SWIG_InitRuntime();
  mLibinchi = rb_define_module("InChI");
  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }
  rb_define_module_function(mLibinchi, "write_inchi", _wrap_run, -1);
}
Run make again; just remember that re-running swig will regenerate libinchi_wrap.c and discard these edits. The following can now be used to write the InChI information for test.mol:
require 'libinchi'
InChI.write_inchi(['', 'test.mol'])
Summing Up
SWIG simplifies the job of connecting high-level languages like Ruby to C/C++ libraries. Although not illustrated in the simple example above, SWIG offers several advanced features for creating rich library interfaces, such as the %rename, %extend, and %exception directives. Given the large amount of chemical informatics software written in C/C++, and the increasing interest by developers in scripting languages such as Ruby, the SWIG approach is likely to be useful in many areas of chemical informatics integration.
The C InChI toolkit appears in a few other Open Source projects including Open Babel, the Chemistry Development Kit via the JNI InChI Wrapper, and Rino. To my knowledge, none use SWIG. This will soon change as the approach described here becomes incorporated into Rino.
On a more general note, the availability of the InChI source code under an Open Source license is essential to developing and distributing the kind of integration library discussed here. We can only hope that others working in chemical informatics see the wisdom in a system that creates healthy software ecosystems wherever it takes hold.