Taking a SWIG of InChI

The IUPAC InChI developer toolkit is written in C. It is currently the only Open Source software capable of generating InChI identifiers. Software that needs to write InChIs must use the C toolkit in one form or another. This poses a problem for the large amount of chemical informatics software being written in other languages. In this article, I'll explain how the Open Source tool SWIG can solve this problem in a semi-automated way. The same concepts can, in principle, be used to link any library written in C/C++ with another language.

Prerequisites

This tutorial uses Ruby as the language that InChI will be linked with. You'll therefore need both Ruby and the Ruby development libraries installed. You'll also need SWIG and possibly the SWIG development libraries.

Use the Source, Luke

After downloading and unpacking InChI-1-API v1.0.1, collect all header (.h) and source (.c) files into a directory called inchi. These files can be found in the following two directories:

  • InChI-1-API/cInChI/common
  • InChI-1-API/cInChI/main
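
The copying step can also be scripted. The following is a minimal Ruby sketch, assuming the archive was unpacked into the current directory. The constant SOURCE_DIRS and the helper collect_sources are hypothetical names used for illustration; they are not part of InChI or SWIG.

```ruby
require 'fileutils'

# Directories named in the text, relative to where the archive was unpacked
# (assumed here to be the current directory).
SOURCE_DIRS = [
  'InChI-1-API/cInChI/common',
  'InChI-1-API/cInChI/main'
].freeze

# Copy every .c and .h file from the given directories into dest.
# Returns the sorted base names of the copied files.
def collect_sources(source_dirs, dest)
  FileUtils.mkdir_p(dest)
  copied = source_dirs.flat_map do |dir|
    Dir.glob(File.join(dir, '*.{c,h}')).map do |path|
      FileUtils.cp(path, dest)
      File.basename(path)
    end
  end
  copied.sort
end
```

Calling collect_sources(SOURCE_DIRS, 'inchi') from the directory that contains InChI-1-API would then populate the inchi directory.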

Find the Main Method

This tutorial creates an interface to the InChI main() function, which is defined on line 149 of the file ichimain.c. For reasons I won't get into here, rename this function to run and change the type of its second argument to char **. Also, add a prototype for the run function directly above line 149:

int run( int argc, char **argv ); // new line added
int run( int argc, char **argv ) // formerly line 149

Create the Interface File

The focal point of SWIG is the interface file. This file specifies the C functions you want to link into and some items to help in doing so. Create a file called libinchi.i containing the following:

/* The name of this module. */
%module libinchi

/*
 * Tells SWIG to treat char ** as a special case.
 */
%typemap(in) (int argc, char **argv) {
 int i;
 /* Get the length of the array */
 int size = RARRAY($input)->len;
 /* Get the first element in memory */
 VALUE *ptr = RARRAY($input)->ptr;

 $1 = ($1_ltype) size;
 $2 = (char **) malloc((size+1)*sizeof(char *));

 for (i = 0; i < size; i++, ptr++) {
  /* Convert Ruby Object String to char* */
  $2[i] = STR2CSTR(*ptr);
 }
 $2[i] = NULL; /* End of list */
}

/*
 * Cleans up the char ** array created before 
 * the function call.
 */
%typemap(freearg) char ** {
 free((char *) $1);
}

/*
 * Function definition from ichimain.c.
 */
extern int run(int argc, char **argv);

The interface file has three main parts. The first part (line 2) names the module. The second part (lines 7-30) performs the necessary Ruby/C datatype conversions. The last part (line 35) tells SWIG which InChI functions we want to access from Ruby.
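
To make the second part more concrete, here is a rough Ruby model of what the typemap does on the C side: it takes one Ruby Array and produces the pair of C arguments. The function name to_argc_argv is hypothetical and exists only for illustration, and nil stands in for the terminating NULL pointer.

```ruby
# Model of the (int argc, char **argv) typemap: one Ruby Array in,
# an element count and a NULL-terminated list of strings out.
def to_argc_argv(args)
  argc = args.length
  # The C code allocates size + 1 slots; the extra slot holds the terminator.
  # to_str accepts only string-like objects, similar to the C conversion.
  argv = args.map(&:to_str) + [nil]
  [argc, argv]
end

argc, argv = to_argc_argv(['', 'test.mol'])
puts argc # number of real arguments
p argv    # the strings followed by the nil terminator
```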

Take a SWIG

At this point, SWIG has everything it needs to autogenerate our glue code. This can be done by:

swig -ruby libinchi.i

This command should have created a new source file, libinchi_wrap.c, that contains all of the C glue code for our library. We'll have a look at the most important part of this file shortly.

Create a Makefile

We'll need a makefile with which to compile our library. Fortunately, Ruby makes this very easy. Create a file called extconf.rb containing the following Ruby code:

require 'mkmf'

create_makefile('libinchi')

A makefile can now be generated by:

ruby extconf.rb

Build the Library

Our library can now be built with:

make
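
The three commands used so far (swig, ruby extconf.rb, and make) can be chained in a small Ruby script so that the build stops at the first failure. This is only a sketch: BUILD_STEPS and build are hypothetical names, and the runner parameter exists so the sequence can be exercised without the tools installed.

```ruby
# The commands from the text, in the order they must run.
BUILD_STEPS = [
  %w[swig -ruby libinchi.i],
  %w[ruby extconf.rb],
  %w[make]
].freeze

# Run each step with the given runner (Kernel#system by default) and
# stop with an error as soon as one step reports failure.
def build(runner = ->(cmd) { system(*cmd) })
  BUILD_STEPS.each do |cmd|
    raise "build step failed: #{cmd.join(' ')}" unless runner.call(cmd)
  end
  true
end
```

Calling build with no arguments from the inchi directory would run the real tools.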

Use InChI from Ruby

We are now done with the basics. You can verify that the process worked through Interactive Ruby (irb):

irb
irb(main):001:0> require 'libinchi'
=> true

The return value of true shows that Ruby loaded and recognized the binary library we just built (libinchi.so). We are now able to use this library as if it were written in Ruby.
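
The same check can be made from a script instead of irb. A minimal sketch, using the hypothetical helper name extension_available?:

```ruby
# Return true when the named library can be loaded, false otherwise.
# require returns false (not an error) if the library was already loaded,
# so only LoadError counts as failure.
def extension_available?(name)
  require name
  true
rescue LoadError
  false
end

if extension_available?('libinchi')
  puts 'libinchi loaded'
else
  puts 'libinchi not found; check that libinchi.so is on the load path'
end
```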

Use the Library

To test the library, copy a molfile called test.mol into your inchi directory. Now run this code:

require 'libinchi'

Libinchi.run(['', 'test.mol'])

You should get a lot of output from the InChI library. If you look at the contents of the inchi directory, you'll find a new file, test.mol.txt, containing the InChI identifier of the molecule in your molfile. The run also creates a log file (test.mol.log) and a problem file (test.mol.prb).
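
To pick the identifier out of the output file programmatically, something like the following could work. It assumes that each identifier sits on its own line beginning with InChI=, which should be checked against the actual contents of test.mol.txt. The helper name extract_inchis is hypothetical.

```ruby
# Return every line of the given text that looks like an InChI identifier.
# Assumption: identifiers appear one per line and start with "InChI=".
def extract_inchis(text)
  text.each_line.map(&:strip).select { |line| line.start_with?('InChI=') }
end

# Read the output file only if it exists, so the sketch runs anywhere.
path = 'test.mol.txt'
puts extract_inchis(File.read(path)) if File.exist?(path)
```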

You may be wondering why the first element of the Array passed to Libinchi.run is empty. By convention, the first element of the argument list passed to a C main function (argv[0]) is the name of the program itself. The InChI code expects this layout, so the Array leaves its first element blank as a placeholder.
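
A small helper can hide this convention from callers. In this sketch the name inchi_argv is hypothetical, and any extra arguments are passed through untouched rather than interpreted:

```ruby
# Build the Array expected by Libinchi.run: a placeholder for the program
# name (argv[0] in C) followed by the real arguments.
def inchi_argv(*args)
  ['', *args]
end

p inchi_argv('test.mol') # => ["", "test.mol"]
```

With the compiled extension loaded, Libinchi.run(inchi_argv('test.mol')) would be equivalent to the call shown earlier.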

Customize the Library

Have a look at the libinchi_wrap.c file that SWIG created. At the bottom of this file should be a function called Init_libinchi:

SWIGEXPORT(void) Init_libinchi(void) {
  int i;

  SWIG_InitRuntime();
  mLibinchi = rb_define_module("Libinchi");

  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }

  rb_define_module_function(mLibinchi, "run", _wrap_run, -1);
}  

This function is what Ruby calls to map C functions to Ruby modules, classes, and methods. In this case, the C run function is mapped to a run method on a module called Libinchi.
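
In Ruby terms, the two rb_define_* calls amount to roughly the following. The module is named LibinchiSketch to make clear that it is a stand-in written for illustration, not the compiled extension, and its run body only mimics the argument check instead of calling InChI.

```ruby
# Rough Ruby equivalent of what Init_libinchi registers.
module LibinchiSketch
  # rb_define_module_function makes the method callable on the module
  # itself, which is what module_function does in Ruby.
  module_function

  # The -1 passed in C means "variable number of arguments"; the wrapper
  # then checks the count itself. Here the check is written out by hand.
  def run(*args)
    unless args.length == 1
      raise ArgumentError, "wrong number of arguments (#{args.length} for 1)"
    end
    0 # placeholder for the int returned by the C run function
  end
end

p LibinchiSketch.run(['', 'test.mol'])
```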

Let's say that you'd prefer a module name of InChI with a method called write_inchi. The following changes to Init_libinchi will accomplish this:

SWIGEXPORT(void) Init_libinchi(void) {
  int i;

  SWIG_InitRuntime();
  mLibinchi = rb_define_module("InChI");

  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }

  rb_define_module_function(mLibinchi, "write_inchi", _wrap_run, -1);
}

Run make again. Now the following can be used to write the InChI information for test.mol:

require 'libinchi'

InChI.write_inchi(['', 'test.mol'])
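
Because libinchi_wrap.c is generated, rerunning swig overwrites hand edits like the ones above. An alternative that survives regeneration is to leave the generated code alone and add the preferred names in Ruby. The sketch below assumes the unmodified extension (module Libinchi, method run). The fallback module exists only so the example runs where the extension has not been built.

```ruby
begin
  require 'libinchi'
rescue LoadError
  # Stand-in used only when the compiled extension is unavailable.
  module Libinchi
    def self.run(_argv)
      0
    end
  end
end

# Thin Ruby wrapper that gives the generated function a friendlier name.
module InChI
  def self.write_inchi(argv)
    Libinchi.run(argv)
  end
end

p InChI.write_inchi(['', 'test.mol'])
```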

Summing Up

SWIG simplifies the job of connecting high-level languages like Ruby to C/C++ libraries. Although not illustrated in the simple example above, SWIG offers several advanced tools for creating rich library interfaces. Given the large amount of chemical informatics software written in C/C++, and the increasing interest by developers in scripting languages such as Ruby, the SWIG approach is likely to be broadly useful in several areas of chemical informatics integration.

The C InChI toolkit appears in a few other Open Source projects including Open Babel, the Chemistry Development Kit via the JNI InChI Wrapper, and Rino. To my knowledge, none use SWIG. This will soon change as the approach described here becomes incorporated into Rino.

On a more general note, the availability of the InChI source code under an Open Source license is essential to developing and distributing the kind of integration library discussed here. We can only hope that others working in chemical informatics see the wisdom in a system that creates healthy software ecosystems wherever it takes hold.