Taking a SWIG of InChI

Posted by Rich Apodaca Sat, 16 Sep 2006 18:43:00 GMT

The IUPAC InChI developer toolkit is written in C. It is currently the only Open Source software capable of generating InChI identifiers. Software that needs to write InChIs must use the C toolkit in one form or another. This poses a problem for the large amount of chemical informatics software being written in other languages. In this article, I'll explain how the Open Source tool SWIG can solve this problem in a semi-automated way. The same concepts can, in principle, be used to link any library written in C/C++ with another language.

Prerequisites

This tutorial uses Ruby as the language that InChI will be linked with. You'll therefore need both Ruby and the Ruby development libraries installed. You'll also need SWIG and possibly the SWIG development libraries.

Use the Source, Luke

After downloading and unpacking InChI-1-API v1.0.1, collect all header (*.h) and source (*.c) files into a directory called inchi. These files can be found in the following two directories:

  • InChI-1-API/cInChI/common
  • InChI-1-API/cInChI/main
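
The collection step can also be scripted. The following is a minimal sketch, not part of the InChI distribution: collect_sources is a hypothetical helper name, and the directory names are the ones listed above.

```ruby
require 'fileutils'

# Hypothetical helper: copies every *.h and *.c file from the given
# directories into dest and returns the sorted names of the copied files.
def collect_sources(src_dirs, dest)
  FileUtils.mkdir_p(dest)
  files = src_dirs.flat_map { |dir| Dir.glob(File.join(dir, '*.{h,c}')) }
  FileUtils.cp(files, dest) unless files.empty?
  files.map { |f| File.basename(f) }.sort
end

# Example call, run from the directory that contains InChI-1-API:
#   collect_sources(['InChI-1-API/cInChI/common',
#                    'InChI-1-API/cInChI/main'], 'inchi')
```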

Find the Main Method

This tutorial creates an interface to the InChI main() function, which is found on line 149 of the file ichimain.c. For reasons I won't get into here, rename this function to run and change the type of its second argument to char **. Also, add a prototype for run directly above line 149:

int run( int argc, char **argv ); // new line added

int run( int argc, char **argv ) // formerly line 149

Create the Interface File

The focal point of SWIG is the interface file. This file specifies the C functions you want to link into and some items to help in doing so. Create a file called libinchi.i containing the following:

/* The name of this module. */
%module libinchi

/*
 * Tells SWIG to treat char ** as a special case.
 */
%typemap(in) (int argc, char **argv) {

 /* Get the length of the array */
 int size = RARRAY($input)->len; 
 int i;
 $1 = ($1_ltype) size;
 $2 = (char **) malloc((size+1)*sizeof(char *));

 /* Get the first element in memory */
 VALUE *ptr = RARRAY($input)->ptr; 
 for (i=0; i < size; i++, ptr++) {
  /* Convert Ruby Object String to char* */
  $2[i]= STR2CSTR(*ptr); 
 }
 $2[i]=NULL; /* End of list */
}

/*
 * Cleans up the char ** array created before 
 * the function call.
 */
%typemap(freearg) char ** {
 free((char *) $1);
}

/*
 * Function definition from ichimain.c.
 */
extern int run(int argc, char **argv);

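The interface file has three main parts. The first part (line 2, the %module directive) names the module. The second part (lines 7-30, the two %typemap blocks) converts a Ruby Array of Strings into the argc/argv pair that C expects, then frees the temporary array after the call. The last part (line 35, the extern declaration) tells SWIG which InChI functions we want to be able to access from Ruby.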
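
To make the datatype conversion easier to follow, here is a pure-Ruby model of what the %typemap(in) block does. It is only an illustration: SWIG does not use it, and model_argc_argv is a hypothetical name. The comments point at the corresponding C statements.

```ruby
# Pure-Ruby model of the %typemap(in) block. It mirrors the conversion
# of a Ruby Array into an (argc, argv) pair; it is not used by SWIG.
def model_argc_argv(args)
  argc = args.length                 # $1 = size
  argv = Array.new(argc + 1)         # malloc((size+1)*sizeof(char *))
  args.each_with_index do |arg, i|
    argv[i] = arg                    # $2[i] = STR2CSTR(*ptr)
  end
  argv[argc] = nil                   # $2[i] = NULL, the end-of-list marker
  [argc, argv]
end

p model_argc_argv(['', 'test.mol'])
```

The extra slot at the end of the array is why the C code allocates size+1 pointers.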

Take a SWIG

At this point, SWIG has everything it needs to autogenerate our glue code. This can be done by:

$ swig -ruby libinchi.i

This command should have created a new source file, libinchi_wrap.c, that contains all of the C glue code for our library. We'll have a look at the most important part of this file shortly.

Create a Makefile

We'll need a makefile with which to compile our library. Fortunately, Ruby makes this very easy. Create a file called extconf.rb containing the following Ruby code:

require 'mkmf'

create_makefile('libinchi')

A makefile can now be generated by:

$ ruby extconf.rb

Build the Library

Our library can now be built with:

$ make
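
The three commands used so far (swig, ruby extconf.rb, make) can be chained from a small Ruby script. This is a sketch under the assumption that all three programs are on the PATH; BUILD_STEPS and run_build are hypothetical names, and the runner argument exists only so the sequence can be exercised without actually building anything.

```ruby
# The build commands from this tutorial, in order.
BUILD_STEPS = [
  %w[swig -ruby libinchi.i],
  %w[ruby extconf.rb],
  %w[make]
].freeze

# Runs each step and stops at the first failure. The runner defaults to
# Kernel#system, which returns a falsy value when a command fails.
def run_build(steps = BUILD_STEPS, runner = Kernel.method(:system))
  steps.each do |cmd|
    raise "build step failed: #{cmd.join(' ')}" unless runner.call(*cmd)
  end
  true
end

# From the inchi directory:
#   run_build
```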

Use InChI from Ruby

We are now done with the basics. You can verify that the process worked through Interactive Ruby (irb):

$ irb
irb(main):001:0> require 'libinchi'
=> true

The return value of true shows that Ruby loaded and recognized the binary library we just built (libinchi.so). We are now able to use this library as if it were written in Ruby.
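
In a script, the same check can be made without irb. The sketch below uses a hypothetical helper, extension_available?, which reports whether a library can be loaded instead of raising LoadError. Note that newer Ruby versions (1.9.2 and later) no longer include the current directory in the load path, so the directory containing libinchi.so may need to be added to $LOAD_PATH first.

```ruby
# Hypothetical helper: true if the named library can be loaded.
def extension_available?(name)
  require name
  true
rescue LoadError
  false
end

# Make the current directory searchable, then probe for the extension.
$LOAD_PATH.unshift('.') unless $LOAD_PATH.include?('.')
puts "libinchi loaded: #{extension_available?('libinchi')}"
```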

Use the Library

To test the library, copy a molfile called test.mol into your inchi directory. Now run this code:

require 'libinchi'

Libinchi.run(['', 'test.mol'])

You should get a lot of output from the InChI library. If you take a look at the inchi directory contents, you'll see that a new file, test.mol.txt, has been created. It contains the InChI identifier of the molecule contained in your molfile. The software also created a log file (test.mol.log) and a problem file (test.mol.prb).
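
Because the results land in files rather than in a return value, a little Ruby is needed to get the identifier back into your program. The sketch below makes one assumption that should be checked against your own output: that the identifier sits on a line of its own beginning with "InChI=". Both helper names are hypothetical.

```ruby
# Hypothetical helper: names of the files written next to a molfile.
def inchi_output_paths(molfile)
  { inchi: "#{molfile}.txt", log: "#{molfile}.log", problems: "#{molfile}.prb" }
end

# Hypothetical helper: returns every line that starts with "InChI=".
# Assumption: each identifier is on its own line in the output text.
def extract_inchis(text)
  text.each_line.map(&:strip).select { |line| line.start_with?('InChI=') }
end

# After Libinchi.run(['', 'test.mol']):
#   paths = inchi_output_paths('test.mol')
#   puts extract_inchis(File.read(paths[:inchi]))
```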

You may be wondering why the first element in the Array passed to Libinchi.run is empty. The reason is that, by convention, a C main function expects its first argument to be the name of the program itself. The InChI main function follows this convention and skips over that element, so the Array simply leaves it blank.
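
If the blank first element feels easy to forget, it can be hidden behind a small helper. This is only a sketch; inchi_argv is a hypothetical name, and any further elements are passed through unchanged.

```ruby
# Hypothetical helper: builds the argv-style Array that run expects,
# placing the program-name slot (blank by default) in front.
def inchi_argv(*args, program_name: '')
  [program_name, *args]
end

p inchi_argv('test.mol')

# With the extension loaded:
#   Libinchi.run(inchi_argv('test.mol'))
```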

Customize the Library

Have a look at the libinchi_wrap.c file that SWIG created. At the bottom of this file should be a function called Init_libinchi:

SWIGEXPORT(void) Init_libinchi(void) {
  int i;

  SWIG_InitRuntime();
  mLibinchi = rb_define_module("Libinchi");

  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }

  rb_define_module_function(mLibinchi, "run", _wrap_run, -1);
}

This is what Ruby uses to map C functions to Ruby modules, classes, and methods. In this case, the C run function is being mapped to a module called Libinchi which has a run method.

Let's say that you'd prefer a module name of InChI with a method called write_inchi. The following changes to Init_libinchi will accomplish this:

SWIGEXPORT(void) Init_libinchi(void) {
  int i;

  SWIG_InitRuntime();
  mLibinchi = rb_define_module("InChI");

  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }

  rb_define_module_function(mLibinchi, "write_inchi", _wrap_run, -1);
}

Run make again. Now the following can be used to write the InChI information for test.mol:

require 'libinchi'

InChI.write_inchi(['', 'test.mol'])
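
One thing to keep in mind is that running swig again regenerates libinchi_wrap.c, which discards hand edits like the one above. An alternative, shown here only as a sketch and not as the approach taken in this article, is to leave the generated file alone and add the friendlier name in plain Ruby. The module below is hypothetical and its method takes the molfile name and a backend, so its signature differs from the one above.

```ruby
# Hypothetical pure-Ruby facade; nothing here is generated by SWIG.
module InChI
  # backend is any object with a run(Array) method. With the unmodified
  # extension from this tutorial, that would be the Libinchi module.
  def self.write_inchi(molfile, backend)
    backend.run(['', molfile])
  end
end

# With the unmodified extension loaded:
#   require 'libinchi'
#   InChI.write_inchi('test.mol', Libinchi)
```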

Summing Up

SWIG simplifies the job of connecting high-level languages like Ruby to C/C++ libraries. Although not illustrated in the simple example above, SWIG offers several advanced tools for creating rich library interfaces. Given the large amount of chemical informatics software written in C/C++, and the increasing interest by developers in scripting languages such as Ruby, the SWIG approach is likely to be broadly useful in several areas of chemical informatics integration.

The C InChI toolkit appears in a few other Open Source projects including Open Babel, the Chemistry Development Kit via the JNI InChI Wrapper, and Rino. To my knowledge, none use SWIG. This will soon change as the approach described here becomes incorporated into Rino.

On a more general note, the availability of the InChI source code under an Open Source license is essential to developing and distributing the kind of integration library discussed here. We can only hope that others working in chemical informatics see the wisdom in a system that creates healthy software ecosystems wherever it takes hold.

From SMILES to InChI: Rino, CDK, and Ruby Java Bridge

Posted by Rich Apodaca Sat, 26 Aug 2006 19:37:00 GMT

Integrating Ruby and Java is fast and easy with Ruby Java Bridge (RJB), which was discussed previously. In this article, I'll show how RJB can be used to solve a practical chemical informatics problem - the conversion of SMILES strings into InChI identifiers.

Prerequisites

This tutorial is aimed at Linux users. The same steps should work on Windows and Mac OS X, although these systems have not been tested. You'll need to install a few software packages if you haven't done so already: Ruby, RubyGems, RJB, CDK, and Rino. After installing RubyGems, RJB and Rino can both be installed from the command line (as root):

# gem install rjb
# gem install rino

Next, create a working directory, smi2inchi. Into this directory, move a copy of the full CDK jarfile, cdk-20060714.jar. That's it for libraries, so let's move on to the translator itself.

The Translator

The Translator class consists of a small piece of Ruby code gluing CDK's SmilesParser and MDLWriter with the Ruby InChI library Rino. Rino is a thin Ruby wrapper around the IUPAC InChI library, which is in turn written in C.

ENV['CLASSPATH'] = './cdk-20060714.jar'

require 'rubygems'
require_gem 'rjb'
require_gem 'rino'
require 'rjb'

StringWriter = Rjb::import 'java.io.StringWriter'

SmilesParser = Rjb::import 'org.openscience.cdk.smiles.SmilesParser'
MDLWriter = Rjb::import 'org.openscience.cdk.io.MDLWriter'

# Converts a SMILES string into an InChI identifier using
# the CDK Library (Java) and the Rino Library (Ruby/C).
class Translator

  def initialize
    @smiles_parser = SmilesParser.new
    @mdl_writer = MDLWriter.new
    @mol2inchi = Rino::MolfileReader.new
  end

  # Returns an InChI identifier from the specified SMILES string.
  # Uses the CDK classes SmilesParser and MDLWriter to generate
  # a molfile from a SMILES string. Then this molfile is
  # parsed by Rino::MolfileReader.
  def translate(smiles)
    mol = @smiles_parser.parseSmiles(smiles)

    sw = StringWriter.new

    @mdl_writer.setWriter(sw)
    @mdl_writer.write(mol)

    @mol2inchi.read(sw.toString)
  end
end

Add the above code to a file called smi2inchi.rb. The first line points the CLASSPATH environment variable, which is needed by RJB, to the CDK library. Lines 3-6 include the RJB and Rino RubyGems. Lines 8-11 import the built-in Java class StringWriter and the CDK Java classes SmilesParser and MDLWriter using RJB's syntax. The core of the class consists of the translate method, which simply coordinates the pieces. Using the Translator class consists of creating an instance and invoking its translate method on a SMILES string:

require 'smi2inchi'

translator = Translator.new
inchi = translator.translate 'c1ccccc1'

p inchi # => "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"

The above code fragment can be saved to a text file (e.g. test.rb) and invoked with the Ruby interpreter:

$ ruby test.rb

Alternatively, it can be entered interactively with the Interactive Ruby Interpreter (irb):

$ irb
irb(main):001:0>
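
Translating a whole list of SMILES strings is a natural next step. The sketch below works with any object that has a translate method, such as the Translator above. translate_all is a hypothetical helper, and it assumes that a failed translation surfaces as a Ruby StandardError, which has not been verified for RJB.

```ruby
# Hypothetical helper: maps each SMILES string to its translation, or
# to nil when the translator raises an error for that string.
def translate_all(translator, smiles_list)
  smiles_list.each_with_object({}) do |smiles, results|
    results[smiles] = begin
      translator.translate(smiles)
    rescue StandardError => e
      warn "could not translate #{smiles}: #{e.message}"
      nil
    end
  end
end

# With smi2inchi.rb available:
#   require 'smi2inchi'
#   translate_all(Translator.new, ['c1ccccc1', 'CCO']).each do |smiles, inchi|
#     puts "#{smiles}\t#{inchi}"
#   end
```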

With just a few lines of Ruby, we've solved a real problem. This example integrates software from three different programming languages: Ruby, C, and Java. Given the variety of chemical informatics software written in these languages, Ruby Java Bridge offers numerous integration possibilities.

Scripting Java Libraries with Ruby Java Bridge

Posted by Rich Apodaca Sat, 26 Aug 2006 09:46:00 GMT

Although JRuby solves many Java/Ruby integration issues, in some cases it's not the right solution. One situation is when you want your Ruby code to use extensions written in C. The JRuby documentation makes very clear that this will never be supported. Another situation is if your code needs full access to Ruby on Rails, or if your hosting service makes it difficult to configure JRuby on Rails. In these cases, JRuby's currently limited Rails support makes it a suboptimal choice.

Ruby Java Bridge (RJB) is designed to solve these problems by letting Ruby developers manipulate Java libraries from Ruby. This gives you the ability to access C Ruby extensions and Java libraries in the same Ruby program. It also makes Rails integration a snap. Articles to follow will explore these two points. For now, let's see how to get RJB working.

Installing Ruby Java Bridge is very simple. With root access:

gem install rjb

This installs the Ruby Java Bridge gem. That's all there is to it.

Instantiating and using Java classes consists of the familiar process of first importing the class followed by creating a new instance:

require 'rubygems'
require_gem 'rjb'
require 'rjb'

string_class = Rjb::import 'java.lang.String'
hello_string = string_class.new_with_sig('Ljava.lang.String;', 'hello')

p hello_string.toString # -> "hello"

Because an argument is passed to the constructor of the Java class, a special form, new_with_sig, needs to be used. Its first argument is a type signature describing the constructor arguments. The "L" at the start of the signature 'Ljava.lang.String;' indicates that the argument "hello" is a non-primitive datatype (i.e. class or interface).
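
Writing signature strings by hand gets tedious, so here is a small sketch of a helper that builds them. The single-letter codes are the standard JNI primitive type codes. The dotted class-name form follows the example above, and joining several descriptors into one string for multi-argument constructors is an assumption about RJB that should be checked against its documentation. All names in the block are hypothetical.

```ruby
# Standard JNI type codes for Java primitives.
JNI_PRIMITIVES = {
  'boolean' => 'Z', 'byte' => 'B', 'char' => 'C', 'short' => 'S',
  'int' => 'I', 'long' => 'J', 'float' => 'F', 'double' => 'D'
}.freeze

# Descriptor for one Java type: a primitive code, or L<class name>;
def type_descriptor(java_type)
  JNI_PRIMITIVES.fetch(java_type) { "L#{java_type};" }
end

# Assumption: descriptors for several arguments are simply concatenated.
def signature(*java_types)
  java_types.map { |type| type_descriptor(type) }.join
end

p signature('java.lang.String')
```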

Ruby Java Bridge offers some important advantages over JRuby. Subsequent articles will explore how these advantages can be used to quickly develop applications integrating chemical informatics libraries written in multiple languages.

Scripting CDK with JRuby

Posted by Rich Apodaca Thu, 24 Aug 2006 18:04:00 GMT

A previous article discussed the use of the Java chemical informatics library Octet from JRuby. This article will discuss the use of another Java chemical informatics library, CDK, from JRuby. A small Ruby class will be developed that generates a molfile with completely assigned 2-D coordinates from a SMILES string.
