<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Taking a SWIG of InChI</title>
    <link>http://depth-first.com/articles/2006/09/16/taking-a-swig-of-inchi</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Taking a SWIG of InChI</title>
      <description>&lt;p&gt;The IUPAC InChI &lt;a href="http://www.iupac.org/inchi/"&gt;developer toolkit&lt;/a&gt; is written in C. It is currently the only Open Source software capable of generating &lt;a href="http://wwmm.ch.cam.ac.uk/inchifaq/"&gt;InChI identifiers&lt;/a&gt;. Software that needs to write InChIs must use the C toolkit in one form or another. This poses a problem for the large amount of chemical informatics software being written in other languages. In this article, I'll explain how the Open Source tool &lt;a href="http://www.swig.org/"&gt;SWIG&lt;/a&gt; can solve this problem in a semi-automated way. The same concepts can, in principle, be used to link any library written in C/C++ with another language.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;This tutorial uses Ruby as the language that InChI will be linked with. You'll therefore need both Ruby and the Ruby development libraries installed. You'll also need SWIG and possibly the SWIG development libraries.&lt;/p&gt;

&lt;h4&gt;Use the Source, Luke&lt;/h4&gt;

&lt;p&gt;After downloading and unpacking &lt;a href="http://www.iupac.org/inchi/license.html"&gt;InChI-1-API v1.0.1&lt;/a&gt;, collect all header (*.h) and source (*.c) files into a directory called &lt;strong&gt;inchi&lt;/strong&gt;. These files can be found in the following two directories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;InChI-1-API/cInChI/common&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;InChI-1-API/cInChI/main&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Find the Main Method&lt;/h4&gt;

&lt;p&gt;This tutorial will create an interface to the InChI &lt;tt&gt;main()&lt;/tt&gt; function. This function is found on line 149 of the file &lt;strong&gt;ichimain.c&lt;/strong&gt;. For reasons I won't get into here, rename this function to &lt;tt&gt;run&lt;/tt&gt; and change the type of its second argument to char&amp;nbsp;**. Also, add a prototype for the &lt;tt&gt;run&lt;/tt&gt; function directly above line 149:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;int run( int argc, char **argv ); // new line added

int run( int argc, char **argv ) // formerly line 149&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Create the Interface File&lt;/h4&gt;

&lt;p&gt;The focal point of SWIG is the interface file. This file specifies the C functions you want to link into and some items to help in doing so. Create a file called &lt;strong&gt;libinchi.i&lt;/strong&gt; containing the following:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;table class="typocode_linenumber"&gt;&lt;tr&gt;&lt;td class="lineno"&gt;
&lt;pre&gt;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
&lt;/pre&gt;
&lt;/td&gt;&lt;td width="100%"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;/* The name of this module. */
%module libinchi

/*
 * Tells SWIG to treat char ** as a special case.
 */
%typemap(in) (int argc, char **argv) {

 /* Get the length of the array */
 int size = RARRAY($input)-&amp;gt;len; 
 int i;
 $1 = ($1_ltype) size;
 $2 = (char **) malloc((size+1)*sizeof(char *));

 /* Get the first element in memory */
 VALUE *ptr = RARRAY($input)-&amp;gt;ptr; 
 for (i=0; i &amp;lt; size; i++, ptr++) {
   /* Convert Ruby Object String to char* */
   $2[i]= STR2CSTR(*ptr);
 }
 $2[i]=NULL; /* End of list */
}

/*
 * Cleans up the char ** array created before 
 * the function call.
 */
%typemap(freearg) char ** {
 free((char *) $1);
}

/*
 * Function definition from ichimain.c.
 */
extern int run(int argc, char **argv);&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The interface file has three main parts. The first part (line 2) names the module. The second part (lines 7-30) makes the necessary Ruby/C datatype conversions. The last part (line 35) tells SWIG the InChI functions we want to be able to access from Ruby.&lt;/p&gt;

&lt;h4&gt;Take a SWIG&lt;/h4&gt;

&lt;p&gt;At this point, SWIG has everything it needs to autogenerate our glue code. This can be done by:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ swig -ruby libinchi.i
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This command should have created a new source file, &lt;strong&gt;libinchi_wrap.c&lt;/strong&gt;, that contains all of the C glue code for our library. We'll have a look at the most important part of this file shortly.&lt;/p&gt;

&lt;h4&gt;Create a Makefile&lt;/h4&gt;

&lt;p&gt;We'll need a makefile with which to compile our library. Fortunately, Ruby makes this very easy. Create a file called &lt;strong&gt;extconf.rb&lt;/strong&gt; containing the following Ruby code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mkmf&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;create_makefile&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;libinchi&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A makefile can now be generated by:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ruby extconf.rb
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Build the Library&lt;/h4&gt;

&lt;p&gt;Our library can now be built with:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ make
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Use InChI from Ruby&lt;/h4&gt;

&lt;p&gt;We are now done with the basics. You can verify that the process worked through Interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'libinchi'
=&gt; true
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The return value of &lt;tt&gt;true&lt;/tt&gt; shows that Ruby loaded and recognized the binary library we just built (&lt;strong&gt;libinchi.so&lt;/strong&gt;). We are now able to use this library as if it were written in Ruby.&lt;/p&gt;

&lt;h4&gt;Use the Library&lt;/h4&gt;

&lt;p&gt;To test the library, copy a molfile called &lt;strong&gt;test.mol&lt;/strong&gt; into your &lt;strong&gt;inchi&lt;/strong&gt; directory. Now run this code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;libinchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;Libinchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;run&lt;/span&gt;&lt;span class="punct"&gt;(['&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;test.mol&lt;/span&gt;&lt;span class="punct"&gt;'])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You should get a lot of output from the InChI library. If you look at the contents of the &lt;strong&gt;inchi&lt;/strong&gt; directory, you'll find a new file, &lt;strong&gt;test.mol.txt&lt;/strong&gt;, which contains the InChI identifier of the molecule in your molfile. The run also creates a log file (&lt;strong&gt;test.mol.log&lt;/strong&gt;) and a problem file (&lt;strong&gt;test.mol.prb&lt;/strong&gt;).&lt;/p&gt;
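
&lt;p&gt;The identifier can also be pulled out of the result file from Ruby. The following is only a sketch: it assumes that each identifier sits on its own line beginning with &lt;tt&gt;InChI=&lt;/tt&gt;, and the helper name &lt;tt&gt;extract_inchis&lt;/tt&gt; is made up for illustration.&lt;/p&gt;

```ruby
# Collect the identifier lines from an InChI output file.
# Assumption: each identifier is on its own line and starts with "InChI=".
# extract_inchis is a hypothetical helper, not part of the InChI toolkit.
def extract_inchis(path)
  File.foreach(path)
      .map { |line| line.strip }
      .select { |line| line.start_with?('InChI=') }
end

# Only attempt the read when the output file from the previous step exists.
puts extract_inchis('test.mol.txt') if File.exist?('test.mol.txt')
```

&lt;p&gt;If the output format of your toolkit version differs, adjust the prefix test accordingly.&lt;/p&gt;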

&lt;p&gt;You may be wondering why the first element in the &lt;tt&gt;Array&lt;/tt&gt; passed to &lt;tt&gt;Libinchi.run&lt;/tt&gt; is empty. By convention, a C &lt;tt&gt;main&lt;/tt&gt; function receives the name of the program itself as its first argument. The InChI &lt;tt&gt;main&lt;/tt&gt; function takes this into account, so the Array passes an empty string as a placeholder in its first position.&lt;/p&gt;
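
&lt;p&gt;The placeholder convention can be sketched in a few lines of Ruby. Ruby's own &lt;tt&gt;ARGV&lt;/tt&gt; omits the program name (it is kept in &lt;tt&gt;$0&lt;/tt&gt;), which is why an argument list has to be padded before it is handed to a C-style entry point. The helper name &lt;tt&gt;c_style_argv&lt;/tt&gt; is an assumption made for illustration; it is not part of SWIG or the InChI toolkit.&lt;/p&gt;

```ruby
# Build a C-style argument vector for a wrapped main()-like function.
# c_style_argv is a hypothetical helper, not part of SWIG or the InChI toolkit.
def c_style_argv(args, program_name = '')
  # Slot 0 stands in for the program name that a C main() receives as argv[0].
  [program_name] + args
end

p c_style_argv(['test.mol'])  # prints ["", "test.mol"]
```

&lt;p&gt;The result could then be passed directly, as in &lt;tt&gt;Libinchi.run(c_style_argv(['test.mol']))&lt;/tt&gt;.&lt;/p&gt;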

&lt;h4&gt;Customize the Library&lt;/h4&gt;

&lt;p&gt;Have a look at the &lt;strong&gt;libinchi_wrap.c&lt;/strong&gt; file that SWIG created. At the bottom of this file should be a function called &lt;tt&gt;Init_libinchi&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;SWIGEXPORT(void) Init_libinchi(void) {
  int i;

  SWIG_InitRuntime();
  mLibinchi = rb_define_module(&amp;quot;Libinchi&amp;quot;);

  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }

  rb_define_module_function(mLibinchi, &amp;quot;run&amp;quot;, _wrap_run, -1);
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is what Ruby uses to map C functions to Ruby modules, classes, and methods. In this case, the C &lt;tt&gt;run&lt;/tt&gt; function is being mapped to a module called &lt;tt&gt;Libinchi&lt;/tt&gt; which has a &lt;tt&gt;run&lt;/tt&gt; method.&lt;/p&gt;

&lt;p&gt;Let's say that you'd prefer a module name of &lt;tt&gt;InChI&lt;/tt&gt; with a method called &lt;tt&gt;write_inchi&lt;/tt&gt;. The following changes to &lt;tt&gt;Init_libinchi&lt;/tt&gt; will accomplish this:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;SWIGEXPORT(void) Init_libinchi(void) {
  int i;

  SWIG_InitRuntime();
  mLibinchi = rb_define_module(&amp;quot;InChI&amp;quot;);

  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }

  rb_define_module_function(mLibinchi, &amp;quot;write_inchi&amp;quot;, _wrap_run, -1);
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Run &lt;tt&gt;make&lt;/tt&gt; again. Now the following can be used to write the InChI information for &lt;strong&gt;test.mol&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;libinchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;InChI&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_inchi&lt;/span&gt;&lt;span class="punct"&gt;(['&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;test.mol&lt;/span&gt;&lt;span class="punct"&gt;'])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
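
&lt;p&gt;Keep in mind that hand edits to &lt;strong&gt;libinchi_wrap.c&lt;/strong&gt; are overwritten whenever &lt;tt&gt;swig&lt;/tt&gt; regenerates the file. One possible alternative, sketched here on the assumption that the generated file is left unmodified so the &lt;tt&gt;Libinchi&lt;/tt&gt; module still exists, is a thin wrapper written in Ruby. The &lt;tt&gt;runner&lt;/tt&gt; parameter is an assumption of this sketch that lets the wrapper be tried without the compiled extension.&lt;/p&gt;

```ruby
# Hypothetical pure-Ruby wrapper; SWIG does not generate any of this.
module InChI
  # runner is any object with a run(argv) method. In real use it would be
  # the Libinchi module built from the unmodified libinchi_wrap.c.
  def self.write_inchi(molfile, runner)
    # Slot 0 is the placeholder for the program name expected by the C code.
    runner.run(['', molfile])
  end
end
```

&lt;p&gt;With the extension loaded through &lt;tt&gt;require 'libinchi'&lt;/tt&gt;, a call might look like &lt;tt&gt;InChI.write_inchi('test.mol', Libinchi)&lt;/tt&gt;.&lt;/p&gt;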

&lt;h4&gt;Summing Up&lt;/h4&gt;

&lt;p&gt;SWIG simplifies the job of connecting high-level languages like Ruby to C/C++ libraries. Although not illustrated in the simple example above, SWIG offers several advanced tools for creating rich library interfaces. Given the large amount of chemical informatics software written in C/C++, and the increasing interest by developers in scripting languages such as Ruby, the SWIG approach is likely to be broadly useful in several areas of chemical informatics integration.&lt;/p&gt;

&lt;p&gt;The C InChI toolkit appears in a few other Open Source projects including &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt;, the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; via the &lt;a href="http://sourceforge.net/projects/jni-inchi"&gt;JNI InChI Wrapper&lt;/a&gt;, and &lt;a href="http://rubyforge.org/projects/rino"&gt;Rino&lt;/a&gt;. To my knowledge, none use SWIG. This will soon change as the approach described here becomes incorporated into Rino.&lt;/p&gt;

&lt;p&gt;On a more general note, the availability of the InChI source code under an &lt;a href="http://www.rosenlaw.com/oslbook.htm"&gt;Open Source license&lt;/a&gt; is essential to developing and distributing the kind of integration library discussed here. We can only hope that others working in chemical informatics see the wisdom in a system that creates healthy &lt;a href="http://www.businessweek.com/technology/content/oct2005/tc2005103_0519_tc_218.htm"&gt;software ecosystems&lt;/a&gt; wherever it takes hold.&lt;/p&gt;</description>
      <pubDate>Sat, 16 Sep 2006 14:43:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c17f005c-92b2-4c0e-ba7e-16c3a5c596cf</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/16/taking-a-swig-of-inchi</link>
      <category>Tools</category>
      <category>swig</category>
      <category>ruby</category>
      <category>inchi</category>
      <category>opensource</category>
      <category>integration</category>
    </item>
  </channel>
</rss>
