<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: From SMILES to InChI: Rino, CDK, and Ruby Java Bridge</title>
    <link>http://depth-first.com/articles/2006/08/26/from-smiles-to-inchi-rino-cdk-and-ruby-java-bridge</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>From SMILES to InChI: Rino, CDK, and Ruby Java Bridge</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/ruby.gif" align="right"&gt;&lt;/img&gt;Integrating Ruby and Java is fast and easy with &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt; (RJB), which was &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;discussed previously&lt;/a&gt;. In this article, I'll show how RJB can be used to solve a practical chemical informatics problem: the conversion of SMILES strings into &lt;a href="http://depth-first.com/articles/2006/08/12/inchi-canonicalization-algorithm"&gt;InChI&lt;/a&gt; identifiers.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;This tutorial is aimed at Linux users. The same steps should work on Windows and Mac OS X, although those systems have not been tested. You'll need to install a few software packages if you haven't done so already: Ruby, RubyGems, RJB, CDK, and Rino. After &lt;a href="http://docs.rubygems.org/read/chapter/3"&gt;installing RubyGems&lt;/a&gt;, RJB and Rino can both be installed from the command line (as root):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
# gem install rjb
# gem install rino
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Next, create a working directory, &lt;strong&gt;smi2inchi&lt;/strong&gt;. Copy the full &lt;a href="http://prdownloads.sourceforge.net/cdk/cdk-20060714.jar?download"&gt;CDK-20060714 jarfile&lt;/a&gt; into this directory. That's it for libraries, so let's move on to the translator itself.&lt;/p&gt;

&lt;h4&gt;The Translator&lt;/h4&gt;

&lt;p&gt;The Translator class consists of a small piece of Ruby code gluing CDK's SmilesParser and MDLWriter with the Ruby InChI library &lt;a href="http://depth-first.com/articles/2006/08/17/ruby-and-inchi-the-rino-library"&gt;Rino&lt;/a&gt;. Rino is a thin Ruby wrapper around the &lt;a href="http://inchi.sourceforge.net/"&gt;IUPAC InChI library&lt;/a&gt;, which is in turn written in C.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="constant"&gt;ENV&lt;/span&gt;&lt;span class="punct"&gt;['&lt;/span&gt;&lt;span class="string"&gt;CLASSPATH&lt;/span&gt;&lt;span class="punct"&gt;']&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;./cdk-20060714.jar&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rino&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;StringWriter&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.io.StringWriter&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;SmilesParser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.smiles.SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;MDLWriter&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.io.MDLWriter&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# Converts a SMILES string into an InChI identifier using&lt;/span&gt;
&lt;span class="comment"&gt;# the CDK Library (Java) and the Rino Library (Ruby/C).&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Translator&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@smiles_parser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@mdl_writer&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;MDLWriter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@mol2inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rino&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;MolfileReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns an InChI identifier from the specified SMILES string.&lt;/span&gt;
  &lt;span class="comment"&gt;# Uses the CDK classes SmilesParser and MDLWriter to generate&lt;/span&gt;
  &lt;span class="comment"&gt;# a molfile from a SMILES string. Then this molfile is&lt;/span&gt;
  &lt;span class="comment"&gt;# parsed by Rino::MolfileReader.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;translate&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@smiles_parser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parseSmiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;sw&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;StringWriter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@mdl_writer&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;setWriter&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;sw&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@mdl_writer&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="attribute"&gt;@mol2inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;sw&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;toString&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Add the above code to a file called &lt;strong&gt;smi2inchi.rb&lt;/strong&gt;. The first line points the CLASSPATH environment variable, which RJB needs, to the CDK library. Lines 3-6 load the RJB and Rino RubyGems. Lines 8-11 use RJB's import syntax to bring in the built-in Java class StringWriter and the CDK Java classes SmilesParser and MDLWriter. The core of the class is the &lt;tt&gt;translate&lt;/tt&gt; method, which simply coordinates the pieces.&lt;/p&gt;
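
&lt;p&gt;The listing above sets CLASSPATH to a single jar. If more jars are ever needed, or the script has to run on Windows, the value could be assembled rather than hard-coded. The following is only a sketch: &lt;tt&gt;classpath_for&lt;/tt&gt; is a hypothetical helper that is not part of RJB or CDK, and it has not been tested together with RJB on any platform.&lt;/p&gt;

```ruby
# Sketch: build a CLASSPATH string from every jar in a directory.
# classpath_for is a hypothetical helper, not part of RJB or CDK.
def classpath_for(dir)
  # Sort so the order of jars is the same on every run.
  jars = Dir.glob(File.join(dir, '*.jar')).sort
  # File::PATH_SEPARATOR is ':' on Unix-like systems and ';' on Windows.
  jars.join(File::PATH_SEPARATOR)
end

# Mirror the listing above: set the variable at the top of the script.
ENV['CLASSPATH'] = classpath_for('.')
```

&lt;p&gt;With only the CDK jarfile in the working directory, this should yield the same single-entry CLASSPATH as the hard-coded version.&lt;/p&gt;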

&lt;p&gt;To use the Translator class, create an instance and invoke its &lt;tt&gt;translate&lt;/tt&gt; method on a SMILES string:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi2inchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;translator&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Translator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
&lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;translator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;translate&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;c1ccccc1&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;p&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; &amp;quot;InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above code fragment can be saved to a text file (e.g. &lt;strong&gt;test.rb&lt;/strong&gt;) and invoked with the Ruby interpreter:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ruby test.rb
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Alternatively, it can be entered interactively with the Interactive Ruby Interpreter (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt;
&lt;/pre&gt;
&lt;/div&gt;
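
&lt;p&gt;Because a Translator instance keeps its parser and writer between calls, one instance can in principle be reused for a whole list of SMILES strings. The sketch below illustrates that idea. &lt;tt&gt;translate_all&lt;/tt&gt; is a hypothetical helper, and it assumes that failures raised by CDK (through RJB) or by Rino surface as Ruby &lt;tt&gt;StandardError&lt;/tt&gt; exceptions, which has not been verified here.&lt;/p&gt;

```ruby
# Sketch: translate several SMILES strings with one translator object.
# translate_all is a hypothetical helper; translator is assumed to be any
# object with a translate(smiles) method, such as Translator above.
def translate_all(translator, smiles_list)
  results = {}
  smiles_list.each do |smiles|
    begin
      results[smiles] = translator.translate(smiles)
    rescue StandardError
      # Assumption: a failed translation raises StandardError.
      # Record nil so the remaining strings are still processed.
      results[smiles] = nil
    end
  end
  results
end

# Possible usage with the class defined earlier (not run here):
#   translate_all(Translator.new, ['c1ccccc1', 'CCO'])
```

&lt;p&gt;The helper returns a hash keyed by the input SMILES, with &lt;tt&gt;nil&lt;/tt&gt; marking any string that could not be translated.&lt;/p&gt;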

&lt;p&gt;With just a few lines of Ruby, we've solved a real problem. This example integrates software written in three programming languages: Ruby, C, and Java. Given the variety of chemical informatics software written in these languages, Ruby Java Bridge offers numerous integration possibilities.&lt;/p&gt;</description>
      <pubDate>Sat, 26 Aug 2006 15:37:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:338917d6-9c77-459e-9d10-dfaaf1f79ff7</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/08/26/from-smiles-to-inchi-rino-cdk-and-ruby-java-bridge</link>
      <category>Tools</category>
      <category>ruby</category>
      <category>java</category>
      <category>inchi</category>
      <category>smiles</category>
      <category>integration</category>
    </item>
  </channel>
</rss>
