<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Easily Convert IUPAC Nomenclature to SMILES, InChI, or Molfile with Rubidium</title>
    <link>http://depth-first.com/articles/2007/10/19/easily-convert-iupac-nomenclature-to-smiles-inchi-or-molfile-with-rubidium</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Easily Convert IUPAC Nomenclature to SMILES, InChI, or Molfile with Rubidium</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20071015/rubidium.png" align="right"&gt;&lt;/img&gt;A recent article &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;introduced Rubidium&lt;/a&gt;, a cheminformatics toolkit written in Ruby. One of Ruby's strengths is the speed with which it enables disparate pieces of code to be glued together - even if they're written in different programming languages. In this article, we'll see how Rubidium can be extended to provide support for converting IUPAC nomenclature into SMILES, InChI, or Molfile formats.&lt;/p&gt;

&lt;h4&gt;About Rubidium&lt;/h4&gt;

&lt;p&gt;Rubidium is a cheminformatics toolkit written in Ruby. Rubidium is currently configured to run on &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt;, although future versions may also work with &lt;a href="http://en.wikipedia.org/wiki/Ruby_(programming_language"&gt;Matz' Ruby Implementation&lt;/a&gt;) (MRI) via &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rubidium will eventually be packaged as a &lt;a href="http://www.rubygems.org/"&gt;RubyGem&lt;/a&gt; and hosted on &lt;a href="http://rubyforge.org"&gt;RubyForge&lt;/a&gt;. For now, the toolkit consists of a running library that will updated and documented on this blog.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;The library extends the CDK module presented in the &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;previous article in this series&lt;/a&gt;. The main change is the addition of an &lt;tt&gt;IUPACReader&lt;/tt&gt; class, based on Peter Corbett's excellent &lt;a href="http://depth-first.com/articles/2007/10/12/jruby-for-cheminformatics-parsing-iupac-nomenclature-with-opsin"&gt;OPSIN library&lt;/a&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;IUPACReader&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.io.StringReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uk.ac.cam.ch.wwmm.opsin.NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.io.CMLReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.ChemFile&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@iupac_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;CMLReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read&lt;/span&gt; &lt;span class="ident"&gt;name&lt;/span&gt;
    &lt;span class="ident"&gt;cml&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@iupac_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse_to_cml&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Could not parse '&lt;span class="expr"&gt;#{name}&lt;/span&gt;'.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="ident"&gt;cml&lt;/span&gt;

    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_reader&lt;/span&gt; &lt;span class="constant"&gt;StringReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_xml&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt; &lt;span class="constant"&gt;ChemFile&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;chem_sequence&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;chem_model&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;molecule_set&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Using this additional functionality requires nothing more than copying the &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;OPSIN jarfile&lt;/a&gt; into the &lt;strong&gt;lib&lt;/strong&gt; directory of your JRuby installation. You'll also need to place the &lt;a href="http://downloads.sourceforge.net/cdk/cdk-1.0.1.jar?modtime=1182877138&amp;big_mirror=0"&gt;CDK jarfile&lt;/a&gt; in this directory if you haven't done so already.&lt;/p&gt;

&lt;p&gt;The complete Rubidium library can be &lt;a href="http://depth-first.com/demo/20071019/cdk.rb"&gt;downloaded here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;A Test&lt;/h4&gt;

&lt;p&gt;We can test Rubidium's IUPAC nomenclature parsing abilities with &lt;tt&gt;jirb&lt;/tt&gt;. For example, to convert from name to SMILES:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'cdk'
=&gt; true
irb(main):002:0&gt; c=CDK::Conversion.new
=&gt; #&amp;lt;CDK::Conversion:0x46ca65 ... &amp;gt;
irb(main):003:0&gt; c.set_formats 'iupac', 'smi'
=&gt; "smi"
irb(main):004:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "C=1C=C(C=CC=1Cl)Cl"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To convert from name to InChI (in the same &lt;tt&gt;jirb&lt;/tt&gt; session):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):005:0&gt; c.set_out_format 'inchi'
=&gt; "inchi"
irb(main):006:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "InChI=1/C6H4Cl2/c7-5-1-2-6(8)4-3-5/h1-4H"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;And to convert from name to Molfile (also in the same &lt;tt&gt;jirb&lt;/tt&gt; session):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):007:0&gt; c.set_out_format 'mol'
=&gt; "mol"
irb(main):008:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "\n  CDK    10/19/07,7:59\n\n  8  8  0  0  0  0  0  0  0  0999 V2000\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n  1  2  2  0  0  0  0 \n  2  3  1  0  0  0  0 \n  3  4  2  0  0  0  0 \n  4  5  1  0  0  0  0 \n  5  6  2  0  0  0  0 \n  6  1  1  0  0  0  0 \n  7  1  1  0  0  0  0 \n  8  4  1  0  0  0  0 \nM  END\n"
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;By re-using a simple conversion API together with another Java library, we've given Rubidium the ability to translate IUPAC nomenclature into other molecular languages. The additional code was both easy to write and easy to test. Future articles will discuss the packaging, distribution, and further elaboration of Rubidium.&lt;/p&gt;</description>
      <pubDate>Fri, 19 Oct 2007 10:05:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:1b7e76b3-93a7-4372-982f-cd60c9ed40d0</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/19/easily-convert-iupac-nomenclature-to-smiles-inchi-or-molfile-with-rubidium</link>
      <category>Tools</category>
      <category>rubidium</category>
      <category>iupac</category>
      <category>smiles</category>
      <category>inchi</category>
      <category>moflile</category>
    </item>
  </channel>
</rss>
