<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag formula</title>
    <link>http://depth-first.com/articles/tag/formula</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>From IUPAC Name to Molecular Formula with Ruby CDK</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/a&gt;Recently, a question was raised on the &lt;a href="http://tech.groups.yahoo.com/group/chemoinf/"&gt;Yahoo cheminf group list&lt;/a&gt; regarding the conversion of IUPAC names into molecular formulas. This can be done quickly with Ruby CDK, as this article will show.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;This tutorial requires &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;Ruby CDK&lt;/a&gt;, which in turn requires &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt; (RJB). A recent Depth-First article described the minimal system configuration required to run &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;RJB on Linux&lt;/a&gt;. Another article showed how to install &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;RJB on Windows&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;A Small Library&lt;/h4&gt;

&lt;p&gt;The following library will convert IUPAC nomenclature into molecular formulas with Ruby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Formulator&lt;/span&gt;
  &lt;span class="attribute"&gt;@@hydrogen_adder&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.tools.HydrogenAdder&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_formula&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iupac_name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_iupac&lt;/span&gt; &lt;span class="ident"&gt;iupac_name&lt;/span&gt;
    &lt;span class="attribute"&gt;@@hydrogen_adder&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;addExplicitHydrogensToSatisfyValency&lt;/span&gt; &lt;span class="ident"&gt;mol&lt;/span&gt;
    &lt;span class="ident"&gt;analyzer&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.tools.MFAnalyser&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;analyzer&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getMolecularFormula&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Save this code as a file named &lt;strong&gt;formulator.rb&lt;/strong&gt; in your working directory.&lt;/p&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;The Formulator library can be tested with the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;formulator&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include&lt;/span&gt; &lt;span class="constant"&gt;Formulator&lt;/span&gt;

&lt;span class="ident"&gt;get_formula&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;benzene&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; &amp;quot;C6H6&amp;quot;&lt;/span&gt;
&lt;span class="ident"&gt;get_formula&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;4-(3,4-dichlorophenyl)-N-methyl-1,2,3,4-tetrahydronaphthalen-1-amine&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; &amp;quot;C17H17NCl2&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Limitations&lt;/h4&gt;

&lt;p&gt;You may run across classes of structures that are not recognized by Ruby CDK. This is due to limitations of the underlying &lt;a href="http://depth-first.com/articles/tag/opsin"&gt;OPSIN library&lt;/a&gt;. For example, OPSIN does not yet recognize fused heterocycle names such as 'imidazo[2,1-b][1,3]thiazole'.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Ruby CDK makes short work of converting IUPAC names into molecular formulas. This is just one example of the kind of conversion that's possible. For example, &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;a recent article&lt;/a&gt; discussed the conversion of IUPAC names to color 2-D structures.&lt;/p&gt;

&lt;p&gt;Due to Ruby's position as both a highly functional scripting language and as the foundation for the popular Web application framework &lt;a href="http://www.rubyonrails.org/"&gt;Ruby on Rails&lt;/a&gt;, a variety of IUPAC nomenclature translation applications are just a few lines of code away.&lt;/p&gt;</description>
      <pubDate>Tue, 13 Mar 2007 10:25:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:6529cee0-0821-45b1-865a-267a3254d85a</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/13/from-iupac-name-to-molecular-formula-with-ruby-cdk</link>
      <category>Tools</category>
      <category>rubycdk</category>
      <category>rcdk</category>
      <category>iupac</category>
      <category>formula</category>
    </item>
  </channel>
</rss>
