<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Creating Canonical SMILES with Ruby Open Babel</title>
    <link>http://depth-first.com/articles/2007/04/03/creating-canonical-smiles-with-ruby-open-babel</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Creating Canonical SMILES with Ruby Open Babel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Unlike many data types, molecular structure representations are not normally unique. Each way of numbering the atoms and bonds of a molecule gives rise to a different, but equally accurate, representation of the same structure. This is one of the fundamental &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;peculiarities of chemical information&lt;/a&gt;, and the focus of much research activity &lt;a href="http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web"&gt;over the last sixty or so years&lt;/a&gt;. One of the most widely used approaches to this problem is canonicalization: choosing one representation per structure by a fixed set of rules.&lt;/p&gt;

&lt;p&gt;This article discusses the &lt;a href="http://sourceforge.net/forum/forum.php?forum_id=629764"&gt;SMILES canonicalization capability&lt;/a&gt; in the upcoming Open Babel 2.1 release. Among several other enhancements, this release will also feature a brand new Ruby interface. By way of preview, this article will demonstrate just how convenient it has now become to generate canonical SMILES strings with Ruby.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070403/aminopterin.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Consider the putative rodenticide aminopterin, the structure of which is shown above. Regardless of whether it turns out to be the culprit in the &lt;a href="http://www.cbsnews.com/stories/2007/03/23/national/main2600615.shtml"&gt;recent pet food poisoning case&lt;/a&gt;, it's a relatively complex molecule. And with this complexity come many possible representations. Here's just one of the hundreds, if not thousands, of possible SMILES strings for this molecule:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
Nc3nc(N)c2nc(CNc1ccc(C(=O)N[C@@H](CCC(=O)O)C(=O)O)cc1)cnc2n3
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If you were developing a database of molecules and needed to support exact structure searching, how would you do it? One way would be to convert a query molecule to a canonical SMILES string, and then simply look for that string in an index of your database's canonical SMILES. This is useful because it allows us to convert a chemistry-specific problem (exact structure search) into a generic computer science problem (text matching).&lt;/p&gt;
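
&lt;p&gt;The lookup itself can be sketched in a few lines of Ruby. What follows is a minimal illustration rather than a finished design: the &lt;code&gt;StructureIndex&lt;/code&gt; class and its methods are hypothetical names, and the canonicalizer is assumed to be any object with a &lt;code&gt;convert&lt;/code&gt; method that returns a canonical SMILES string.&lt;/p&gt;

```ruby
# Hypothetical sketch: an exact-structure index keyed by canonical SMILES.
# The canonicalizer is any object that responds to convert(smiles) and
# returns a canonical SMILES string; none of these names come from Open Babel.
class StructureIndex
  def initialize(canonicalizer)
    @canonicalizer = canonicalizer
    # Each canonical SMILES key maps to the list of record ids stored under it.
    @ids_by_smiles = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store a record id under the canonical form of its structure.
  def add(id, smiles)
    @ids_by_smiles[key_for(smiles)].push(id)
  end

  # Exact structure search reduces to a single hash lookup.
  # Returns an empty array when no record matches.
  def find(smiles)
    @ids_by_smiles.fetch(key_for(smiles), [])
  end

  private

  # Strip surrounding whitespace so that stray tabs or newlines in the
  # converter output cannot make otherwise identical keys differ.
  def key_for(smiles)
    @canonicalizer.convert(smiles).strip
  end
end
```

&lt;p&gt;Records are added with &lt;code&gt;add(id, smiles)&lt;/code&gt; and retrieved with &lt;code&gt;find(smiles)&lt;/code&gt;. Any two SMILES strings that canonicalize to the same text land on the same key, which is all that exact matching requires.&lt;/p&gt;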

&lt;p&gt;We can create a simple Ruby library to convert any SMILES string into an Open Babel canonical SMILES string:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;openbabel&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Can&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@conversion&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBConversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@conversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_in_and_out_formats&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;can&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;convert&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBMol&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@conversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_string&lt;/span&gt; &lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="attribute"&gt;@conversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_string&lt;/span&gt; &lt;span class="ident"&gt;mol&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Save this code as a file called &lt;strong&gt;can.rb&lt;/strong&gt; in your working directory. The library can then be used, for example, from Interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'can'
=&gt; true
irb(main):002:0&gt; c=Can.new
=&gt; #&lt;Can:0x2ac6cc653228 @conversion=#&lt;OpenBabel::Conversion:0x2ac6cc6531d8&gt;&gt;
irb(main):003:0&gt; puts c.convert('Nc3nc(N)c2nc(CNc1ccc(C(=O)N[C@@H](CCC(=O)O)C(=O)O)cc1)cnc2n3')
OC(=O)CC[C@@H](NC(=O)c1ccc(NCc2cnc3nc(N)nc(N)c3n2)cc1)C(=O)O
=&gt; nil
irb(main):004:0&gt; puts c.convert('C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=NC(=N3)N)N')
OC(=O)CC[C@@H](NC(=O)c1ccc(NCc2cnc3nc(N)nc(N)c3n2)cc1)C(=O)O
=&gt; nil
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see, both SMILES strings for aminopterin were converted into the same canonical SMILES string.&lt;/p&gt;

&lt;p&gt;InChI uses a "standard" &lt;a href="http://depth-first.com/articles/2006/08/12/inchi-canonicalization-algorithm"&gt;canonicalization algorithm&lt;/a&gt;. SMILES canonicalization, by contrast, varies by software package. As a result, the canonical SMILES described here will be most useful for comparisons &lt;em&gt;within&lt;/em&gt; a software package, but probably not &lt;em&gt;externally&lt;/em&gt; to it, at least initially.&lt;/p&gt;
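
&lt;p&gt;One way to guard against mixing canonical SMILES from different sources is to store each string together with a label for the toolkit that produced it. The sketch below assumes this approach; &lt;code&gt;CanonicalSmiles&lt;/code&gt;, &lt;code&gt;same_structure?&lt;/code&gt;, and the label format are hypothetical and not part of Open Babel.&lt;/p&gt;

```ruby
# Hypothetical sketch: pair each canonical SMILES with the toolkit that
# produced it, and refuse to compare strings from different toolkits.
# The toolkit labels are illustrative, not identifiers defined by any library.
CanonicalSmiles = Struct.new(:toolkit, :value) do
  # True when both strings come from the same toolkit and are identical.
  # Raises ArgumentError when the toolkits differ, because the comparison
  # would not be meaningful.
  def same_structure?(other)
    unless toolkit == other.toolkit
      raise ArgumentError, "cannot compare #{toolkit} with #{other.toolkit}"
    end
    value == other.value
  end
end
```

&lt;p&gt;Comparing two values that carry the same label is a plain string comparison. Comparing values with different labels raises an error instead of silently returning a misleading result.&lt;/p&gt;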

&lt;p&gt;Ruby is still an upstart language in cheminformatics. But tools like &lt;a href="http://depth-first.com/articles/tag/rubycdk"&gt;Ruby CDK&lt;/a&gt; and Ruby Open Babel offer ample opportunities for learning what this remarkable language can do for the development of chemistry applications.&lt;/p&gt;</description>
      <pubDate>Tue, 03 Apr 2007 11:59:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:53ca2aed-221a-4d52-bbb9-324fedce78d8</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/04/03/creating-canonical-smiles-with-ruby-open-babel</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>ruby</category>
      <category>rubyopenbabel</category>
      <category>smiles</category>
      <category>canonicalization</category>
    </item>
  </channel>
</rss>
