<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel</title>
    <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;SMILES and InChI are the two most widely used &lt;a href="http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web"&gt;line notations&lt;/a&gt; in cheminformatics. Not surprisingly, there are many situations in which it's useful to interconvert the two. This article shows a simple method for doing so using &lt;a href="http://depth-first.com/articles/tag/rubyopenbabel"&gt;Ruby Open Babel&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Parsing InChIs&lt;/h4&gt;

&lt;p&gt;Version 1.01 of the IUPAC/NIST C InChI toolkit introduced the ability to parse InChIs. This capability has subsequently been incorporated into &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt;, and by extension, Ruby Open Babel. It's this capability that we'll take advantage of.&lt;/p&gt;

&lt;h4&gt;A Simple Library&lt;/h4&gt;

&lt;p&gt;The following library provides everything we need to convert between SMILES and InChI via Ruby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;openbabel&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;InChI&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBConversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBConversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_in_and_out_formats&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;inchi&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_in_and_out_formats&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;inchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;inchi_to_smiles&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBMol&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;or&lt;/span&gt; &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Can't parse InChI: &lt;span class="expr"&gt;#{inchi}&lt;/span&gt;.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;strip&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;smiles_to_inchi&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBMol&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;or&lt;/span&gt; &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Can't parse SMILES: &lt;span class="expr"&gt;#{smiles}&lt;/span&gt;.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;strip&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;After saving the above code to a file named &lt;strong&gt;inchi.rb&lt;/strong&gt;, we can interactively convert SMILES and InChIs:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles "InChI=1/C14H12/c1-3-7-13(8-4-1)11-12-14-9-5-2-6-10-14/h1-12H/b12-11-"
=&gt; "c1ccc(cc1)C(/[H])=C(/[H])c1ccccc1"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
=&gt; "InChI=1/C14H12/c1-3-7-13(8-4-1)11-12-14-9-5-2-6-10-14/h1-12H/b12-11-"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In the above test, the InChI for &lt;em&gt;cis&lt;/em&gt;-stilbene is converted into a SMILES string, which is then converted back to InChI form with complete fidelity, including alkene geometry. Note that this would not have been possible using the approach that was &lt;a href="http://depth-first.com/articles/2006/09/19/decoding-inchis-with-rino"&gt;previously discussed&lt;/a&gt;, in which molfiles were used as intermediate data structures.&lt;/p&gt;

&lt;p&gt;What about chiral centers? Here the results are mixed. For example, when the round-trip conversion is applied to propranolol (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=21138"&gt;PubChem&lt;/a&gt;, &lt;a href="http://60minutes.yahoo.com/segment/21/memory_drug"&gt;Video&lt;/a&gt;), the configuration of the stereocenter is &lt;em&gt;inverted&lt;/em&gt;.&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles "InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m1/s1"
=&gt; "CC(C)NC[C@@H](COc1cccc2ccccc12)O"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
=&gt; "InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m0/s1"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;However, the same round-trip conversion of 1-phenylethanol works without inversion of stereochemistry:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles " InChI=1/C8H10O/c1-7(9)8-5-3-2-4-6-8/h2-7,9H,1H3/t7-/m0/s1"
=&gt; "C[C@@H](c1ccccc1)O"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
=&gt; "InChI=1/C8H10O/c1-7(9)8-5-3-2-4-6-8/h2-7,9H,1H3/t7-/m0/s1"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The most likely explanation is that under certain conditions, Open Babel incorrectly interprets and/or writes stereo parities.&lt;/p&gt;
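
&lt;p&gt;To see exactly where a round trip diverges, it can help to compare the two InChIs layer by layer. The following is only a minimal sketch: &lt;tt&gt;inchi_layers&lt;/tt&gt; and &lt;tt&gt;inchi_layer_diff&lt;/tt&gt; are hypothetical helpers, not part of Open Babel or of the library above. They use plain string handling and assume that each layer letter appears at most once, which holds for the InChIs in this article but not for InChIs that carry a fixed-hydrogen layer.&lt;/p&gt;

```ruby
# Hypothetical helpers (not part of Open Babel): split an InChI into its
# "/"-separated layers and report which layers differ between two InChIs.
# Assumption: each layer letter (c, h, b, t, m, s, ...) occurs at most once.

def inchi_layers(inchi)
  segments = inchi.strip.split('/')
  # The first two segments are the version prefix and the formula.
  layers = { 'version' => segments[0], 'formula' => segments[1] }
  # Every later segment is keyed by its leading letter.
  (segments[2..-1] || []).each do |segment|
    layers[segment[0, 1]] = segment[1..-1]
  end
  layers
end

def inchi_layer_diff(before, after)
  a = inchi_layers(before)
  b = inchi_layers(after)
  # Keep only the layer names whose contents are not identical.
  (a.keys | b.keys).reject { |key| a[key] == b[key] }
end

# The propranolol InChIs from the round trip shown above.
before = 'InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m1/s1'
after  = 'InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m0/s1'

p inchi_layer_diff(before, after)  # prints ["m"]
```

&lt;p&gt;For the propranolol pair, the only layer reported is &lt;tt&gt;m&lt;/tt&gt; (&lt;tt&gt;m1&lt;/tt&gt; versus &lt;tt&gt;m0&lt;/tt&gt;); connectivity, hydrogens and the &lt;tt&gt;t&lt;/tt&gt; layer are unchanged. Two identical InChIs give an empty list.&lt;/p&gt;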

&lt;h4&gt;One More Gotcha&lt;/h4&gt;

&lt;p&gt;On my system (Mandriva Linux 2007.1), attempting the round-trip test on glucose reproducibly resulted in a segfault:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles "InChI=1/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3-,4+,5-,6?/m1/s1"
=&gt; "C([C@H]1[C@H]([C@@H]([C@H](C(O)O1)O)O)O)O"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
./inchi.rb:20: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i686-linux]

Aborted
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The same segfault was obtained when using the &lt;tt&gt;babel&lt;/tt&gt; command-line utility:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ babel -ismi -oinchi
C([C@H]1[C@H]([C@@H]([C@H](C(O)O1)O)O)O)O
[Return]
Segmentation fault
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;As you can see, Ruby Open Babel makes short work of interconverting SMILES and InChIs. Despite problems with stereochemical configuration and segfaults on reading certain SMILES strings, the approach outlined here offers a quick and economical way to interconvert a variety of SMILES and InChIs.&lt;/p&gt;</description>
      <pubDate>Mon, 25 Jun 2007 08:45:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:08b043b0-d9c9-4de9-bc51-c20b4f94c306</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel</link>
      <category>Tools</category>
      <category>inchi</category>
      <category>smiles</category>
      <category>rubyopenbabel</category>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by Rich Apodaca</title>
      <description>&lt;p&gt;Andrew Dalke's &lt;a href="http://sourceforge.net/mailarchive/forum.php?thread_name=4B953E3F-7FD7-429A-ADC5-58D4E23FB43D%40dalkescientific.com&amp;amp;forum_name=inchi-discuss" rel="nofollow"&gt;recent warning&lt;/a&gt; about parsing unfiltered InChIs in a production environment is also worth reading.&lt;/p&gt;</description>
      <pubDate>Fri, 29 Jun 2007 09:11:30 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:e44e4836-5641-42a1-953b-f15e912079c4</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-83</link>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by baoilleach</title>
      <description>&lt;p&gt;Regarding SMILES parsers, check out the recent article by Andrew Dalke (whom I also met at Sheffield) at:
&lt;a href="http://www.dalkescientific.com/writings/diary/archive/2007/06/25/smiles_states.html" rel="nofollow"&gt;http://www.dalkescientific.com/writings/diary/archive/2007/06/25/smiles_states.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've downloaded the remediated PDB (a somewhat easier test set, I suspect, than the raw PDB, but we have to start somewhere), and will attempt to read in all of the files with pybel over the weekend. Converting with babel gives lots of error messages, but no problems converting...do we want to follow these up at some point?&lt;/p&gt;

&lt;p&gt;Regarding 'industrial strength'...coming from a scripting background, I am a big believer in regression and unit tests, and I think they are the only way to ensure a rock solid parser. Any code submitted that breaks a test should just be reverted. This way incremental improvements are guaranteed.&lt;/p&gt;</description>
      <pubDate>Fri, 29 Jun 2007 03:40:57 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:8d20f7f8-27d6-435f-b6d2-855c1be899c9</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-82</link>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by Geoff</title>
      <description>&lt;p&gt;The biggest limiting factor in my testing recently has been in my use of a laptop with a small, slow hard drive. I simply don't have the disk space to keep PubChem or ZINC or the PDB around.&lt;/p&gt;

&lt;p&gt;That will change in a few months...&lt;/p&gt;

&lt;p&gt;But I think SMILES support in Open Babel is pretty robust -- it powers eMolecules, with somewhere north of 10 million molecules. Craig James reported all sorts of SMILES and SMARTS errors.&lt;/p&gt;

&lt;p&gt;If either (or both) of you would like to try on ZINC or PubChem or PDB, I suspect we'll uncover more lurking bugs. We're getting closer to "industrial strength" though.&lt;/p&gt;</description>
      <pubDate>Thu, 28 Jun 2007 18:06:59 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b946d299-4fcb-436a-9d45-d2552345933d</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-81</link>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by baoilleach</title>
      <description>&lt;p&gt;Similarly, we can run OB on every entry in the PDB. At least we can find out if anything breaks the parser...&lt;/p&gt;</description>
      <pubDate>Wed, 27 Jun 2007 12:50:01 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:ad819f07-2816-41be-8a49-19e79a3fcc94</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-78</link>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by Rich Apodaca</title>
      <description>&lt;p&gt;Noel,&lt;/p&gt;

&lt;p&gt;I've been thinking along exactly the same lines myself...&lt;/p&gt;</description>
      <pubDate>Mon, 25 Jun 2007 23:22:15 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b2ffb53d-d73c-4aad-aa6a-852e056824b0</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-76</link>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by baoilleach</title>
      <description>&lt;p&gt;Would running this code on a dataset of 3D structures yield some useful bug reports? I think that if we could finally nail SMILES support, this would be a good thing. Maybe once Geoff and co. fix this problem, you could run the code on PubChem or ZINC.&lt;/p&gt;</description>
      <pubDate>Mon, 25 Jun 2007 10:43:45 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:944b83fb-af99-43ad-9e97-65a55db03667</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-75</link>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by Geoff</title>
      <description>&lt;p&gt;No, unfortunately the stereo issue is present in the 2.1 branch and is a new bug. We'll see what we can do ASAP.&lt;/p&gt;</description>
      <pubDate>Mon, 25 Jun 2007 09:48:20 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:7cdce3cb-d09e-464a-9854-eb48d30d4f5a</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-74</link>
    </item>
    <item>
      <title>"Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel" by Geoff</title>
      <description>&lt;p&gt;The crash is a known bug with some SMILES -&gt; InChI conversions with 2.1.0 and is fixed in the SVN trunk and branch for 2.1.1.&lt;/p&gt;

&lt;p&gt;I'll take a look at the stereo issue -- I think that may also be fixed in the latest code too. &lt;/p&gt;</description>
      <pubDate>Mon, 25 Jun 2007 09:33:13 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:477570e6-e356-41f9-93af-ee7551f3968e</guid>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel#comment-73</link>
    </item>
  </channel>
</rss>
