<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag hackingchemspider</title>
    <link>http://depth-first.com/articles/tag/hackingchemspider</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking ChemSpider: Query by SMILES and InChI with Ruby</title>
      <description>&lt;p&gt;&lt;a href="http://chemspider.com"&gt;&lt;img src="http://depth-first.com/demo/20070917/chemspider.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Slowly but surely, cheminformatics Web APIs are starting to appear. What's the big deal, you may ask? By exposing Web APIs, service providers enable third parties to develop new applications that &lt;a href="http://depth-first.com/articles/2006/09/23/mashups-for-fun-and-profit"&gt;"mash up"&lt;/a&gt; functionality from two or more sites, or which take the original service in directions its founders never considered.&lt;/p&gt;

&lt;p&gt;By way of &lt;a href="http://www.chemspider.com/blog"&gt;Antony Williams' blog&lt;/a&gt;, I came across &lt;a href="http://www.chemspider.com/blog/?p=135"&gt;the announcement&lt;/a&gt; for the &lt;a href="http://www.chemspider.com/inchi.asmx"&gt;ChemSpider Web API&lt;/a&gt;. What can this API do for Web developers? To find out, let's write a small Ruby library.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;Our library will accept a SMILES string or InChI identifier and returns a URL pointing to the corresponding ChemSpider compound summary page. Like &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;previous Web API demos&lt;/a&gt;, this one uses the powerful Ruby library &lt;a href="http://mechanize.rubyforge.org/"&gt;Mechanize&lt;/a&gt;, leading to very concise code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;ChemSpider&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;url_for_inchi&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/inchi.asmx/InChIToCSID?inchi=&lt;span class="expr"&gt;#{inchi}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;csid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;string&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;

    &lt;span class="ident"&gt;csid&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/RecordView.aspx?id=&lt;span class="expr"&gt;#{csid}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;url_for_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/inchi.asmx/SMILESToInChI?smiles=&lt;span class="expr"&gt;#{smiles}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;string&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Invalid SMILES: &lt;span class="expr"&gt;#{smiles}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="ident"&gt;url_for_inchi&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;url_for_inchi&lt;/tt&gt; method directly uses the ChemSpider API to query by InChI. The &lt;tt&gt;url_for_smiles&lt;/tt&gt; method first uses the ChemSpider API to convert a SMILES string to an InChI identifier, and then calls the &lt;tt&gt;url_for_inchi&lt;/tt&gt; method.&lt;/p&gt;

&lt;p&gt;Two points are worth noting. First, although for convenience the InChI identifier isn't &lt;a href="http://www.aptana.com/docs/index.php/URL_Escape_Codes"&gt;escaped&lt;/a&gt; before being appended to the API URL, strictly speaking it should be. Second, both methods invoke the underlying Mechanize library &lt;a href="http://code.whytheluckystiff.net/hpricot/"&gt;Hpricot&lt;/a&gt; to parse the raw XML returned by ChemSpider.&lt;/p&gt;

&lt;h4&gt;Testing&lt;/h4&gt;

&lt;p&gt;Saving the above code to a file called &lt;strong&gt;chemspider.rb&lt;/strong&gt;, we can get the URL to ChemSpider's benzene page from its InChI identifier via interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'chemspider'
=&gt; true
irb(main):002:0&gt; include ChemSpider
=&gt; Object
irb(main):003:0&gt; url_for_inchi "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
=&gt; "http://www.chemspider.com/RecordView.aspx?id=236"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We can work with SMILES strings just as easily as with InChIs:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'chemspider'
=&gt; true
irb(main):002:0&gt; include ChemSpider
=&gt; Object
irb(main):003:0&gt; url_for_smiles 'c1ccccc1'
=&gt; "http://www.chemspider.com/RecordView.aspx?id=236"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Both the InChI and the SMILES string yield a URL pointing to the &lt;a href="http://www.chemspider.com/RecordView.aspx?id=236"&gt;same Chemspider page for benzene&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Like most &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;chemical databases&lt;/a&gt;, ChemSpider uses a compound summary page as a way of organizing the available resources for a given molecule. With a method in hand for accessing these pages based on arbitrary SMILES or InChIs, we can begin to think of manipulating ChemSpider independently of its current user interface. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Mon, 17 Sep 2007 08:19:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9e6b90f7-590d-47d4-b2a8-bbac5a014c74</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/17/hacking-chemspider-query-by-smiles-and-inchi-with-ruby</link>
      <category>Tools</category>
      <category>chemspider</category>
      <category>hackingchemspider</category>
      <category>ruby</category>
      <category>webapi</category>
      <category>mashup</category>
      <category>mechanize</category>
      <category>hpricot</category>
    </item>
  </channel>
</rss>
