<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Hacking PubChem with Ruby</title>
    <link>http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking PubChem with Ruby</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; is an increasingly popular, free-access, online molecular database operated by the National Institutes of Health. Web services are a hot topic, with sites such as &lt;a href="http://www.flickr.com/services/api/"&gt;Flickr&lt;/a&gt;, &lt;a href="http://www.google.com/apis/"&gt;Google&lt;/a&gt;, and &lt;a href="http://developer.ebay.com/common/api/"&gt;eBay&lt;/a&gt; offering developers the tools to build rich content through "mashups" of several web APIs. Although there is no formal PubChem API, it's possible to roll your own. As a demonstration, this article will show how structural information can be retrieved from PubChem using some simple Ruby code. The inspiration for this article came from the &lt;tt&gt;PubChem&lt;/tt&gt; module that is part of &lt;a href="http://chemruby.org"&gt;Chemruby&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The only thing you'll need for this tutorial is Ruby, preferably version 1.8.2 or higher. Create a directory called &lt;strong&gt;pubchem&lt;/strong&gt; and make it your working directory. Then create a file called &lt;strong&gt;pubchem.rb&lt;/strong&gt; containing the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net/http&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# A very simple PubChem Web API.&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;PubChem&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns a molfile (as a String) for the molecule with PubChem&lt;/span&gt;
  &lt;span class="comment"&gt;# CID matching compound_id.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.get_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;compound_id&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;
    &lt;span class="ident"&gt;path&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;/summary/summary.cgi?cid=&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;compound_id&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&amp;amp;disopt=DisplaySDF&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;pubchem.ncbi.nlm.nih.gov&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;path&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;molfile&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Writes a PNG image, for the molecule with PubChem&lt;/span&gt;
  &lt;span class="comment"&gt;# CID matching compound_id, to the file specified by filename.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.write_image&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;compound_id&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;path&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;/image/imgsrv.fcgi?t=l&amp;amp;cid=&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;compound_id&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;pubchem.ncbi.nlm.nih.gov&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;path&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="ident"&gt;image&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;

      &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;w&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
        &lt;span class="ident"&gt;file&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="ident"&gt;image&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

PubChem references each of its compounds by a unique integer identifier, the PubChem CID. This is very handy because retrieving PubChem resources is as simple as encoding a URL containing the CID of interest. The class above illustrates how this system can be used to get a molfile and write a PNG image using just a few lines of Ruby.
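
The URL encoding described above can be tried on its own, without touching the network. The following is a minimal sketch rather than part of the class: the helper name pubchem_path is hypothetical, and it assumes a Ruby recent enough to provide URI.encode_www_form and the short hash syntax, both of which are newer than the 1.8 series mentioned earlier.

```ruby
require 'uri'

# Hypothetical helper (not part of the PubChem class above): joins a CGI
# script path and its query parameters into a single request path.
def pubchem_path(script, params)
  script + '?' + URI.encode_www_form(params)
end

# The same two request paths that the PubChem class builds by string
# concatenation, here for Levonorgestrel (CID 13109).
sdf_path = pubchem_path('/summary/summary.cgi', { cid: 13109, disopt: 'DisplaySDF' })
png_path = pubchem_path('/image/imgsrv.fcgi', { t: 'l', cid: 13109 })

puts sdf_path
puts png_path
```

Because the parameters are converted to text by the encoder, the CID can be given as an Integer or a String, whereas the concatenation in the class expects a String.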

Using the &lt;tt&gt;PubChem&lt;/tt&gt; class is simplicity itself. To get the molfile for Levonorgestrel (&lt;a href="http://www.go2planb.com/ForConsumers/Index.aspx"&gt;Plan B&lt;/a&gt;), which has the CID 13109:

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;pubchem&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;PubChem&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;get_molfile&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;13109&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="comment"&gt;#=&amp;gt; returns the molfile for Levonorgestrel as a String&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

To write the 2-D structure diagram of Levonorgestrel as a PNG:

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;pubchem&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;PubChem&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;write_png&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;13109&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;image.png&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="comment"&gt;#=&amp;gt; writes a PNG image of Levonorgestrel&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

This code saves the image below to your working directory as &lt;strong&gt;image.png&lt;/strong&gt;.

&lt;center&gt;&lt;img src="http://depth-first.com/files/levonorgestrel.png"&gt;&lt;/img&gt;&lt;/center&gt;
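
The class writes whatever the server returns straight to disk, so an error page would be saved under a .png name without complaint. One possible safeguard, sketched here under stated assumptions, is to check for the eight-byte PNG signature before writing. The helper names png? and save_png are hypothetical, the HTTP request is left out so that the sketch runs offline, and a Ruby recent enough to provide String#b and File.binwrite is assumed.

```ruby
# First eight bytes of every PNG file, as fixed by the PNG specification.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: true if bytes begins with the PNG signature.
def png?(bytes)
  bytes.b.byteslice(0, 8) == PNG_SIGNATURE
end

# Hypothetical helper: writes bytes to filename in binary mode, refusing
# anything that does not look like a PNG (an HTML error page, for example).
def save_png(bytes, filename)
  raise ArgumentError, 'response body is not a PNG image' unless png?(bytes)
  File.binwrite(filename, bytes)
end

puts png?("\x89PNG\r\n\x1A\n".b + 'rest of file')
puts png?('an HTML error page')
```

In the class above, a call such as save_png(response.body, filename) could stand in for the File.open block; whether a failed check should raise or simply return nil is a design choice left open here.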

The above two code fragments can either be saved as a file and executed by the Ruby interpreter:

&lt;div class="console"&gt;
&lt;pre&gt;
$ ruby filename.rb
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;or they can be entered interactively in your console with &lt;a href="http://tryruby.hobix.com/"&gt;irb&lt;/a&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt;  
&lt;/pre&gt;   
&lt;/div&gt;

&lt;p&gt;As you can see, there's not much to building a PubChem API in Ruby. The same principles discussed here should apply in any programming language. Future articles in this series will show how to build more complex PubChem APIs and integrate them with other software packages and web services.&lt;/p&gt;</description>
      <pubDate>Wed, 30 Aug 2006 02:29:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:6cc9c1f4-db5b-4a86-96f1-9c9081a71b5d</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby</link>
      <category>Databases</category>
      <category>pubchem</category>
      <category>ruby</category>
      <category>mashup</category>
      <category>api</category>
    </item>
  </channel>
</rss>
