<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Hacking PubChem: Visually Inspect Results for CAS Number and Keyword Searches</title>
    <link>http://depth-first.com/articles/2007/09/25/hacking-pubchem-visually-inspect-results-for-cas-number-and-keyword-searches</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking PubChem: Visually Inspect Results for CAS Number and Keyword Searches</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A recent article described how PubChem could be used to &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;quickly search for CAS numbers&lt;/a&gt;. Although useful, the approach is limited in that only an array of PubChem CIDs was returned. What would be really useful would be a simple way to create a report with entries hyperlinked into the PubChem site itself to aid in visual inspection. In this tutorial, we'll see how an HTML template and a few extra lines of code can do just that.&lt;/p&gt;

&lt;h4&gt;The Template&lt;/h4&gt;

&lt;p&gt;Ruby supports a number of HTML templating mechanisms. In this example, we'll use an ERB template resurrected from the &lt;a href="http://depth-first.com/articles/2006/12/11/hacking-molbank-creating-a-graphical-table-of-contents"&gt;Molbank graphical table of contents&lt;/a&gt; tutorial:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;html&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;head&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;PubChem Search for #{term}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;head&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;h1&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Search: #{term}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;h1&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;table&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;cids.each&lt;/span&gt; &lt;span class="attribute"&gt;do&lt;/span&gt; |&lt;span class="attribute"&gt;cid|&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;td&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;image&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;summary&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt; &lt;span class="attribute"&gt;href&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= summary %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;img&lt;/span&gt; &lt;span class="attribute"&gt;src&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= image %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;border&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;img&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;center&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt; &lt;span class="attribute"&gt;style&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;font-size: 8px&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt; &lt;span class="attribute"&gt;href&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= summary %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;CID-#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;center&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;td&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; +&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;if&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;&amp;gt;&lt;/span&gt; 5 %&amp;gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;end&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt;&lt;span class="attribute"&gt;end&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;table&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;html&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above template uses a search term and an array of CIDs to build a table of results. Each cell in the table contains a color 2D image and the CID, both hyperlinked into PubChem itself.&lt;/p&gt;

&lt;p&gt;Saving this library to a file called &lt;strong&gt;template.rhtml&lt;/strong&gt; is all we need to do.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;The library is a modification of the one shown in &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;the previous article&lt;/a&gt; in this series:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;erb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;PubChemTerms&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;report&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;cids&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;get_cids&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;erb&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;ERB&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;IO&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;template.rhtml&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;))&lt;/span&gt;

    &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;output.html&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;w+&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;file&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="ident"&gt;erb&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;result&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;binding&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_cids&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&amp;amp;retmax=100&amp;amp;term=&lt;span class="expr"&gt;#{term}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parser&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;collect&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The method &lt;tt&gt;report&lt;/tt&gt; accepts a search term and uses our template to render a report.&lt;/p&gt;

&lt;h4&gt;Testing&lt;/h4&gt;

&lt;p&gt;By saving the above library in a file called &lt;strong&gt;pubchem.rb&lt;/strong&gt;, we can search by keyword via interactive ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'pubchem'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; report 'esomeprazole'
=&gt; #&lt;File:output.html (closed)&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This produces a file called &lt;strong&gt;output.html&lt;/strong&gt; that can be viewed with any browser:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070925/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;As in the original version of the library, we can also query by CAS number:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'pubchem'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; report '119141-88-7'
=&gt; #&lt;File:output.html (closed)&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;The simple approach outlined here could be extended in many ways. For example, we could easily retrieve molfiles based on keyword or CAS number search. We could pipe queries together or work with query lists. We could &lt;a href="http://depth-first.com/articles/2007/09/17/hacking-chemspider-query-by-smiles-and-inchi-with-ruby"&gt;blend in ChemSpider data&lt;/a&gt;. We could even build a simple Web application (with &lt;a href="http://rubyonrails.org"&gt;Rails&lt;/a&gt;) that returned customized reports. Mixing in &lt;a href="http://depth-first.com/articles/tag/rcdk"&gt;Ruby CDK&lt;/a&gt; or &lt;a href="http://depth-first.com/articles/tag/rubyopenbabel"&gt;Ruby Open Babel&lt;/a&gt; offers still more possibilities.&lt;/p&gt;

&lt;p&gt;Increasingly, the most important question in cheminformatics is not "What can we build?", but rather "What should we build?" Success in this new world requires a much deeper understanding of how cheminformatics software is being used by real chemists and where it's not.&lt;/p&gt;</description>
      <pubDate>Tue, 25 Sep 2007 10:55:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:5b1ef92b-4ed3-443e-a683-dc37d23c4352</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/25/hacking-pubchem-visually-inspect-results-for-cas-number-and-keyword-searches</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>casnumber</category>
      <category>cas</category>
      <category>ruby</category>
      <category>keyword</category>
      <category>erb</category>
      <category>html</category>
      <category>entrez</category>
    </item>
    <item>
      <title>"Hacking PubChem: Visually Inspect Results for CAS Number and Keyword Searches" by Felipe Schneider</title>
      <description>&lt;p&gt;Amazing blog, and well-done article. Really helpful!&lt;/p&gt;</description>
      <pubDate>Thu, 20 Dec 2007 20:08:32 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:00602a39-8fac-48e3-b331-5dea453f3358</guid>
      <link>http://depth-first.com/articles/2007/09/25/hacking-pubchem-visually-inspect-results-for-cas-number-and-keyword-searches#comment-314</link>
    </item>
  </channel>
</rss>
