<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag keyword</title>
    <link>http://depth-first.com/articles/tag/keyword</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking PubChem: Visually Inspect Results for CAS Number and Keyword Searches</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A recent article described how PubChem could be used to &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;quickly search for CAS numbers&lt;/a&gt;. Although useful, the approach is limited in that only an array of PubChem CIDs was returned. What would be really useful would be a simple way to create a report with entries hyperlinked into the PubChem site itself to aid in visual inspection. In this tutorial, we'll see how an HTML template and a few extra lines of code can do just that.&lt;/p&gt;

&lt;h4&gt;The Template&lt;/h4&gt;

&lt;p&gt;Ruby supports a number of HTML templating mechanisms. In this example, we'll use an ERB template resurrected from the &lt;a href="http://depth-first.com/articles/2006/12/11/hacking-molbank-creating-a-graphical-table-of-contents"&gt;Molbank graphical table of contents&lt;/a&gt; tutorial:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;html&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;head&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;PubChem Search for #{term}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;head&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;h1&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Search: #{term}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;h1&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;table&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;cids.each&lt;/span&gt; &lt;span class="attribute"&gt;do&lt;/span&gt; |&lt;span class="attribute"&gt;cid|&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;td&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;image&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;summary&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt; &lt;span class="attribute"&gt;href&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= summary %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;img&lt;/span&gt; &lt;span class="attribute"&gt;src&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= image %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;border&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;img&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;center&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt; &lt;span class="attribute"&gt;style&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;font-size: 8px&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt; &lt;span class="attribute"&gt;href&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= summary %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;CID-#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;center&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;td&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; +&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;if&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;&amp;gt;&lt;/span&gt; 5 %&amp;gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;end&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt;&lt;span class="attribute"&gt;end&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;table&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;html&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above template uses a search term and an array of CIDs to build a table of results. Each cell in the table contains a color 2D image and the CID, both hyperlinked into PubChem itself.&lt;/p&gt;

&lt;p&gt;Saving this library to a file called &lt;strong&gt;template.rhtml&lt;/strong&gt; is all we need to do.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;The library is a modification of the one shown in &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;the previous article&lt;/a&gt; in this series:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;erb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;PubChemTerms&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;report&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;cids&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;get_cids&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;erb&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;ERB&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;IO&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;template.rhtml&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;))&lt;/span&gt;

    &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;output.html&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;w+&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;file&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="ident"&gt;erb&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;result&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;binding&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_cids&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&amp;amp;retmax=100&amp;amp;term=&lt;span class="expr"&gt;#{term}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parser&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;collect&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The method &lt;tt&gt;report&lt;/tt&gt; accepts a search term and uses our template to render a report.&lt;/p&gt;

&lt;h4&gt;Testing&lt;/h4&gt;

&lt;p&gt;By saving the above library in a file called &lt;strong&gt;pubchem.rb&lt;/strong&gt;, we can search by keyword via interactive ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'pubchem'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; report 'esomeprazole'
=&gt; #&lt;File:output.html (closed)&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This produces a file called &lt;strong&gt;output.html&lt;/strong&gt; that can be viewed with any browser:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070925/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;As in the original version of the library, we can also query by CAS number:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'pubchem'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; report '119141-88-7'
=&gt; #&lt;File:output.html (closed)&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;The simple approach outlined here could be extended in many ways. For example, we could easily retrieve molfiles based on keyword or CAS number search. We could pipe queries together or work with query lists. We could &lt;a href="http://depth-first.com/articles/2007/09/17/hacking-chemspider-query-by-smiles-and-inchi-with-ruby"&gt;blend in ChemSpider data&lt;/a&gt;. We could even build a simple Web application (with &lt;a href="http://rubyonrails.org"&gt;Rails&lt;/a&gt;) that returned customized reports. Mixing in &lt;a href="http://depth-first.com/articles/tag/rcdk"&gt;Ruby CDK&lt;/a&gt; or &lt;a href="http://depth-first.com/articles/tag/rubyopenbabel"&gt;Ruby Open Babel&lt;/a&gt; offers still more possibilities.&lt;/p&gt;

&lt;p&gt;Increasingly, the most important question in cheminformatics is not "What can we build?", but rather "What should we build?" Success in this new world requires a much deeper understanding of how cheminformatics software is being used by real chemists and where it's not.&lt;/p&gt;</description>
      <pubDate>Tue, 25 Sep 2007 10:55:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:5b1ef92b-4ed3-443e-a683-dc37d23c4352</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/25/hacking-pubchem-visually-inspect-results-for-cas-number-and-keyword-searches</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>casnumber</category>
      <category>cas</category>
      <category>ruby</category>
      <category>keyword</category>
      <category>erb</category>
      <category>html</category>
      <category>entrez</category>
    </item>
    <item>
      <title>Hacking PubChem: Convert CAS Numbers into PubChem CIDs with Ruby</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Although the PubChem system has been discussed in &lt;a href="http://depth-first.com/articles/tag/pubchem"&gt;numerous recent D-F articles&lt;/a&gt; and elsewhere, there's much more to the story that hasn't been told. One of the more intriguing things PubChem can do is &lt;a href="http://depth-first.com/articles/2007/05/21/simple-cas-number-lookup-with-pubchem"&gt;look up CAS Numbers for free&lt;/a&gt;. In this tutorial, we'll see how a simple Ruby script can be used to automate the conversion of CAS numbers into PubChem Compound IDs (CIDs).&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;Our library needs to accept a CAS number and return an array of PubChem CIDs in response. To do this, we'll make use of the &lt;a href="http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html"&gt;Entrez eUtils system&lt;/a&gt;. Although Entrez is incredibly complex, the only two things that matter now are that the NIH requires automated scripts to access most of its databases through Entrez, and that Entrez can be used to perform PubChem keyword queries.&lt;/p&gt;

&lt;p&gt;The library is simplicity itself:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;PubChemTerms&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_cids&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&amp;amp;retmax=100&amp;amp;term=&lt;span class="expr"&gt;#{term}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parser&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;collect&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The excellent Ruby library &lt;a href="http://mechanize.rubyforge.org/"&gt;Mechanize&lt;/a&gt; is used for submitting queries and processing the results. (This is the same library that was used to &lt;a href="http://depth-first.com/articles/2007/06/27/easily-convert-publisher-urls-and-dois-to-bibliographical-citations-synthesis-synlett-ruby-and-mechanize"&gt;extract full bibliographical information&lt;/a&gt; from nothing more than a DOI). The only remarkable thing about the library above is how unremarkable it is.&lt;/p&gt;

&lt;h4&gt;A Test&lt;/h4&gt;

We can test the library by saving it in a file called &lt;strong&gt;entrez.rb&lt;/strong&gt; and starting an interactive Ruby (irb) session. Opening up my copy of the Merck index to a random page and selecting an entry gives a CAS number to try (64318-79-2 - gemeprost). Plugging this CAS number into our irb session gives:

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'entrez'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; get_cids '64318-79-2'
=&gt; ["5282237", "6434870"]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Our library has returned a Ruby array containing two compound identifiers. We can use PubChem to view their records &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5282237"&gt;here&lt;/a&gt; and &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6434870"&gt;here&lt;/a&gt;. Visual inspection reveals these two compounds to be isomers of each other, with the first member of the array containing the direct hit.&lt;/p&gt;

&lt;p&gt;Let's try another CAS number selected from another random Merck index entry:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):004:0&gt; get_cids '66981-73-5'
=&gt; ["68870", "169125"]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Again we've obtained two CIDs, with the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=68870"&gt;first one&lt;/a&gt; being the neutral form and the second one being the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=169125"&gt;sodium salt&lt;/a&gt; of the antidepressant tianeptine.&lt;/p&gt;

&lt;h4&gt;Applications&lt;/h4&gt;

&lt;p&gt;Now, instead of converting one or two CAS numbers, imagine we've got a few thousand. Our library could be easily adapted to this purpose. The only caveat is that we'd need to &lt;a href="http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements"&gt;observe the Entrez use policy&lt;/a&gt; and not overload the server with too many requests. We could build in a delay with Ruby's &lt;tt&gt;sleep&lt;/tt&gt; method.&lt;/p&gt;

&lt;p&gt;Notice that the library can be used to search for any keyword - not just CAS numbers. For example:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'entrez'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; get_cids 'anandamide'
=&gt; ["5281969", "5283455", "5283388", "4671", "5353407", "5283452", "5283456", "5283451", "5283450", "5283449", "5283448", "5283447", "5283445", "5283444"]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Like our previous queries, we've obtained multiple CIDs associated with the term 'anandamide', with the first one being the direct hit.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Our little library isn't perfect, but it performs a very difficult task cheaply and conveniently in the majority of cases. By mashing up this functionality with other Ruby cheminformatics libraries (for example &lt;a href="http://depth-first.com/articles/2007/04/09/painless-installation-of-ruby-open-babel"&gt;Ruby Open Babel&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/tag/rubycdk"&gt;Ruby CDK&lt;/a&gt;), a variety of tough and highly practical cheminformatics problems can be solved elegantly. Look to further installments of the Hacking PubChem series to find out how.&lt;/p&gt;</description>
      <pubDate>Thu, 13 Sep 2007 09:36:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:12e717a5-01e2-4b4c-871b-0a86c69f55e4</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>hackingpubchem</category>
      <category>ruby</category>
      <category>entrez</category>
      <category>casnumber</category>
      <category>keyword</category>
    </item>
  </channel>
</rss>
