<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Hacking PubChem: Query by SMILES</title>
    <link>http://depth-first.com/articles/2006/09/21/hacking-pubchem-query-by-smiles</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking PubChem: Query by SMILES</title>
      <description>&lt;p&gt;Recently, I showed how &lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;a simple PubChem API&lt;/a&gt; could be built from a few lines of Ruby code. The API we created could retrieve a molfile and a 2-D molecular rendering given a PubChem compound ID (CID). In this tutorial, we'll see how a SMILES query mechanism can be added to the API, enabling CIDs to be retrieved from any valid SMILES string. We'll also see how to extend this capability to retrieving a 2-D image from PubChem by submitting a SMILES string.&lt;/p&gt;

&lt;h4&gt;Credits&lt;/h4&gt;

&lt;p&gt;The API that follows is based on the &lt;strong&gt;pubchem.rb&lt;/strong&gt; file found in &lt;a href="http://rubyforge.org/projects/chemruby"&gt;Chemruby&lt;/a&gt; by Tadashi Kadowaki and Nobuya Tanaka.&lt;/p&gt;

&lt;h4&gt;Defining the Problem&lt;/h4&gt;

&lt;p&gt;We want to create a PubChem API that returns an &lt;tt&gt;Array&lt;/tt&gt; of CIDs given any valid SMILES string. The API will communicate with the publicly available molecular database &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; using HTTP.&lt;/p&gt;

&lt;p&gt;In some cases, PubChem associates more than one CID with a given molecular structure. For example, querying the SMILES string &lt;tt&gt;c1ccccc1&lt;/tt&gt; (benzene) finds both benzene and C-14 benzene. The software needs to handle these cases as well.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;The only thing you'll need for this tutorial is Ruby, preferably v1.8 or later.&lt;/p&gt;

&lt;h4&gt;Code&lt;/h4&gt;

&lt;p&gt;Create a file called &lt;strong&gt;query.rb&lt;/strong&gt; in your working directory containing the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uri&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net/http&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# A simple SMILES query for PubChem based on the file &amp;lt;tt&amp;gt;pubchem.rb&amp;lt;/tt&amp;gt;,&lt;/span&gt;
&lt;span class="comment"&gt;# and originally part of Chemruby (http://rubyforge.org/project/chemruby).&lt;/span&gt;
&lt;span class="comment"&gt;# Distributed under Ruby's License.&lt;/span&gt;
&lt;span class="comment"&gt;#&lt;/span&gt;
&lt;span class="comment"&gt;# Copyright (C) 2005, 2006 KADOWAKI Tadashi &amp;lt;kado@kuicr.kyoto-u.ac.jp&amp;gt;&lt;/span&gt;
&lt;span class="comment"&gt;#                          TANAKA   Nobuya  &amp;lt;tanaka@kuicr.kyoto-u.ac.jp&amp;gt;&lt;/span&gt;
&lt;span class="comment"&gt;#                          APODACA  Richard &amp;lt;r_apodaca@users.sf.net&amp;gt;&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;PubChemQuery&lt;/span&gt;
  &lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;pubchem.ncbi.nlm.nih.gov&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@searchpath&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;/search/&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@query&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;PreQSrv.cgi&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-----boundary-----&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

  &lt;span class="comment"&gt;# Synthetic form data. Lifted from Chemruby &amp;lt;tt&amp;gt;pubchem.rb&amp;lt;/tt&amp;gt;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@data&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;[&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;mode&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;simplequery&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;queue&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;ssquery&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;simple_searchdata&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;%s&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;simple_searchtype&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;fs&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;maxhits&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;%s&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;].&lt;/span&gt;&lt;span class="ident"&gt;join&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="escape"&gt;\x0d\x0a&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns an &amp;lt;tt&amp;gt;Array&amp;lt;/tt&amp;gt; of CIDs matching &amp;lt;tt&amp;gt;smiles&amp;lt;/tt&amp;gt;. If no matches are found,&lt;/span&gt;
  &lt;span class="comment"&gt;# &amp;lt;tt&amp;gt;nil&amp;lt;/tt&amp;gt; is returned.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.query_by_smiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;100&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;form_response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;post_form&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;wait_response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;process_wait_page&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;form_response&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;get_report_url&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;wait_response&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="ident"&gt;process_report&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="ident"&gt;private&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns the response to posting the initial search form.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.post_form&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;post&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@searchpath&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="attribute"&gt;@@query&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="attribute"&gt;@@data&lt;/span&gt; &lt;span class="punct"&gt;%&lt;/span&gt; &lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt;&lt;span class="punct"&gt;],&lt;/span&gt;
      &lt;span class="punct"&gt;{&lt;/span&gt;
        &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;Content-Type&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;multipart/form-data; boundary=&lt;span class="expr"&gt;#{@@boundary}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
        &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;Referer&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/search/&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
      &lt;span class="punct"&gt;}).&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;response&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Processes the wait page displayed after submission of the search form.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.process_wait_page&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;m&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;/&lt;/span&gt;&lt;span class="regex"&gt;url=&amp;quot;([^&amp;quot;]+)&amp;quot;&lt;/span&gt;&lt;span class="punct"&gt;/.&lt;/span&gt;&lt;span class="ident"&gt;match&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
        &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@searchpath&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;m&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;]).&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;response&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns the URL, as a &amp;lt;tt&amp;gt;String&amp;lt;/tt&amp;gt;, to the search report, given the specified&lt;/span&gt;
  &lt;span class="comment"&gt;# body of the wait page.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.get_report_url&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="keyword"&gt;while&lt;/span&gt; &lt;span class="punct"&gt;/&lt;/span&gt;&lt;span class="regex"&gt;setTimeout&lt;span class="escape"&gt;\(&lt;/span&gt;'document.location.replace&lt;span class="escape"&gt;\(&lt;/span&gt;&amp;quot;([^&amp;quot;]+)&amp;quot;&lt;span class="escape"&gt;\)&lt;/span&gt;;', (&lt;span class="escape"&gt;\d&lt;/span&gt;+)&lt;span class="escape"&gt;\)&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;/&lt;/span&gt; &lt;span class="punct"&gt;=~&lt;/span&gt; &lt;span class="ident"&gt;body&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt;
        &lt;span class="ident"&gt;sleep&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="global"&gt;$2&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_f&lt;/span&gt;&lt;span class="punct"&gt;/&lt;/span&gt;&lt;span class="number"&gt;100&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

        &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;URI&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="global"&gt;$1&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;to_s&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
        &lt;span class="ident"&gt;body&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
        &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;['&lt;/span&gt;&lt;span class="string"&gt;location&lt;/span&gt;&lt;span class="punct"&gt;']&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;url&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Extracts CIDs from the search report contained at &amp;lt;tt&amp;gt;url&amp;lt;/tt&amp;gt;.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.process_report&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Array&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="comment"&gt;# text format&lt;/span&gt;
      &lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;sub!&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;cmd=Select&lt;span class="escape"&gt;\+&lt;/span&gt;from&lt;span class="escape"&gt;\+&lt;/span&gt;History&lt;/span&gt;&lt;span class="punct"&gt;/,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;cmd=Text&amp;amp;dopt=Brief&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
      &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;scan&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;&lt;span class="escape"&gt;\d&lt;/span&gt;+: CID: (&lt;span class="escape"&gt;\d&lt;/span&gt;+)&lt;/span&gt;&lt;span class="punct"&gt;/).&lt;/span&gt;&lt;span class="ident"&gt;each&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
        &lt;span class="ident"&gt;cid&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;push&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;])&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;cid&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You might want to &lt;a href="http://pubchem.ncbi.nlm.nih.gov/search/"&gt;manually submit a SMILES query&lt;/a&gt; to PubChem as a refresher on how this webapp works. Briefly, the contents of the SMILES search field are read, and a wait screen appears, typically for three seconds. You are then redirected to a results report page containing thumbnail images of the hits and their CIDs.&lt;/p&gt;
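
&lt;p&gt;To make the wait-screen step concrete, here is a minimal sketch of pulling the redirect target and the timer value out of a wait page body. The regular expression is the one used in &lt;tt&gt;get_report_url&lt;/tt&gt; in the listing above. The helper name &lt;tt&gt;parse_wait_page&lt;/tt&gt; and the sample body are invented for illustration and are not taken from PubChem, so treat this as a sketch rather than a description of the live page.&lt;/p&gt;

```ruby
# Hypothetical helper: extracts the redirect URL and the timer value from
# a wait page body. Returns nil if the body contains no redirect timer.
def parse_wait_page(body)
  pattern = /setTimeout\('document.location.replace\("([^"]+)"\);', (\d+)\)/
  match = pattern.match(body)
  return nil unless match

  # JavaScript's setTimeout takes its delay in milliseconds.
  [match[1], match[2].to_i]
end

# Invented sample body, shaped only to fit the pattern above.
sample = %q{setTimeout('document.location.replace("http://example.org/report?id=1");', 3000)}

url, delay_ms = parse_wait_page(sample)
puts "redirect to #{url} after #{delay_ms} ms"
```

&lt;p&gt;Returning &lt;tt&gt;nil&lt;/tt&gt; when no timer is present mirrors the way the polling loop in the listing stops once the page no longer contains a redirect.&lt;/p&gt;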

&lt;p&gt;The &lt;tt&gt;PubChemQuery&lt;/tt&gt; class has one class method meant for callers, &lt;tt&gt;query_by_smiles&lt;/tt&gt;. This method posts the search form, filled in with the supplied SMILES string and the optional &lt;tt&gt;maxhits&lt;/tt&gt; argument (100 by default). It then polls the wait page until PubChem redirects to the results report. If a report URL was found, the report is loaded in text format and the CIDs are scraped from it. Otherwise, the method returns &lt;tt&gt;nil&lt;/tt&gt;.&lt;/p&gt;
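
&lt;p&gt;The form-building step can also be sketched in isolation. The helper below, &lt;tt&gt;build_multipart&lt;/tt&gt; (a hypothetical name, not part of Chemruby or PubChem), assembles a &lt;tt&gt;multipart/form-data&lt;/tt&gt; body using the usual convention: each part starts with two hyphens plus the boundary, and the body ends with the boundary followed by two more hyphens. This differs from the &lt;tt&gt;@@data&lt;/tt&gt; template in the listing, which joins the fields with the bare boundary string, so take it as an illustration under that assumption and not as a statement of what PubChem requires. The field names are copied from the listing; the boundary value is arbitrary.&lt;/p&gt;

```ruby
# Hypothetical helper: builds a multipart/form-data body from
# [name, value] pairs. Lines are separated by CRLF.
def build_multipart(fields, boundary)
  lines = []
  fields.each do |name, value|
    lines.push("--#{boundary}")
    lines.push("Content-Disposition: form-data; name=\"#{name}\"")
    lines.push("")              # blank line separates headers from the value
    lines.push(value.to_s)
  end
  lines.push("--#{boundary}--") # closing delimiter
  lines.join("\r\n") + "\r\n"
end

# Field names taken from the listing above; values are examples.
fields = [
  ["mode", "simplequery"],
  ["queue", "ssquery"],
  ["simple_searchdata", "c1ccccc1"],
  ["simple_searchtype", "fs"],
  ["maxhits", 100]
]

body = build_multipart(fields, "boundary")
puts body
```

&lt;p&gt;Passing the fields as pairs keeps their order fixed and avoids filling a format string with &lt;tt&gt;%&lt;/tt&gt;, at the cost of a few more lines than the template approach.&lt;/p&gt;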

&lt;h4&gt;Usage&lt;/h4&gt;

&lt;p&gt;Using &lt;tt&gt;PubChemQuery&lt;/tt&gt; consists of invoking its class method &lt;tt&gt;query_by_smiles&lt;/tt&gt;. You can do so either via the Ruby interpreter (&lt;tt&gt;ruby&lt;/tt&gt;), or preferably through Interactive Ruby (&lt;tt&gt;irb&lt;/tt&gt;).&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;query&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;smiles&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;c1cccc(Cl)c1(Cl)&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="comment"&gt;# 1,2-dichlorobenzene&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Searching CID(s) for SMILES, &lt;span class="expr"&gt;#{smiles}&lt;/span&gt; ...&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;PubChemQuery&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;query_by_smiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; 7239&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
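
&lt;p&gt;Because &lt;tt&gt;query_by_smiles&lt;/tt&gt; returns &lt;tt&gt;nil&lt;/tt&gt; when no report URL is found, and otherwise an &lt;tt&gt;Array&lt;/tt&gt; that may hold several CIDs, calling code may want to treat each case separately. A minimal sketch follows. &lt;tt&gt;summarize_hits&lt;/tt&gt; is a hypothetical helper, and the sample values are placeholders, not real query results; the example does not contact PubChem.&lt;/p&gt;

```ruby
# Hypothetical helper: turns the return value of a CID query into a
# one-line summary. Accepts nil, an empty Array, or an Array of CID strings.
def summarize_hits(cids)
  return "no hits" if cids.nil? || cids.empty?
  return "one hit: CID #{cids[0]}" if cids.length == 1

  "#{cids.length} hits: CIDs #{cids.join(', ')}"
end

# Placeholder values for illustration only.
puts summarize_hits(nil)
puts summarize_hits(["111"])
puts summarize_hits(["111", "222"])
```

&lt;p&gt;In real use, the argument would be the value returned by &lt;tt&gt;PubChemQuery.query_by_smiles&lt;/tt&gt;.&lt;/p&gt;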

&lt;h4&gt;Layering Complexity&lt;/h4&gt;

&lt;p&gt;We can combine the SMILES query API discussed here with the molfile and image retrieval discussed in the &lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;earlier Hacking PubChem article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's say you'd like to download PubChem's 2-D image of imatinib (Gleevec) by submitting its SMILES string. Copy the file named &lt;strong&gt;pubchem.rb&lt;/strong&gt;, provided in the original PubChem tutorial, into your working directory. Now you can programmatically download imatinib's 2-D image from PubChem based only on a SMILES string, for example:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;pubchem&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;query&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Cc3ccc(NC(=O)c2ccc(CN1CCN(C)CC1)cc2)cc3Nc5nccc(c4cccnc4)n5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="comment"&gt;#imatinib&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Searching CID(s) for SMILES, &lt;span class="expr"&gt;#{smiles}&lt;/span&gt; ...&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;PubChemQuery&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;query_by_smiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;cid&lt;/span&gt;
  &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;CID found: &lt;span class="expr"&gt;#{cid[0]}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

  &lt;span class="ident"&gt;filename&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;cid&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;.png&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Writing image to &lt;span class="expr"&gt;#{filename}&lt;/span&gt; ...&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="constant"&gt;PubChem&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_image&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cid&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;],&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
&lt;span class="keyword"&gt;else&lt;/span&gt;
  &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;No CID for &lt;span class="expr"&gt;#{smiles}&lt;/span&gt; was found.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produces an image of imatinib called &lt;strong&gt;5291.png&lt;/strong&gt; in your working directory:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/5291.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Wrapping Up&lt;/h4&gt;

&lt;p&gt;As you can see, we're just scratching the surface. The approach outlined here offers nearly unlimited possibilities for repackaging PubChem's own content, and mashing this content up with that of other sites. Happy hacking!&lt;/p&gt;</description>
      <pubDate>Thu, 21 Sep 2006 15:12:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:463f9481-4adb-4826-9c55-8f1346f4bcd8</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/21/hacking-pubchem-query-by-smiles</link>
      <category>Databases</category>
      <category>open</category>
      <category>X</category>
      <category>pubchem</category>
      <category>api</category>
      <category>hack</category>
      <category>smiles</category>
      <category>cid</category>
      <category>ruby</category>
    </item>
  </channel>
</rss>
