<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag api</title>
    <link>http://depth-first.com/articles/tag/api</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking PubChem: Learning to Speak PUG</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/40121670@N00/478093105/"&gt;&lt;img src="http://depth-first.com/demo/20070611/pug.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;A previous article introduced PubChem's &lt;a href="http://depth-first.com/articles/2007/06/04/hacking-pubchem-power-user-gateway"&gt;Power User Gateway&lt;/a&gt; (PUG), an XML-based communication channel. Although NIH kindly supplies a &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pug.xsd"&gt;commented schema&lt;/a&gt; for PUG queries and responses, there's nothing like seeing real examples when learning a new language. This article will describe one method for conveniently generating PUG XML queries.&lt;/p&gt;

&lt;h4&gt;Let PubChem Build Your Query&lt;/h4&gt;

&lt;p&gt;One of the options on the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/search/search.cgi"&gt;PubChem search page&lt;/a&gt; is "Save Query." As it turns out, PubChem saves queries in PUG XML (I'll just call it PUGML). In other words, preparing a query using the PubChem search page and saving it gives a simple method for creating PUGML queries. Let's try it.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070611/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Using the "Sketch" button, draw the structure of benzimidazole. Under "Search Type", select "Substructure." Now click "Save Query", and you'll download a substructure query for benzimidazole in PUGML:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType_css&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query_data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;C1=CC=CC2=C1N=C[N]2&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query_data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type_subss&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-CSStructure&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-CSStructure_bonds&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;true&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
                      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-CSStructure&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type_subss&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_results&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;2000000&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_results&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType_css&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;PCT-QueryCompoundCS_type_subss&lt;/tt&gt; element will tell PUG to look for substructures.&lt;/p&gt;

&lt;h4&gt;Using the Saved Query with PUG&lt;/h4&gt;

&lt;p&gt;Saving this file as &lt;strong&gt;benzimidazole_sss.xml&lt;/strong&gt;, lets us feed it to PUG:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @benzimidazole_sss.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;and get the following PUGML response:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;queued&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;62668946396085905&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;Structure search job was submitted&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

We can then check on the status of our query by saving the following as &lt;strong&gt;status.xml&lt;/strong&gt;:

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;62668946396085905&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_type&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;status&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

POSTing this to PUG:

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @status.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;gives us the following PUGML:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;success&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;Your search has already been completed successfully!.&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;pccompound&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_query-key&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;1&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_query-key&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_webenv&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib@1EDD43FA66AE1BE0_0001SID&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_webenv&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/06/04/hacking-pubchem-power-user-gateway"&gt;Last time&lt;/a&gt;, we got a URL to download a gzipped SD File. This time, our query specified results to be returned as an Entrez Key through the &lt;tt&gt;PCT-Entrez_webenv&lt;/tt&gt; element. We can construct a URL that will let us view these results:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=HistorySearch&amp;amp;WebEnvRq=1&amp;amp;db=pccompound&amp;amp;query_key=1&amp;amp;WebEnv=0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib%401EDD43FA66AE1BE0_0001SID&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Where to Next?&lt;/h4&gt;

&lt;p&gt;If we wanted to get a gzipped SD File instead, we'd need to edit our original query. But manually editing XML is a lot like mowing a lawn with scissors. What we'd really like is a simple API in a language like Ruby that will let us build sophisticated PUG queries, process the results, and pipe them into other queries with little effort. But that's a story for another time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/40121670@N00/"&gt;sutterbabe68&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 11 Jun 2007 09:04:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c9cf69b1-a86f-4a3b-ba3c-a8dcded2fa9f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/11/hacking-pubchem-learning-to-speak-pug</link>
      <category>Databases</category>
      <category>pubchem</category>
      <category>pug</category>
      <category>xml</category>
      <category>api</category>
      <category>powerusergateway</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Hacking PubChem: Power User Gateway</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/40121670@N00/519042653/"&gt;&lt;img src="http://depth-first.com/demo/20070604/pug.jpg" border="0" align="right"&gt;&lt;/img&gt;&lt;/a&gt;If you've been waiting for a simple way to programatically query PubChem without screen scraping, the wait is over. An (apparently) new service called the Power User Gateway (PUG) now offers a direct, XML-based PubChem data channel.&lt;/p&gt;

&lt;h4&gt;See PUG&lt;/h4&gt;

&lt;p&gt;Previous articles have discussed various methods for hacking PubChem: screen scraping (&lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;link&lt;/a&gt;, &lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;link&lt;/a&gt;); with the &lt;a href="http://depth-first.com/articles/2006/09/23/hacking-pubchem-entrez-programming-utilities"&gt;Entrez Utilities&lt;/a&gt;; and by simply &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;replicating the database&lt;/a&gt;. PUG is different in that it is both very simple and apparently quite powerful.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_pug.txt"&gt;PUG documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... There is a single CGI (pug.cgi, referred to hereafter as simply PUG) that is the central gateway to multiple PubChem functions. PUG takes no URL arguments; all communication with PUG is done by XML. To perform any request, you will formulate your input in XML and then HTTP POST it to PUG. The CGI interprets your incoming request, initiates the appropriate action, then returns results (also) in XML format. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;See PUG Run&lt;/h4&gt;

&lt;p&gt;Let's perform a simple query using PUG. As the documentation states, all communication with PUG is done through HTTP POST. In contrast to other approaches to interfacing with PubChem, parameters and results are encoded in raw XML, the schema for which is available &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pug.xsd"&gt;here&lt;/a&gt;. To use PUG your first step is to locate software capable of encoding this form of HTTP request.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://curl.haxx.se/"&gt;cURL&lt;/a&gt; is such a utility. Among many capabilities, cURL offers a quick and easy way to POST XML to a server and view the response. For example, to POST the file called &lt;strong&gt;foo.xml&lt;/strong&gt; to PUG, the command would be: &lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @foo.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Our query will request PubChem's first fifty Compounds in &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;sdf.gz&lt;/a&gt; format.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids_ids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;pccompound&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;1&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;50&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids_ids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_format&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;sdf&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_compression&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;gzip&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

After saving this file as &lt;strong&gt;pugtest.xml&lt;/strong&gt;, we can POST it to PUG using cURL:

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @pugtest.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Run PUG, Run!&lt;/h4&gt;

&lt;p&gt;After POSTing our query, PUG gives one of two possible responses: we're informed of the status of our query, or we're given a URL to download our results.&lt;/p&gt;

&lt;p&gt;Here's an example of a status result:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;success&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;638302818484957496&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;PCT-Waiting_reqid&lt;/tt&gt; informs us of our query's ID. We could then prepare and POST another query to monitor its status:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;638302818484957496&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_type&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;status&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Eventually, we'll get a response containing a &lt;tt&gt;PCT-Download_URL_url&lt;/tt&gt; element. Inside this element is the URL through which we can download our results:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;success&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_download-url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL_url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/766964770894289974.sdf.gz&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL_url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_download-url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;PUG offers the basic foundation for building a variety of innovative and useful cheminformatics Web services. But before that can happen, high-level APIs will be needed in languages like Ruby, Python, and Java. With these APIs in hand, what kinds of applications will result? Fortunately, imagination is now the only barrier.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/40121670@N00/"&gt;shutterbabe68&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 04 Jun 2007 07:06:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:51a56e2f-5ac3-4fd4-92e8-74978045eae2</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/04/hacking-pubchem-power-user-gateway</link>
      <category>Databases</category>
      <category>pubchem</category>
      <category>pug</category>
      <category>xml</category>
      <category>api</category>
      <category>ruby</category>
      <category>powerusergateway</category>
      <category>curl</category>
    </item>
    <item>
      <title>Hacking PubChem: Entrez Programming Utilities</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;A &lt;a href="http://depth-first.com/articles/2006/09/22/hacking-pubchem-why-the-open-access-fight-is-just-the-beginning"&gt;recent article&lt;/a&gt; poses the question of how to balance the rights of owners of open chemical information resources against those of their users, while promoting an innovative environment for third-party developers. Although &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; was the focus, the discussion could apply to any other chemical information resource. A reasonable approach would be to provide two separate entry points: one for Web browsers and another for various types of semi-autonomous software used in hacking and mashups.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett&lt;/a&gt; writes to point out that the &lt;a href="http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html"&gt;Entrez Programming Utilities&lt;/a&gt; can be used to query PubChem and other databases under the NIH/NCI/NCBI umbrella. A separate developer server processes requests, and the terms of its use are fairly well stated. Future articles will explore the possibility of building some simple Ruby APIs for this developer PubChem entry point.&lt;/p&gt;</description>
      <pubDate>Sat, 23 Sep 2006 01:22:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:23e87247-8058-42bc-882c-4ccd40d4f695</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/23/hacking-pubchem-entrez-programming-utilities</link>
      <category>Web</category>
      <category>pubchem</category>
      <category>api</category>
      <category>mashup</category>
      <category>entrez</category>
    </item>
    <item>
      <title>Hacking PubChem: Why The Open Access Fight is Just the Beginning</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;Like no other medium, the Internet tests our basic beliefs about the rights of resource owners and resource users. As the Internet increasingly becomes home to scientific publication mechanisms that have no counterpart in the physical world, a larger question looms: what separates fair use of these services from abuse?&lt;/p&gt;

&lt;p&gt;Depth-First hosts a series of articles, with possibly many more to follow, on programatically accessing open chemical information databases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/21/hacking-pubchem-query-by-smiles"&gt;Hacking PubChem: Query by SMILES&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;Hacking PubChem with Ruby&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/04/hacking-nmrshiftdb"&gt;Hacking NMRShiftDB&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The availability of open chemical information resources like &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; and &lt;a href="http://nmrshiftdb.org/"&gt;NMRShiftDB&lt;/a&gt; is a very recent phenomenon, and desperately overdue. One premise of this blog is that chemical informatics is at the start of a renaissance; the chemical information revolution that started in the 1950's is now set to continue after a long period of stagnation. Large, open data sources, and open software that mines it, will fuel this transformation, just as they have in bioinformatics.&lt;/p&gt;

&lt;p&gt;The interaction of non-browser software with public databases, although rich in potential payoffs, can also lead to a great deal of damage. PubChem contains millions of structure-searchable compounds. Setting the wrong kinds of programs loose on this site could cause service interruptions ranging from the annoying to the severe.&lt;/p&gt;

&lt;p&gt;There is no standard mechanism for website owners to spell out acceptable use policies to non-browser software. The closest thing we have to a standard is the &lt;a href="http://www.robotstxt.org/wc/exclusion.html"&gt;Robots Exclusion Protocol&lt;/a&gt;. This protocol defines acceptable behaviors for a robot, which according to &lt;a href="http://www.robotstxt.org/wc/faq.html#what"&gt;one definition&lt;/a&gt; consist of: "... a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced." Other definitions are in use. The one thing these definitions seem to have in common is the concept of scale: the more comprehensive and indiscriminate the program is in its interactions with a website, the more like a robot, and less like a browser, it becomes.&lt;/p&gt;

&lt;p&gt;Site owners specify their robots policy in a file called &lt;strong&gt;robots.txt&lt;/strong&gt; hosted on their servers. The &lt;a href="http://pubchem.ncbi.nlm.nih.gov/robots.txt"&gt;PubChem robots.txt file&lt;/a&gt; currently includes the following policies:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;User-agent: *
Disallow: /substance/PcsSrv.cgi
Disallow: /summary/summary.cgi
Disallow: /assay/assay.cgi
Disallow: /image/imgsrv.fcgi
Disallow: /image/smi2gif.fcgi
Disallow: /image/smi2gif.cgi
Disallow: /image/structurefly.cgi
Disallow: /search/NbrQsrv.cgi
Disallow: /search/PreQSrv.cgi&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;tt&gt;User-agent&lt;/tt&gt; refers to the name of the robot, which is set as a wildcard, meaning any robot. The &lt;tt&gt;Disallow&lt;/tt&gt; lines refer to resources off-limits to robots.&lt;/p&gt;

&lt;p&gt;One of these disallowed resources, &lt;tt&gt;/search/PreQSrv.cgi&lt;/tt&gt; is explicitly used in the &lt;a href="http://depth-first.com/articles/2006/09/21/hacking-pubchem-query-by-smiles"&gt;PubChem SMILES query article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Is a person who runs software of the type I describe in these articles violating PubChem's use policy? The best answer I can give is, "it depends." I think it would be hard for reasonable people to suggest that using the software as described in the tutorials, with their deliberately limited scope, for research purposes, and with no intent to do damage, represents abuse.&lt;/p&gt;

&lt;p&gt;On the other hand, I can see how reasonable people could argue that a website operating as a comprehensive front-end to PubChem using the techniques described in these articles could be considered abuse. I know I might consider it abuse if I ran PubChem, depending on why I was running the service.&lt;/p&gt;

&lt;p&gt;If I wanted to stimulate innovation in the area of open database mining, I might actually encourage front ends and similar third-party PubChem services. I might set aside servers specifically dedicated to this kind of activity. I might even develop an Open Source PubChem Web-API to help developers get started. Unfortunately, NIH's intentions are not exactly clear on this point.&lt;/p&gt;

&lt;p&gt;Looking at the &lt;a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"&gt;NCBI's Copyright and Disclaimers page&lt;/a&gt;, the only document that to my knowledge states any kind of use policy, is not especially illuminating:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;&lt;strong&gt;Conditions of Use&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;This site is maintained by the U.S. Government and is protected by various provisions of Title 18 of the U.S. Code. Violations of Title 18 are subject to criminal prosecution in a federal court. For site security purposes, as well as to ensure that this service remains available to all users, we use software programs to monitor traffic and to identify unauthorized attempts to upload or change information or otherwise cause damage. In the event of authorized law enforcement investigations and pursuant to any required legal process, information from these sources may be used to help identify an individual.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We are left with the critical, but unanswered question: "What represents an unauthorized use of PubChem?"&lt;/p&gt;

&lt;p&gt;The document cited above also raises the truly bizarre possibility of PubChem not actually being capable of granting rights to redistribute what is contained on its servers:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;This site also contains resources such as PubMed Central, Bookshelf, OMIM, and PubChem which incorporate material contributed or licensed by individuals, companies, or organizations that may be protected by U.S. and foreign copyright laws. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But this is a subject for another day.&lt;/p&gt;

&lt;p&gt;Getting back to accessing PubChem data, one very far-sighted thing the NIH has done is to make the entire dataset &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/"&gt;freely downloadable&lt;/a&gt; in three different file formats. Rather than mine the PubChem website itself, you could download the data to your machine, letting the software you write access it locally. The sheer size of this dataset creates problems of its own. Future articles will describe some approaches to solving them.&lt;/p&gt;

&lt;p&gt;Regardless of your views on the use and abuse of chemical information resources like PubChem, it's clear that getting open resources on the Web is only the first in a long series of controversial steps that will ultimately transform both the practice and culture of research.&lt;/p&gt;</description>
      <pubDate>Fri, 22 Sep 2006 13:58:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:cc1896b1-0927-4254-8dce-d5e3816816aa</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/22/hacking-pubchem-why-the-open-access-fight-is-just-the-beginning</link>
      <category>Open X</category>
      <category>pubchem</category>
      <category>openaccess</category>
      <category>fairuse</category>
      <category>api</category>
      <category>mashup</category>
    </item>
    <item>
      <title>Hacking PubChem: Query by SMILES</title>
      <description>&lt;p&gt;Recently, I showed how &lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;a simple PubChem API&lt;/a&gt; could be built from a few lines of Ruby code. The API we created could retrieve a molfile and a 2-D molecular rendering given a PubChem compound ID (CID). In this tutorial, we'll see how a SMILES query mechanism can be added to the API, enabling CIDs to be retrieved from any valid SMILES string. We'll also see how to extend this capability to retrieving a 2-D image from PubChem by submitting a SMILES string.&lt;/p&gt;

&lt;h4&gt;Credits&lt;/h4&gt;

&lt;p&gt;The API that follows is based on the &lt;strong&gt;pubchem.rb&lt;/strong&gt; file found in &lt;a href="http://rubyforge.org/projects/chemruby"&gt;Chemruby&lt;/a&gt; by Tadashi Kadowaki and Nobua Tanaka.&lt;/p&gt;

&lt;h4&gt;Defining the Problem&lt;/h4&gt;

&lt;p&gt;We want to create a PubChem API that returns an &lt;tt&gt;Array&lt;/tt&gt; of CIDs given any valid SMILES string. The API will communicate with the publically-available molecular database &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; using HTTP.&lt;/p&gt;

&lt;p&gt;In some cases, PubChem associates more than one CID for a given molecular structure. For example, querying the SMILES string &lt;tt&gt;c1ccccc1&lt;/tt&gt; (benzene) finds both benzene and C-14 benzene. The software needs to handle these cases as well.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;The only thing you'll need for this tutorial is Ruby, preferably v1.8 or better.&lt;/p&gt;

&lt;h4&gt;Code&lt;/h4&gt;

&lt;p&gt;Create a file called &lt;strong&gt;query.rb&lt;/strong&gt; in your working directory containing the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uri&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net/http&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# A simple SMILES query for PubChem based on the file &amp;lt;tt&amp;gt;pubchem.rb&amp;lt;/tt&amp;gt;,&lt;/span&gt;
&lt;span class="comment"&gt;# and originally part of Chemruby (http://rubyforge.org/project/chemruby).&lt;/span&gt;
&lt;span class="comment"&gt;# Distributed under Ruby's License.&lt;/span&gt;
&lt;span class="comment"&gt;#&lt;/span&gt;
&lt;span class="comment"&gt;# Copyright (C) 2005, 2006 KADOWAKI Tadashi &amp;lt;kado@kuicr.kyoto-u.ac.jp&amp;gt;&lt;/span&gt;
&lt;span class="comment"&gt;#                          TANAKA   Nobuya  &amp;lt;tanaka@kuicr.kyoto-u.ac.jp&amp;gt;&lt;/span&gt;
&lt;span class="comment"&gt;#                          APODACA  Richard &amp;lt;r_apodaca@users.sf.net&amp;gt;&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;PubChemQuery&lt;/span&gt;
  &lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;pubchem.ncbi.nlm.nih.gov&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@searchpath&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;/search/&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@query&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;PreQSrv.cgi&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-----boundary-----&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

  &lt;span class="comment"&gt;# Synthetic form data. Lifted from Chemruby &amp;lt;tt&amp;gt;pubchem.rb&amp;lt;/tt&amp;gt;&lt;/span&gt;
  &lt;span class="attribute"&gt;@@data&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;[&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;mode&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;simplequery&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;queue&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;ssquery&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;simple_searchdata&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;%s&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;simple_searchtype&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;fs&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Content-Disposition: form-data; name=&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;maxhits&lt;span class="escape"&gt;\&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;%s&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
    &lt;span class="attribute"&gt;@@boundary&lt;/span&gt;&lt;span class="punct"&gt;].&lt;/span&gt;&lt;span class="ident"&gt;join&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="escape"&gt;\x0d\x0a&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns an &amp;lt;tt&amp;gt;Array&amp;lt;/tt&amp;gt; of CIDs matching &amp;lt;tt&amp;gt;smiles&amp;lt;/tt&amp;gt;. If no matches are found,&lt;/span&gt;
  &lt;span class="comment"&gt;# &amp;lt;tt&amp;gt;nil&amp;lt;/tt&amp;gt; is returned.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.query_by_smiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;100&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;form_response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;post_form&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;wait_response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;process_wait_page&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;form_response&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;get_report_url&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;wait_response&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="ident"&gt;process_report&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="ident"&gt;private&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns the response to posting the initial search form.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.post_form&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;post&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@searchpath&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="attribute"&gt;@@query&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="attribute"&gt;@@data&lt;/span&gt; &lt;span class="punct"&gt;%&lt;/span&gt; &lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;maxhits&lt;/span&gt;&lt;span class="punct"&gt;],&lt;/span&gt;
      &lt;span class="punct"&gt;{&lt;/span&gt;
        &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;Content-Type&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;multipart/form-data; boundary=&lt;span class="expr"&gt;#{@@boundary}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt;
        &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;Referer&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/search/&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
      &lt;span class="punct"&gt;}).&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;response&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Processes the wait page displayed after submission of the search form.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.process_wait_page&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;m&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;/&lt;/span&gt;&lt;span class="regex"&gt;url=&amp;quot;([^&amp;quot;]+)&amp;quot;&lt;/span&gt;&lt;span class="punct"&gt;/.&lt;/span&gt;&lt;span class="ident"&gt;match&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
        &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@searchpath&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;m&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;]).&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;response&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns the URL, as a &amp;lt;tt&amp;gt;String&amp;lt;/tt&amp;gt;, to the search report, given the specified&lt;/span&gt;
  &lt;span class="comment"&gt;# body of the wait page.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.get_report_url&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="keyword"&gt;while&lt;/span&gt; &lt;span class="punct"&gt;/&lt;/span&gt;&lt;span class="regex"&gt;setTimeout&lt;span class="escape"&gt;\(&lt;/span&gt;'document.location.replace&lt;span class="escape"&gt;\(&lt;/span&gt;&amp;quot;([^&amp;quot;]+)&amp;quot;&lt;span class="escape"&gt;\)&lt;/span&gt;;', (&lt;span class="escape"&gt;\d&lt;/span&gt;+)&lt;span class="escape"&gt;\)&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;/&lt;/span&gt; &lt;span class="punct"&gt;=~&lt;/span&gt; &lt;span class="ident"&gt;body&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt;
        &lt;span class="ident"&gt;sleep&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="global"&gt;$2&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_f&lt;/span&gt;&lt;span class="punct"&gt;/&lt;/span&gt;&lt;span class="number"&gt;100&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

        &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;URI&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="global"&gt;$1&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;to_s&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
        &lt;span class="ident"&gt;body&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
        &lt;span class="ident"&gt;url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;['&lt;/span&gt;&lt;span class="string"&gt;location&lt;/span&gt;&lt;span class="punct"&gt;']&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;url&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Extracts CIDs from the search report contained at &amp;lt;tt&amp;gt;url&amp;lt;/tt&amp;gt;.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.process_report&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Array&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@host&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;80&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="comment"&gt;# text format&lt;/span&gt;
      &lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;sub!&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;cmd=Select&lt;span class="escape"&gt;\+&lt;/span&gt;from&lt;span class="escape"&gt;\+&lt;/span&gt;History&lt;/span&gt;&lt;span class="punct"&gt;/,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;cmd=Text&amp;amp;dopt=Brief&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
      &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;url&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;scan&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;&lt;span class="escape"&gt;\d&lt;/span&gt;+: CID: (&lt;span class="escape"&gt;\d&lt;/span&gt;+)&lt;/span&gt;&lt;span class="punct"&gt;/).&lt;/span&gt;&lt;span class="ident"&gt;each&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
        &lt;span class="ident"&gt;cid&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;push&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;])&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;cid&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You might want to &lt;a href="http://pubchem.ncbi.nlm.nih.gov/search/"&gt;manually submit a SMILES query&lt;/a&gt; to PubChem as a refresher on how this webapp works. Briefly, the contents of the SMILES search field are read, and a wait screen appears, typically for three seconds. You are then redirected to a results report page containing thumbnail images of the hits and their CIDs.&lt;/p&gt;

&lt;p&gt;The PubChemQuery class contains a single public class method, &lt;tt&gt;query_by_smiles&lt;/tt&gt;. This method builds a form to submit, based on the supplied SMILES string and optional &lt;tt&gt;maxhits&lt;/tt&gt; argument. It then waits until PubChem indicates that the query is about to finish processing. The URL for the results report page is then parsed. If a nonempty URL was found, then its page is loaded, and CIDs are scraped. Otherwise, the method returns &lt;tt&gt;nil&lt;/tt&gt;.&lt;/p&gt;

&lt;h4&gt;Usage&lt;/h4&gt;

&lt;p&gt;Using &lt;tt&gt;PubChemQuery&lt;/tt&gt; consists of invoking its class method &lt;tt&gt;query_by_smiles&lt;/tt&gt;. You can do so either via the Ruby interpreter (&lt;tt&gt;ruby&lt;/tt&gt;), or preferably through Interactive Ruby (&lt;tt&gt;irb&lt;/tt&gt;).&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;query&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;smiles&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;c1cccc(Cl)c1(Cl)&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="comment"&gt;# chlorobenzene&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Searching CID(s) for SMILES, &lt;span class="expr"&gt;#{smiles}&lt;/span&gt; ...&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;PubChemQuery&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;query_by_smiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; 7239&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Layering Complexity&lt;/h4&gt;

&lt;p&gt;We can combine the SMILES query API discussed here with the molfile and image retrieval discussed in the &lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;earlier Hacking Pubchem article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's say you'd like to download PubChem's 2-D image of imatinib (Gleevec) by submitting its SMILES string. Copy the file named &lt;strong&gt;pubchem.rb&lt;/strong&gt;, provided in the original PubChem tutorial, into your working directory. Now you can programmatically download imatinib's 2-D image from PubChem based only on a SMILES string, for example:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;pubchem&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;query&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Cc3ccc(NC(=O)c2ccc(CN1CCN(C)CC1)cc2)cc3Nc5nccc(c4cccnc4)n5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="comment"&gt;#imatinib&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Searching CID(s) for SMILES, &lt;span class="expr"&gt;#{smiles}&lt;/span&gt; ...&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="ident"&gt;cid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;PubChemQuery&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;query_by_smiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;cid&lt;/span&gt;
  &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;CID found: &lt;span class="expr"&gt;#{cid[0]}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

  &lt;span class="ident"&gt;filename&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;cid&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;.png&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Writing image to &lt;span class="expr"&gt;#{filename}&lt;/span&gt; ...&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="constant"&gt;PubChem&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_image&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cid&lt;/span&gt;&lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;],&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
&lt;span class="keyword"&gt;else&lt;/span&gt;
  &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;No CID for &lt;span class="expr"&gt;#{smiles}&lt;/span&gt; was found.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produces an image of imatinib called &lt;strong&gt;5291.png&lt;/strong&gt; in your working directory:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/5291.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Wrapping Up&lt;/h4&gt;

&lt;p&gt;As you can see, we're just scratching the surface. The approach outlined here offers nearly unlimited possibilities for repackaging PubChem's own content, and mashing this content up with that of other sites. Happy hacking!&lt;/p&gt;</description>
      <pubDate>Thu, 21 Sep 2006 15:12:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:463f9481-4adb-4826-9c55-8f1346f4bcd8</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/21/hacking-pubchem-query-by-smiles</link>
      <category>Databases</category>
      <category>open</category>
      <category>X</category>
      <category>pubchem</category>
      <category>api</category>
      <category>hack</category>
      <category>smiles</category>
      <category>cid</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Hacking PubChem with Ruby</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; is an increasingly popular, free-access, online molecular database operated by the National Institutes of Health. Web services are a hot topic, with sites such as &lt;a href="http://www.flickr.com/services/api/"&gt;Flickr&lt;/a&gt;, &lt;a href="http://www.google.com/apis/"&gt;Google&lt;/a&gt;, and &lt;a href="http://developer.ebay.com/common/api/"&gt;eBay&lt;/a&gt; offering developers the tools to build rich content through "mashups" of several web APIs. Although there is no formal PubChem API, it's possible to roll your own. As a demonstration, this article will show how structural information can be retrieved from PubChem using some simple Ruby code. The inspiration for this article came from the &lt;tt&gt;PubChem&lt;/tt&gt; module that is part of &lt;a href="http://chemruby.org"&gt;Chemruby&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The only thing you'll need for this tutorial is Ruby, preferably version 1.8.2 or higher. Create a directory called &lt;strong&gt;pubchem&lt;/strong&gt; and make it your working directory. Then create a file called &lt;strong&gt;pubchem.rb&lt;/strong&gt; containing the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net/http&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# A very simple PubChem Web API.&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;PubChem&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns a molfile (as a String) for the molecule with PubChem&lt;/span&gt;
  &lt;span class="comment"&gt;# CID matching compound_id.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.get_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;compound_id&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;
    &lt;span class="ident"&gt;path&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;/summary/summary.cgi?cid=&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;compound_id&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&amp;amp;disopt=DisplaySDF&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;pubchem.ncbi.nlm.nih.gov&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;path&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;molfile&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Writes a PNG image, for the molecule with PubChem&lt;/span&gt;
  &lt;span class="comment"&gt;# CID matching compound_id, to the file specified by filename.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.write_image&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;compound_id&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;path&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;/image/imgsrv.fcgi?t=l&amp;amp;cid=&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;compound_id&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;pubchem.ncbi.nlm.nih.gov&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;path&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="ident"&gt;image&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;

      &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;w&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
        &lt;span class="ident"&gt;file&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="ident"&gt;image&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

PubChem references each of its compounds by a unique integer identifier, the PubChem CID. This is very handy because retrieving PubChem resources is as simple as encoding a URL containing the CID of interest. The class above illustrates how this system can be used to get a molfile and write a PNG image using just a few lines of Ruby.

Using the &lt;tt&gt;PubChem&lt;/tt&gt; class is simplicity itself. To get the molfile for Levonorgestrel (&lt;a href="http://www.go2planb.com/ForConsumers/Index.aspx"&gt;Plan B&lt;/a&gt;), which has the CID 13109:

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;pubchem&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;PubChem&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;get_molfile&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;13109&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="comment"&gt;#=&amp;gt; returns the molfile for Levonorgestrel as a String&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

To write the 2-D structure diagram of Levonorgestrel as a PNG:

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;pubchem&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;PubChem&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;write_png&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;13109&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;image.png&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="comment"&gt;#=&amp;gt; writes a PNG image of Levonorgestrel&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

This code saves the image below to your working directory as &lt;strong&gt;image.png&lt;/strong&gt;.

&lt;center&gt;&lt;img src="http://depth-first.com/files/levonorgestrel.png"&gt;&lt;/img&gt;&lt;/center&gt;

The above two code fragments can either be saved as a file and executed by the Ruby interpreter:

&lt;div class="console"&gt;
&lt;pre&gt;
$ ruby filename.rb
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;or it they be entered interactively in your console with &lt;a href="http://tryruby.hobix.com/"&gt;irb&lt;/a&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt;  
&lt;/pre&gt;   
&lt;/div&gt;

&lt;p&gt;As you can see, there's not much to building a PubChem API in Ruby. The same principles discussed here should apply in any programming language. Future articles in this series will show how to build more complex PubChem APIs and integrate them with other software packages and web services.&lt;/p&gt;</description>
      <pubDate>Wed, 30 Aug 2006 02:29:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:6cc9c1f4-db5b-4a86-96f1-9c9081a71b5d</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby</link>
      <category>Databases</category>
      <category>pubchem</category>
      <category>ruby</category>
      <category>mashup</category>
      <category>api</category>
    </item>
  </channel>
</rss>
