<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Category Databases</title>
    <link>http://depth-first.com/articles/category/databases</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Yet Another Free Chemistry Database: Pherobase</title>
      <description>&lt;p&gt;&lt;a href="http://pherobase.com"&gt;&lt;img src="http://depth-first.com/demo/20080415/pherobase.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The creation of &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemical databases&lt;/a&gt; continues unabated. &lt;a href="http://depth-first.com/articles/tag/database"&gt;Today's entry&lt;/a&gt; is &lt;a href="http://www.pherobase.com/"&gt;Pherobase&lt;/a&gt;, a service dedicated to documenting the relationship between chemical structures and the insect world.&lt;/p&gt;

&lt;p&gt;Users can search Pherobase by text or browse a large number of precompiled categories: alphabetical by genus, alphabetical by species, and compounds by genus or species. Each compound data sheet contains a wealth of data, all linked to the primary literature: mass spectrum, NMR, synthesis, and behavioral function. There's even an interactive &lt;a href="http://jmol.sourceforge.net/"&gt;Jmol&lt;/a&gt; model for each entry.&lt;/p&gt;

&lt;p&gt;Pherobase is clearly designed to be useful to farmers and others involved in agriculture who are interested in using pheromones in pest control. Are insects eating your olive tree? &lt;a href="http://www.pherobase.com/database/control/control-host-Olive%20pest-all.php"&gt;Let Pherobase help&lt;/a&gt;. Need help with fire ants? &lt;a href="http://www.pherobase.com/database/species/species-Solenopsis-invicta.php"&gt;Pherobase can help there, too&lt;/a&gt;. Wonder what else besides gypsy moths might be affected by disparlure? &lt;a href="http://www.pherobase.net/database/compound/compounds-detail-disparlure.php"&gt;Pherobase has the answer&lt;/a&gt;. And nearly all of this information is backed by references to the primary literature.&lt;/p&gt;

&lt;p&gt;Pherobase clearly demonstrates the value of building comprehensive, focused chemical databases around a limited subject of high practical utility. After all, chemistry's most enduring contribution is in the production of useful properties, not the production of compounds.&lt;/p&gt;

&lt;p&gt;Pherobase is also noteworthy for the way it's being used by its creator, &lt;a href="http://www.pherobase.com/elsayed.htm"&gt;Ashraf El-Sayed&lt;/a&gt;. Rather than standing on its own, Pherobase is designed to direct users to suppliers of pheromones and related pest control products by educating them about what might be possible. In this sense, Pherobase's approach offers another intriguing example of &lt;a href="http://depth-first.com/articles/2007/10/11/open-access-business-models-that-can-actually-work-sigma-aldrichs-chemblogs"&gt;an Open Access business model that can actually work&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Tue, 15 Apr 2008 18:56:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:3b0e0e98-1e06-4734-8cb9-670d76f7bfb3</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/04/15/yet-another-free-chemistry-database-pherobase</link>
      <category>Databases</category>
      <category>database</category>
      <category>pherobase</category>
      <category>openaccess</category>
      <category>businessmodel</category>
      <category>insects</category>
      <category>pheromones</category>
      <category>agriculture</category>
    </item>
    <item>
      <title>Yet Another Free Chemistry Database: Sigma-Aldrich Reaction Search</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071016/reaction.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Yet another &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemistry database&lt;/a&gt; comes in the form of Sigma-Aldrich's &lt;a href="http://www.sigmaaldrich.com/rxnonline/reactionSearch.jsp"&gt;Reaction Search&lt;/a&gt;. Draw your reactant and product, and let the database do the rest. Both exact and substructure matching can be used. Search results contain bibliographic references to the primary literature. And if a result's product is available from Aldrich, you'll get a link to the product summary page, where you can purchase it.&lt;/p&gt;

&lt;p&gt;Like Sigma-Aldrich's &lt;a href="http://depth-first.com/articles/2007/10/11/open-access-business-models-that-can-actually-work-sigma-aldrichs-chemblogs"&gt;ChemBlogs&lt;/a&gt;, Reaction Search matches the needs of a for-profit company to attract customers with the needs of chemists for unrestricted free access to research tools. Could this be the shape of things to come?&lt;/p&gt;

&lt;p&gt;The parallels to Aldrich's &lt;a href="http://www.sigmaaldrich.com/cgi-bin/hsrun/Suite7/Suite/Suite.hjx;start=Suite.HsEgrailForm.run?FormName=AldrichHandbook0708_87244"&gt;Handbook&lt;/a&gt; are striking. Few bench chemists today regularly use a &lt;a href="http://www.hbcpnetbase.com/"&gt;CRC Handbook&lt;/a&gt;, yet nearly all of them have a copy of the Aldrich catalog at their desks. Aldrich seems to get this simple idea like no other company. And they're quietly transferring this understanding to the Web.&lt;/p&gt;</description>
      <pubDate>Tue, 16 Oct 2007 09:45:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:2387effa-4fea-4c5e-ba4b-e2e7cf17566c</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/16/yet-another-free-chemistry-database-sigma-aldrich-reaction-search</link>
      <category>Databases</category>
      <category>freechemistrydatabase</category>
      <category>sigmaaldrich</category>
      <category>reactiondatabase</category>
    </item>
    <item>
      <title>Yet Another Free Chemistry Database: Heterocycles Web Edition</title>
      <description>&lt;p&gt;Yet another &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemistry database&lt;/a&gt; comes in the form of a service run by the journal &lt;a href="https://www2.heterocycles.jp"&gt;&lt;em&gt;Heterocycles&lt;/em&gt;&lt;/a&gt;. The &lt;a href="https://www2.heterocycles.jp/journal/index.html"&gt;Heterocycles Web Edition&lt;/a&gt; offers two ways to search for heterocylic ring systems: &lt;a href="https://www2.heterocycles.jp/FMPro?-db=gate.fp5&amp;amp;-format=/w2/structure.html&amp;amp;-view"&gt;by structure&lt;/a&gt; or &lt;a href="https://www2.heterocycles.jp/FMPro?-db=gate.fp5&amp;amp;-format=/w2/synthesis.html&amp;amp;-view"&gt;by synthesis&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You might assume that these services search only the contents of &lt;em&gt;Heterocycles&lt;/em&gt;. It's a pleasant surprise, then, to find a number of highly regarded journals covered as well. Here are some of the titles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Angew. Chem. Int. Ed. Engl.&lt;/li&gt;
&lt;li&gt;Chem. Eur. J.&lt;/li&gt;
&lt;li&gt;Eur. J. Org. Chem.&lt;/li&gt;
&lt;li&gt;Heterocycles&lt;/li&gt;
&lt;li&gt;J. Am. Chem. Soc.&lt;/li&gt;
&lt;li&gt;J. Med. Chem.&lt;/li&gt;
&lt;li&gt;J. Nat. Prod.&lt;/li&gt;
&lt;li&gt;J. Org. Chem.&lt;/li&gt;
&lt;li&gt;Org. Lett.&lt;/li&gt;
&lt;li&gt;Synlett&lt;/li&gt;
&lt;li&gt;Tetrahedron&lt;/li&gt;
&lt;li&gt;Tetrahedron Lett.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current query interface supports text only, although a number of important criteria can be used. I haven't searched for many heterocycles, but my results for &lt;em&gt;indolizidine&lt;/em&gt; give a flavor of what you might expect (the actual number of hits was 115):&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070706/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;It would be interesting to know how &lt;em&gt;Heterocycles&lt;/em&gt; populated its database. Is it text-mining, manual curation, both, or something else? Regardless of how it's done, Heterocycles Web Edition is definitely worth looking at.&lt;/p&gt;</description>
      <pubDate>Fri, 06 Jul 2007 09:57:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:d980f37e-fd7b-42a7-9a8f-7662064c5cd0</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/07/06/yet-another-free-chemistry-database-heterocycles-web-edition</link>
      <category>Databases</category>
      <category>database</category>
      <category>free</category>
      <category>internet</category>
      <category>heterocycles</category>
    </item>
    <item>
      <title>Yet Another Free Chemical Database: Reaction Searching with CMLD-BU</title>
      <description>&lt;p&gt;&lt;a href="http://cmld.bu.edu/"&gt;&lt;img src="http://depth-first.com/demo/20070618/CMLD.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;As chemical informatics continues its climb out of a decades-long stagnation, the number of &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemical databases&lt;/a&gt; continues to &lt;a href="http://depth-first.com/articles/2007/05/07/free-chemistry-databases-on-the-web-creating-a-comprehensive-guide"&gt;grow&lt;/a&gt;. But despite all the activity, reaction databases are notably under-represented. For this reason, I was delighted to stumble onto Boston University's &lt;a href="http://cmld.bu.edu/"&gt;Center for Chemical Methodology and Library Development Reaction Database&lt;/a&gt; (CMLD-BU).&lt;/p&gt;

&lt;p&gt;According to their &lt;a href="http://cmld.bu.edu/overview/index.html"&gt;website&lt;/a&gt;, CMLD-BU:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;...is a new center funded by the National Institute of General Medical Sciences ( NIGMS ) focused on the discovery of new methodologies to produce novel chemical libraries of unprecedented complexity for biological screening. The goal of the CMLD-BU is to explore and expand the diversity of small-molecule libraries by creating general, useful protocols for stereocontrolled synthesis. ... A major objective of the CMLD-BU is also to provide information and chemistry protocols to the public on parallel and chemical library synthesis. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20070618/screenshot_large.png"&gt;&lt;img src="http://depth-first.com/demo/20070618/screenshot_small.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;a href="http://cmldprotocols.bu.edu/cmld/ViewPublicReactionList.jsp"&gt;this link&lt;/a&gt; to begin exploring their service. To date the CMLD-BU has deposited &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?term=%22CMLD-BU%22%5Bsourcename%5D&amp;amp;cmd=search&amp;amp;db=pcsubstance"&gt;just over 1,600&lt;/a&gt; Substances with &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; and their site shows 125 reaction protocols.&lt;/p&gt;

&lt;p&gt;Although CMLD-BU's user interface could use some tweaking, their content is right on the money: real examples of preparative reactions with links to the primary literature and even spectral data.&lt;/p&gt;

&lt;p&gt;Are we at the end of this process or at the beginning? Only time will tell. But the nearly &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;infinite shelf life&lt;/a&gt; and ubiquity of chemical information, coupled with the &lt;a href="http://www.amazon.com/gp/browse.html?node=16427261"&gt;inexorable approach&lt;/a&gt; of virtually zero-cost computer services, leave only one of those two possibilities worthy of serious consideration.&lt;/p&gt;
      <pubDate>Mon, 18 Jun 2007 09:08:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:e6a21774-8053-485b-9911-1d9760dcc7f6</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/18/yet-another-free-chemical-database-reaction-searching-with-cmld-bu</link>
      <category>Databases</category>
      <category>cmld</category>
      <category>bu</category>
      <category>database</category>
      <category>reaction</category>
    </item>
    <item>
      <title>Hacking PubChem: Learning to Speak PUG</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/40121670@N00/478093105/"&gt;&lt;img src="http://depth-first.com/demo/20070611/pug.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;A previous article introduced PubChem's &lt;a href="http://depth-first.com/articles/2007/06/04/hacking-pubchem-power-user-gateway"&gt;Power User Gateway&lt;/a&gt; (PUG), an XML-based communication channel. Although NIH kindly supplies a &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pug.xsd"&gt;commented schema&lt;/a&gt; for PUG queries and responses, there's nothing like seeing real examples when learning a new language. This article will describe one method for conveniently generating PUG XML queries.&lt;/p&gt;

&lt;h4&gt;Let PubChem Build Your Query&lt;/h4&gt;

&lt;p&gt;One of the options on the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/search/search.cgi"&gt;PubChem search page&lt;/a&gt; is "Save Query." As it turns out, PubChem saves queries in PUG XML (I'll just call it PUGML). In other words, preparing a query using the PubChem search page and saving it gives a simple method for creating PUGML queries. Let's try it.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070611/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Using the "Sketch" button, draw the structure of benzimidazole. Under "Search Type", select "Substructure." Now click "Save Query", and you'll download a substructure query for benzimidazole in PUGML:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType_css&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query_data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;C1=CC=CC2=C1N=C[N]2&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query_data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type_subss&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-CSStructure&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-CSStructure_bonds&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;true&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
                      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-CSStructure&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type_subss&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_results&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;2000000&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS_results&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryCompoundCS&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType_css&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryType&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query_type&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_query&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;PCT-QueryCompoundCS_type_subss&lt;/tt&gt; element tells PUG to run a substructure search.&lt;/p&gt;

&lt;h4&gt;Using the Saved Query with PUG&lt;/h4&gt;

&lt;p&gt;Saving this file as &lt;strong&gt;benzimidazole_sss.xml&lt;/strong&gt; lets us feed it to PUG:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @benzimidazole_sss.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;and get the following PUGML response:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;queued&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;62668946396085905&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;Structure search job was submitted&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
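
&lt;p&gt;Copying the request ID out of a response like this by hand gets tedious. Here's a minimal Python sketch of one way to pull out the status and request ID with the standard library's ElementTree. The function name &lt;tt&gt;parse_pug_response&lt;/tt&gt; is my own invention, not part of PUG, and the code assumes responses shaped like the one above.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def parse_pug_response(xml_text):
    """Return (status, reqid) from a PUG response document.

    Hypothetical helper: assumes the element names seen in the
    response above. Either value is None when its element is absent.
    """
    root = ET.fromstring(xml_text)
    # The status lives in the "value" attribute of PCT-Status.
    status_el = root.find(".//PCT-Status")
    status = status_el.get("value") if status_el is not None else None
    # The request ID appears in PCT-Waiting_reqid while a job is pending.
    reqid = root.findtext(".//PCT-Waiting_reqid")
    return status, reqid
```

&lt;p&gt;Passing the response text to this function would give back the status string and the request ID, or &lt;tt&gt;None&lt;/tt&gt; for the ID when the response carries no &lt;tt&gt;PCT-Waiting_reqid&lt;/tt&gt; element.&lt;/p&gt;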

&lt;p&gt;We can then check on the status of our query by saving the following as &lt;strong&gt;status.xml&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;62668946396085905&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_type&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;status&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
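
&lt;p&gt;Because a status request differs from one run to the next only in its request ID, it could also be generated rather than edited by hand. The following is a rough Python sketch under that assumption. The function name &lt;tt&gt;build_status_request&lt;/tt&gt; is hypothetical, and the element names are simply copied from the document above.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def build_status_request(reqid):
    """Return a PUG status request for reqid as an XML string.

    Hypothetical helper: mirrors the hand-written status.xml above.
    """
    root = ET.Element("PCT-Data")
    parent = root
    # Nest the wrapper elements in the same order as the example.
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_request", "PCT-Request"):
        parent = ET.SubElement(parent, tag)
    # The request carries the ID and the kind of request being made.
    ET.SubElement(parent, "PCT-Request_reqid").text = str(reqid)
    ET.SubElement(parent, "PCT-Request_type", {"value": "status"})
    return ET.tostring(root, encoding="unicode")
```

&lt;p&gt;Writing the returned string to a file gives a document equivalent to &lt;strong&gt;status.xml&lt;/strong&gt;.&lt;/p&gt;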

&lt;p&gt;POSTing this file to PUG:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @status.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;gives us the following PUGML:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;success&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;Your search has already been completed successfully!.&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;pccompound&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_query-key&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;1&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_query-key&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_webenv&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib@1EDD43FA66AE1BE0_0001SID&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez_webenv&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_entrez&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/06/04/hacking-pubchem-power-user-gateway"&gt;Last time&lt;/a&gt;, we got a URL to download a gzipped SD File. This time, our query specified results to be returned as an Entrez Key through the &lt;tt&gt;PCT-Entrez_webenv&lt;/tt&gt; element. We can construct a URL that will let us view these results:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=HistorySearch&amp;amp;WebEnvRq=1&amp;amp;db=pccompound&amp;amp;query_key=1&amp;amp;WebEnv=0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib%401EDD43FA66AE1BE0_0001SID&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
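
&lt;p&gt;The same URL can be assembled in code. What follows is a minimal Python sketch, not part of PUG itself: the helper names &lt;tt&gt;entrez_fields&lt;/tt&gt; and &lt;tt&gt;entrez_history_url&lt;/tt&gt; are made up for illustration, and the parameter names are simply copied from the example URL above.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;
```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base address copied from the example URL above.
ENTREZ_BASE = "http://www.ncbi.nlm.nih.gov/sites/entrez"


def entrez_fields(response_xml):
    """Pull the database, query key, and WebEnv out of a PUG response."""
    root = ET.fromstring(response_xml)
    return {
        "db": root.findtext(".//PCT-Entrez_db"),
        "query_key": root.findtext(".//PCT-Entrez_query-key"),
        "webenv": root.findtext(".//PCT-Entrez_webenv"),
    }


def entrez_history_url(db, query_key, webenv):
    """Build the results URL (hypothetical helper, not part of PUG)."""
    params = [
        ("cmd", "HistorySearch"),
        ("WebEnvRq", "1"),
        ("db", db),
        ("query_key", str(query_key)),
        ("WebEnv", webenv),
    ]
    # urlencode percent-encodes reserved characters, so "@" becomes "%40".
    return ENTREZ_BASE + "?" + urlencode(params)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Given the response text, &lt;tt&gt;entrez_history_url(**entrez_fields(response_text))&lt;/tt&gt; should produce a URL of the same form as the one shown.&lt;/p&gt;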

&lt;h4&gt;Where to Next?&lt;/h4&gt;

&lt;p&gt;If we wanted to get a gzipped SD File instead, we'd need to edit our original query. But manually editing XML is a lot like mowing a lawn with scissors. What we'd really like is a simple API in a language like Ruby that will let us build sophisticated PUG queries, process the results, and pipe them into other queries with little effort. But that's a story for another time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/40121670@N00/"&gt;shutterbabe68&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 11 Jun 2007 09:04:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:c9cf69b1-a86f-4a3b-ba3c-a8dcded2fa9f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/11/hacking-pubchem-learning-to-speak-pug</link>
      <category>Databases</category>
      <category>pubchem</category>
      <category>pug</category>
      <category>xml</category>
      <category>api</category>
      <category>powerusergateway</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Hacking PubChem: Power User Gateway</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/40121670@N00/519042653/"&gt;&lt;img src="http://depth-first.com/demo/20070604/pug.jpg" border="0" align="right"&gt;&lt;/img&gt;&lt;/a&gt;If you've been waiting for a simple way to programmatically query PubChem without screen scraping, the wait is over. An (apparently) new service called the Power User Gateway (PUG) now offers a direct, XML-based PubChem data channel.&lt;/p&gt;

&lt;h4&gt;See PUG&lt;/h4&gt;

&lt;p&gt;Previous articles have discussed various methods for hacking PubChem: screen scraping (&lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;link&lt;/a&gt;, &lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;link&lt;/a&gt;); with the &lt;a href="http://depth-first.com/articles/2006/09/23/hacking-pubchem-entrez-programming-utilities"&gt;Entrez Utilities&lt;/a&gt;; and by simply &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;replicating the database&lt;/a&gt;. PUG is different in that it is both very simple and apparently quite powerful.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_pug.txt"&gt;PUG documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... There is a single CGI (pug.cgi, referred to hereafter as simply PUG) that is the central gateway to multiple PubChem functions. PUG takes no URL arguments; all communication with PUG is done by XML. To perform any request, you will formulate your input in XML and then HTTP POST it to PUG. The CGI interprets your incoming request, initiates the appropriate action, then returns results (also) in XML format. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;See PUG Run&lt;/h4&gt;

&lt;p&gt;Let's perform a simple query using PUG. As the documentation states, all communication with PUG is done through HTTP POST. In contrast to other approaches to interfacing with PubChem, parameters and results are encoded in raw XML, the schema for which is available &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pug.xsd"&gt;here&lt;/a&gt;. To use PUG, your first step is to find software that can send this kind of HTTP request.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://curl.haxx.se/"&gt;cURL&lt;/a&gt; is such a utility. Among many capabilities, cURL offers a quick and easy way to POST XML to a server and view the response. For example, to POST the file called &lt;strong&gt;foo.xml&lt;/strong&gt; to PUG, the command would be: &lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @foo.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;

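&lt;p&gt;Our query will request PubChem Compounds in &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;sdf.gz&lt;/a&gt; format. Each &lt;tt&gt;PCT-ID-List_uids_E&lt;/tt&gt; element names one compound ID, so the list below asks for CIDs 1 and 50.&lt;/p&gt;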

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids_ids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;pccompound&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_db&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;1&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;50&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids_E&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-ID-List&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids_ids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-QueryUids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_uids&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_format&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;sdf&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download_compression&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;gzip&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_download&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After saving this file as &lt;strong&gt;pugtest.xml&lt;/strong&gt;, we can POST it to PUG using cURL:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ curl -d @pugtest.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
&lt;/pre&gt;
&lt;/div&gt;
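
&lt;p&gt;For readers who would rather not hand-edit XML, here is a rough Python sketch of the same two steps: building the download query for a list of compound IDs, then wrapping it in a POST request. The function names &lt;tt&gt;build_download_query&lt;/tt&gt; and &lt;tt&gt;make_pug_request&lt;/tt&gt; are invented for this example. The element names and the PUG address are taken from the query and cURL command above. The request is only prepared, not sent.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;
```python
import urllib.request
import xml.etree.ElementTree as ET

# Address copied from the cURL command above.
PUG_URL = "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"


def build_download_query(cids, fmt="sdf", compression="gzip"):
    """Return PUG download-request XML as bytes (hypothetical helper)."""
    root = ET.Element("PCT-Data")
    node = root
    # Descend through the fixed wrapper elements of the query shown above.
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_download", "PCT-Download"):
        node = ET.SubElement(node, tag)
    download = node
    node = ET.SubElement(download, "PCT-Download_uids")
    for tag in ("PCT-QueryUids", "PCT-QueryUids_ids", "PCT-ID-List"):
        node = ET.SubElement(node, tag)
    id_list = node
    ET.SubElement(id_list, "PCT-ID-List_db").text = "pccompound"
    uids = ET.SubElement(id_list, "PCT-ID-List_uids")
    # One PCT-ID-List_uids_E element per compound ID.
    for cid in cids:
        ET.SubElement(uids, "PCT-ID-List_uids_E").text = str(cid)
    ET.SubElement(download, "PCT-Download_format", value=fmt)
    ET.SubElement(download, "PCT-Download_compression", value=compression)
    return ET.tostring(root)


def make_pug_request(xml_bytes):
    """Wrap the XML in a POST request; nothing is sent here."""
    # Supplying data makes urllib use the POST method.
    return urllib.request.Request(PUG_URL, data=xml_bytes)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Passing the result of &lt;tt&gt;make_pug_request(build_download_query([1, 50]))&lt;/tt&gt; to &lt;tt&gt;urllib.request.urlopen&lt;/tt&gt; would perform the actual POST. The sketch stops short of that so it can be run offline.&lt;/p&gt;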

&lt;h4&gt;Run PUG, Run!&lt;/h4&gt;

&lt;p&gt;After POSTing our query, PUG gives one of two possible responses: we're informed of the status of our query, or we're given a URL to download our results.&lt;/p&gt;

&lt;p&gt;Here's an example of a status result:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;success&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;638302818484957496&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_waiting&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;PCT-Waiting_reqid&lt;/tt&gt; informs us of our query's ID. We could then prepare and POST another query to monitor its status:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;638302818484957496&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_reqid&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request_type&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;status&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData_request&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-InputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_input&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
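
&lt;p&gt;The hand-off from the waiting response to the status query could be scripted along these lines. This is only a sketch: &lt;tt&gt;extract_reqid&lt;/tt&gt; and &lt;tt&gt;build_status_query&lt;/tt&gt; are hypothetical names, and the element names are those appearing in the two XML listings above.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;
```python
import xml.etree.ElementTree as ET


def extract_reqid(response_xml):
    """Return the request ID from a waiting response, or None if absent."""
    root = ET.fromstring(response_xml)
    return root.findtext(".//PCT-Waiting_reqid")


def build_status_query(reqid):
    """Return status-request XML as bytes for the given request ID."""
    root = ET.Element("PCT-Data")
    node = root
    # Descend through the wrapper elements of the status query shown above.
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_request", "PCT-Request"):
        node = ET.SubElement(node, tag)
    ET.SubElement(node, "PCT-Request_reqid").text = str(reqid)
    ET.SubElement(node, "PCT-Request_type", value="status")
    return ET.tostring(root)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;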

&lt;p&gt;Eventually, we'll get a response containing a &lt;tt&gt;PCT-Download-URL_url&lt;/tt&gt; element. Inside this element is the URL through which we can download our results:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;?&lt;/span&gt;&lt;span class="tag"&gt;xml&lt;/span&gt; &lt;span class="attribute"&gt;version&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1.0&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="tag"&gt;DOCTYPE&lt;/span&gt; &lt;span class="attribute"&gt;PCT-Data&lt;/span&gt; &lt;span class="attribute"&gt;PUBLIC&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;-//NCBI//NCBI PCTools/EN&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;success&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;/&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Status-Message&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_status&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_download-url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL_url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;ftp://ftp-private.ncbi.nlm.nih.gov/pubchem/.fetch/766964770894289974.sdf.gz&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL_url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Download-URL&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output_download-url&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-OutputData&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data_output&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;PCT-Data&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
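
&lt;p&gt;Telling the two kinds of response apart comes down to checking which element is present. A small Python sketch, assuming the response text is already in hand (the helper name &lt;tt&gt;extract_download_url&lt;/tt&gt; is made up for illustration):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;
```python
import xml.etree.ElementTree as ET


def extract_download_url(response_xml):
    """Return the download URL from a PUG response, or None if not ready."""
    root = ET.fromstring(response_xml)
    url = root.findtext(".//PCT-Download-URL_url")
    # A response that is still waiting has no such element.
    return url.strip() if url else None
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A result of &lt;tt&gt;None&lt;/tt&gt; would mean the query is still running and the status request should be repeated.&lt;/p&gt;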

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;PUG offers the basic foundation for building a variety of innovative and useful cheminformatics Web services. But before that can happen, high-level APIs will be needed in languages like Ruby, Python, and Java. With these APIs in hand, what kinds of applications will result? Fortunately, imagination is now the only barrier.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/40121670@N00/"&gt;shutterbabe68&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 04 Jun 2007 07:06:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:51a56e2f-5ac3-4fd4-92e8-74978045eae2</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/04/hacking-pubchem-power-user-gateway</link>
      <category>Databases</category>
      <category>pubchem</category>
      <category>pug</category>
      <category>xml</category>
      <category>api</category>
      <category>ruby</category>
      <category>powerusergateway</category>
      <category>curl</category>
    </item>
    <item>
      <title>Simple CAS Number Lookup with PubChem</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right" border="none"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.cas.org/expertise/cascontent/registry/regsys.html"&gt;CAS Registry Numbers&lt;/a&gt; simplify the thorny problem of referring to chemical substances. These short numerical sequences are arguably the most widely used form of molecular identifier, appearing on reagent bottles, in publications, in patents and patent applications, and on material safety data sheets.&lt;/p&gt;

&lt;p&gt;During my time as a synthetic organic chemist, I would sometimes run into the problem of finding the structure of a molecule represented by a CAS number. A common case was when an ambiguous, incomprehensible, or blurred IUPAC name was printed on a reagent bottle along with a CAS number. By looking up the CAS number, I could confirm the bottle's contents.&lt;/p&gt;

&lt;p&gt;Your first impulse when looking up a CAS number might be to fire up &lt;a href="http://www.cas.org/SCIFINDER/"&gt;SciFinder&lt;/a&gt;. For years this was the only option. Those days are quickly starting to seem as quaint as when people actually wrote on pieces of paper and dropped them in mailboxes (&lt;a href="http://netflix.com"&gt;dropping DVDs in a mailbox&lt;/a&gt; is a different matter).&lt;/p&gt;

&lt;p&gt;A little-publicized feature of PubChem makes it an ideal way to quickly find the structure associated with a CAS Number. To use it, you need nothing more than a computer, a browser, and an internet connection.&lt;/p&gt;

&lt;p&gt;Browse over to the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; welcome page. At the top you'll find a search box. Enter your CAS number and press "Go." For this example, I'm using the CAS number for 2,5-Pyrazinedicarboxylic acid dihydrate:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070521/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;If all goes well, you should see a results screen containing the structure of your compound and a link to its summary page:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070521/screenshot2.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Does this seem a little too good to be true? Try it for yourself. Pick up a copy of the Aldrich catalog, the Merck Index, or anything else that lists lots of CAS numbers. Choose several structures at random and see how PubChem performs.&lt;/p&gt;

&lt;p&gt;There are limitations to this method. PubChem generally doesn't index large molecules such as polymers and peptides, so they won't be found by this method. Similarly, if a CAS number doesn't point to a distinct molecular entity (e.g. "mineral oil"), PubChem won't find it either. But these are hardly limitations in the vast majority of cases.&lt;/p&gt;

&lt;p&gt;With the &lt;a href="http://www.corporate-ir.net/ireye/ir_site.zhtml?ticker=SIAL&amp;amp;script=410&amp;amp;layout=-6&amp;amp;item_id=984368"&gt;recent addition of Sigma-Aldrich&lt;/a&gt; as a PubChem compound supplier, it won't be long before smaller companies begin following suit. What we're seeing with PubChem is a classic example of a &lt;a href="http://en.wikipedia.org/wiki/Network_effect"&gt;network effect&lt;/a&gt;. The end result should come as a surprise to nobody.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: &lt;a href="http://chempedia.com"&gt;Chempedia&lt;/a&gt; offers a more detailed &lt;a href="http://depth-first.com/articles/2008/05/26/simple-cas-number-lookup-and-more-with-chempedia"&gt;CAS Number Lookup&lt;/a&gt; service.&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 21 May 2007 11:46:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:e20e2fc2-e99e-4171-8055-1493bcb31d65</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/05/21/simple-cas-number-lookup-with-pubchem</link>
      <category>Databases</category>
      <category>cas</category>
      <category>pubchem</category>
      <category>casnumber</category>
      <category>lookup</category>
      <category>networkeffect</category>
    </item>
    <item>
      <title>Free Chemistry Databases on the Web: Creating a Comprehensive Guide</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/kateanddave/378962428/"&gt;&lt;img src="http://depth-first.com/demo/20070505/list.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;One of Depth-First's more popular articles is a summary of free databases titled &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;&lt;em&gt;Thirty-Two Free Chemistry Databases&lt;/em&gt;&lt;/a&gt;. Clearly there is a need to link the producers of free chemical databases (developers) with the potential users of these services (chemists). Chemistry is slowly emerging from a decades-long period of over-reliance on a &lt;a href="http://www.cas.org/"&gt;single supplier&lt;/a&gt; of information. As new players enter, they'll need some way to have their message heard.&lt;/p&gt;

&lt;h4&gt;The Problem&lt;/h4&gt;

&lt;p&gt;As evidence of this need, I'm getting more requests to list additional services on the Thirty-Two Databases article - or to provide an updated review of a service already there. This is wonderful!&lt;/p&gt;

&lt;p&gt;One approach would be for me to simply research and write an updated article reviewing the new additions myself. The problem is that thirty-two is already a very large number to deal with. My guess is that there must now be well over sixty or seventy free chemistry databases. That's far too many for one person to research properly on their own.&lt;/p&gt;

&lt;p&gt;On the other hand, the Web is all about &lt;a href="http://depth-first.com/articles/2007/01/18/collective-intelligence-and-the-dumbness-of-crowds"&gt;collaboration&lt;/a&gt;, so why not try to use it that way?&lt;/p&gt;

&lt;h4&gt;An Idea&lt;/h4&gt;

&lt;p&gt;Here's the idea: if you run a &lt;strong&gt;free&lt;/strong&gt; database or other online chemistry service and would like to promote it, post a comment to this article containing a link and brief description of what makes your service different/useful. If you've used a free chemistry database, feel free to provide your thoughts on it. If there's a free database you wish existed but doesn't yet, feel free to write about that. Unlike the other articles on this site for which comments are closed after two weeks, this article's comments will remain open indefinitely.&lt;/p&gt;

&lt;p&gt;After some period of time, I'll use these comments to write a new article highlighting the new material.&lt;/p&gt;

&lt;p&gt;Notice the use of the word "free". A free database can be used by any member of the general public without fees or a lengthy registration process. This includes both &lt;a href="http://depth-first.com/articles/2006/09/27/hacking-pubchem-free-speech-or-free-beer"&gt;free speech and free beer&lt;/a&gt; services. There are &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=286"&gt;more restrictive definitions&lt;/a&gt; that could be applied, but let's not worry about those just yet. Free beer is better than no beer at all.&lt;/p&gt;

&lt;p&gt;Links can either be in HTML or &lt;a href="http://daringfireball.net/projects/markdown/"&gt;Markdown&lt;/a&gt;. Here's one example of each:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt; &lt;span class="attribute"&gt;href&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://megamolecules.com&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;MegaMolecules&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt; (HTML)

[MegaMolecules](http://megamolecules.com) (Markdown)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;The Outcome&lt;/h4&gt;

&lt;p&gt;I have no idea what kind of response this experiment will generate. But if past experience is any guide, large numbers of chemists are keenly interested in free chemistry databases. All they need is a link.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/kateanddave/"&gt;Kate and Dave Hugh&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 07 May 2007 09:32:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:2e979d74-b484-48e2-a4ca-594e772a1cd1</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/05/07/free-chemistry-databases-on-the-web-creating-a-comprehensive-guide</link>
      <category>Databases</category>
      <category>database</category>
      <category>web</category>
      <category>chemistry</category>
      <category>free</category>
    </item>
    <item>
      <title>Yet Another Free Chemistry Database: FooDB</title>
      <description>&lt;p&gt;&lt;a href="http://hmdb.med.ualberta.ca/foodb/"&gt;&lt;img src="http://depth-first.com/demo/20070306/foodb.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://hmdb.med.ualberta.ca/foodb/"&gt;FooDB&lt;/a&gt; contains over 1,900 structures used as food additives in the United States. The data in FooDB are provided by the &lt;a href="http://vm.cfsan.fda.gov/~dms/eafus.html"&gt;FDA EAFUS site&lt;/a&gt;. You can search by CAS number, IUPAC name, or molecular formula. And in what is likely to be a trend to watch closely, FooDB is powered by the Web application framework &lt;a href="http://www.rubyonrails.org/"&gt;Ruby on Rails&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;FooDB is not alone: the number of &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemistry databases on the Web&lt;/a&gt; just keeps growing. How this trend ultimately plays out is anybody's guess.&lt;/p&gt;

&lt;p&gt;One thing is clear: the point is rapidly approaching at which database aggregation technologies will start to matter. No chemist wants to search through over thirty databases to find the information they need on a molecule. They want it delivered in one quick, intuitive, user-friendly package. Here's another example of something that's &lt;a href="http://depth-first.com/articles/tag/broken"&gt;broken&lt;/a&gt; in cheminformatics. Like all broken things, it's the source of great frustration for users and great opportunity for developers.&lt;/p&gt;</description>
      <pubDate>Tue, 06 Mar 2007 11:09:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:e506dc14-5608-42b8-97aa-8f040a3f527c</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/06/yet-another-free-chemistry-database-foodb</link>
      <category>Databases</category>
      <category>broken</category>
      <category>foodb</category>
      <category>web</category>
      <category>fda</category>
      <category>rails</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Thirty-Two Free Chemistry Databases</title>
      <description>&lt;p&gt;Chemical information is in the early stages of a revolution. Long dominated by a handful of established players, the field has rather suddenly opened up to a variety of innovative newcomers. The Internet now offers a diverse array of free online chemistry databases, twelve of which were summarized in a &lt;a href="http://depth-first.com/articles/2006/11/07/twelve-free-chemistry-databases"&gt;recent article&lt;/a&gt;. This list has since been updated with new information and new entries. The following (incomplete) list summarizes some of the possibilities available for your next search.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/pubchemlogo.gif"&gt;&lt;/img&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt;- The granddaddy of all free chemistry databases. Search over 8 million compounds by a variety of criteria. Although some PubChem records are linked into the primary literature through MeSH, most are not. But this doesn't seem to be PubChem's true calling. Instead, PubChem may well evolve into the world's largest online collection of molecular data sheets. Increasingly, the other databases in this list are cross-referencing their entries into PubChem. PubChem's entire database can be &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;downloaded by FTP&lt;/a&gt;. 
&lt;a href="http://www.cas.org/EO/regsys.html"&gt;CAS Registry&lt;/a&gt; is correct to see PubChem as the first real competition it has had in decades.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/zinc7.jpg"&gt;&lt;/img&gt;&lt;a href="http://blaster.docking.org/zinc/"&gt;ZINC&lt;/a&gt;- A free database of commercially-available compounds for virtual screening. Search over 4.6 million compounds by structure, IUPAC name, InChI, and a host of calculated properties. For noncommercial purposes, the ZINC database may be downloaded in whole or in part for local use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/emolecules.gif"&gt;&lt;/img&gt;&lt;a href="http://emolecules.com"&gt;eMolecules&lt;/a&gt;- Google for molecules. With a simple interface and super fast search engine, eMolecules augments PubChem with other information sources, including specialty chemical catalogs. Although eMolecules' emphasis seems to be on commercially-available compounds, it's only possible to get a link directly into a supplier's online catalog for a limited number of molecules. Most of the links are to PubChem records. For this reason, I don't find eMolecules very useful in its current form. If you remember something called "Chmoogle", this is the same service (moral: &lt;a href="http://www.emolecules.com/doc/google_vs_chmoogle/index.htm"&gt;don't mess with Google&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/ChEBI_logo_small.gif"&gt;&lt;/img&gt;&lt;a href="http://www.ebi.ac.uk/chebi/"&gt;CHEBI&lt;/a&gt;- "A freely available dictionary of molecular entities focused on &#8216;small&#8217; chemical compounds." CHEBI draws its information from &lt;a href="http://www.ebi.ac.uk/chebi/faqForward.do#3"&gt;two main sources&lt;/a&gt;: Integrated Relational Enzyme Database of the EBI and the Kyoto Encyclopedia of Genes and Genomes. Find out what proteins a molecule has been associated with and in what context. Provides cross-links to CAS registry numbers, Beilstein registry numbers, and Gmelin registry numbers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/nist.gif"&gt;&lt;/img&gt;&lt;a href="http://webbook.nist.gov/chemistry/"&gt;NIST Chemistry WebBook&lt;/a&gt;- Physical data (thermochemical, thermophysical, and ion energetics) for mostly organic compounds. Search by formula, structure, CAS number, and IUPAC name.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/bioCycDC.gif"&gt;&lt;a href="http://biocyc.org/open-compounds.shtml"&gt;BioCyc&lt;/a&gt;- A collection of about 3,500 compounds involved as enzyme substrates, products, inhibitors, and activators. On accepting a license agreement, the entire database can be &lt;a href="http://biocyc.org/download.shtml"&gt;freely downloaded&lt;/a&gt; in &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/chemexper.gif"&gt;&lt;a href="http://www.chemexper.com/"&gt;ChemExper&lt;/a&gt;- Find a supplier for your specialty chemical needs. Search by structure, name, molecular formula, and CAS number. After finding your compound, get an offer from one or more suppliers. I can't vouch for how this works in practice, but it sounds like a good idea.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.alanwood.net/pesticides/"&gt;Compendium of Pesticide Common Names&lt;/a&gt;- More than 1,100 commonly-used pesticides. Compounds are located by browsing indexed lists (IUPAC name, CAS number, and trade name) rather than searching. Each entry lists, among other pieces of information, a chemical structure and sub-classifications (repellents, antifeedants, synergists, etc.).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/nmrshift-logo.gif"&gt;&lt;a href="http://nmrshiftdb.org"&gt;NMRShiftDB&lt;/a&gt;- Organic structures and their nuclear magnetic resonance (nmr) chemical shifts. NMRShiftDB contains chemical shift data for over 22,000 organic compounds and 19,000 spectra. Records can be searched by structure, chemical shift and nucleus. NMRShiftDB is truly open; it can be &lt;a href="http://depth-first.com/articles/2006/09/04/hacking-nmrshiftdb"&gt;accessed programmatically&lt;/a&gt; and the source code for the software that runs the online database can be &lt;a href="http://sourceforge.net/projects/nmrshiftdb/"&gt;freely downloaded&lt;/a&gt;. Individual users can submit their own spectral shifts for peer review and subsequent inclusion.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://cholla.chemnavigator.com/cgi-bin/lookup/search"&gt;Chemical Structure Lookup Service&lt;/a&gt; (CSLS)- An address book for chemical structures. If you've ever used &lt;a href="http://metacrawler.com"&gt;Metacrawler&lt;/a&gt;, then you'll recognize the idea behind CSLS, which is to aggregate several free chemistry databases. Search over 27 million molecules by IUPAC name, InChI, structure, SMILES, and a variety of molecular identifiers. Your results set will contain links into specific databases that host the molecules you find. The user interface isn't just unfriendly - it's downright antisocial. But if you can get past this, CSLS may well be one of the most useful services in this list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/drugbank.png"&gt;&lt;a href="http://redpoll.pharmacy.ualberta.ca/drugbank/index.html"&gt;DrugBank&lt;/a&gt;- Combines detailed drug data with comprehensive drug target information. Search over 4,300 drugs by trade name, SMILES, and InChI. Each record contains information on target of action, therapeutic indication, medications the drug is an ingredient of, and trade names. Searches can be limited to only approved drugs or experimental drugs. Both the concept and interface to this service are well thought-out. &lt;em&gt;Note: this service was unavailable as of Jan 19, 2007&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/wikipedia.jpg"&gt;&lt;a href="http://wikipedia.com"&gt;Wikipedia&lt;/a&gt;- Wikipedia? Yes, Wikipedia. Wikipedia offers several kinds of chemical information produced by a knowledgeable, all-volunteer army. Looking for information on organic compounds? Consider &lt;a href="http://en.wikipedia.org/wiki/Morphine"&gt;this datasheet on morphine&lt;/a&gt; as an example. For those interested in synthesis, Wikipedia is increasingly being used to collaboratively author &lt;a href="http://depth-first.com/articles/2006/09/08/chemical-reviews-on-wikipedia"&gt;short reviews on the topic&lt;/a&gt;. Search capabilities are currently limited to text and don't appear to work with IUPAC names or CAS numbers.  Where this quintessential &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0060521996/qid=1101756443/sr=8-1/ref=pd_ka_1/102-0228227-9568947?v=glance&amp;amp;s=books&amp;amp;n=507846"&gt;disruptive technology&lt;/a&gt; and its offspring end up taking chemical publishing is unclear, but the ride will be spectacular.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://cdb.ics.uci.edu/CHEM/Web/"&gt;ChemDB&lt;/a&gt;- A chemical database is but one of the services offered by this site. Search over 4.1 million compounds by structure, or various calculated properties. ChemDB also offers a variety of free online cheminformatics tools such as Babel file format conversion, SMILES depict, and molecular property calculation. Read more about ChemDB in &lt;a href="http://dx.doi.org/10.1093/bioinformatics/bti683"&gt;this &lt;em&gt;Bioinformatics&lt;/em&gt; paper&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/chembank.gif"&gt;&lt;a href="http://chembank.broad.harvard.edu/"&gt;ChemBank&lt;/a&gt;- Structure search over 36,000 original biological assays of small molecules collected by Harvard's &lt;a href="http://iccb.med.harvard.edu/"&gt;Institute of Chemistry and Cell Biology&lt;/a&gt; (ICCB). Many of the data contained in ChemBank have never been published, making this database particularly valuable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/niaid_logo.gif"&gt;&lt;a href="http://chemdb2.niaid.nih.gov/struct_search/default.html"&gt;National Institute of Allergy and Infectious Diseases Database&lt;/a&gt;- Structure search hundreds of thousands of screening datapoints collected by the NIAID in its HIV, Opportunistic Infection, and TB programs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/nationaltoxicologyprogram.gif"&gt;&lt;a href="http://ntp.niehs.nih.gov:8080/"&gt;National Toxicology Program&lt;/a&gt;- Search by name for compounds listed in the NTP database. Returns detailed internal reports and links to the primary literature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/kinetics_database.gif"&gt;&lt;a href="http://kinetics.nist.gov/index.php"&gt;NIST Chemical Kinetics Database&lt;/a&gt;- Search by reagent or product name or formula for gas phase rate constants collected from the primary literature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/cccbdb_logosmall.gif"&gt;&lt;a href="http://srdata.nist.gov/cccbdb/default.htm"&gt;Computational Chemistry Comparison and Benchmark Database&lt;/a&gt;- Search by formula for over 600 gas phase atom and molecule physical chemistry data obtained experimentally and by computation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/nist.gif"&gt;&lt;a href="http://srdata.nist.gov/solubility/"&gt;IUPAC-NIST Solubility Data Series&lt;/a&gt;- Search by name or CAS number through over 67,000 solubility measurements. Data were comprehensively compiled from over 1,800 references in primary literature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/solvdb.gif"&gt;&lt;a href="http://solvdb.ncms.org/"&gt;SOLV-DB&lt;/a&gt;- Search over 200 common solvents by name, CAS number, or chemical formula. Available data include boiling point, water solubility, viscosity, octanol-water partition constant, flash point, and a variety of other properties.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/pdspki.gif"&gt;&lt;a href="http://kidb.bioc.cwru.edu/pdsp.php"&gt;NIMH Psychoactive Drug Screening Program K&lt;sub&gt;i&lt;/sub&gt; Database&lt;/a&gt;- Search over 44,000 K&lt;sub&gt;i&lt;/sub&gt; determinations culled from the literature. Although this database appears to have no structure search capability, this is listed as a "Future Enhancement". This is a perfect example of a very useful service that could do with a major user interface redesign. There also appears to be another (defunct) service by the same name, but a &lt;a href="http://pdsp.med.unc.edu/pdsp.php"&gt;different URL&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/kegg2_menu.gif"&gt;&lt;a href="http://www.genome.jp/kegg/"&gt;Kyoto Encyclopedia of Genes and Genomes (KEGG)&lt;/a&gt;- A Japanese counterpart to PubChem/PubMed. One of the most interesting services on this list, KEGG consists of four interconnected databases: KEGG Pathway; KEGG genes; KEGG Brite; and KEGG Ligand. KEGG Ligand contains over 14,000 compounds searchable by name, and crosslinked to over 45,000 biological pathways. The KEGG Ligand database can be searched by structure through &lt;a href="http://www.genome.jp/download/"&gt;KegDraw&lt;/a&gt;, a 2-D structure editor written in Java. With some minor configuration on my Linux system, I was able to perform some basic substructure searches using KegDraw. Your mileage may vary. A nice overview of KEGG is available in a &lt;a href="http://dx.doi.org/10.1093/nar/gkj102"&gt;recent article&lt;/a&gt;. The contents of KEGG can be &lt;a href="http://www.genome.jp/anonftp/"&gt;downloaded by anonymous ftp&lt;/a&gt; for academic use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/Brenda.gif"&gt;&lt;a href="http://www.brenda.uni-koeln.de/"&gt;BRENDA&lt;/a&gt;- Search over 40,000 structures as substrates, products, cofactors, or inhibitors for enzymes. Although my search was able to find compounds by substructure, I was not able to view any links to the results. Your mileage may vary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/torvus.jpg"&gt;&lt;a href="http://www2.chemie.uni-erlangen.de/services/biopath/index.html"&gt;Biochemical Pathways Database&lt;/a&gt;- Structure search over 1,100 small molecules as participants in biochemical pathways. A potentially useful service, but currently too slow to fully evaluate. A structure search for naphthalene hung for five minutes before I terminated it without success.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/chemmine.png"&gt;&lt;a href="http://bioweb.ucr.edu/ChemMineV2/"&gt;ChemMine&lt;/a&gt;- Search by structure for compounds collected from a variety of open databases. View assay results in annotated biological experiments. I find the layout and organization of this service annoyingly confusing, but the underlying information appears to be useful nevertheless. Behind the scenes, ChemMine uses two open source cheminformatics libraries: &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt; and &lt;a href="http://joelib.sf.net"&gt;JOELib&lt;/a&gt;. For a more detailed view of ChemMine, see the &lt;a href="http://dx.doi.org/10.1104/pp.105.062687"&gt;recent article&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/oslogoleft.gif"&gt;&lt;a href="http://www.orgsyn.org/"&gt;Organic Syntheses&lt;/a&gt;- Search by structure through the entire contents of synthetic organic chemistry's flagship resource. Substructure search requires Chime, so if you run Linux, or for some other reason can't install the plugin, you'll be out of luck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/webreactions.gif"&gt;&lt;a href="http://webreactions.net/index.html"&gt;WebReactions&lt;/a&gt;- Structure search organic reactions in four databases containing a total of over 391,000 reactions. Each reaction hit is linked to the primary literature through a bibliographical reference. Although the interface takes some getting used to, WebReactions may make a worthy companion to the traditional SciFinder search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/aist.gif"&gt;&lt;a href="http://www.aist.go.jp/RIODB/SDBS/cgi-bin/cre_index.cgi"&gt;Spectral Database for Organic Compounds (SDBS)&lt;/a&gt;- Search by name, molecular formula, molecular weight range, or CAS number through over 14,000 full &lt;sup&gt;1&lt;/sup&gt;H NMR spectra, 12,000 full &lt;sup&gt;13&lt;/sup&gt;C spectra, and 50,000 full FT-IR spectra collected from over 32,000 compounds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/bindingdb.gif"&gt;&lt;a href="http://www.bindingdb.org/"&gt;BindingDB&lt;/a&gt;- Structure search over 24,000 K&lt;sub&gt;i&lt;/sub&gt; and IC&lt;sub&gt;50&lt;/sub&gt; measurements from over 10,000 molecules. Data is collected from, and cross-referenced to, the primary literature. I was unable to determine how to submit a substructure search through the Marvin applet on my Linux system (there is no "Search" button, for example). A text search for "naphthalene", for example, showed some impressive potential for this database. Anyone can currently contribute to BindingDB, one of the few databases on this list to have such a policy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/pdbbind.gif"&gt;&lt;a href="http://sw16.im.med.umich.edu/databases/pdbbind/index.jsp"&gt;PDBBind&lt;/a&gt;- Browse over 2,700 complexes of small-molecule ligands with proteins found in the Protein Databank. Structure searching requires a license. 3-D rendering comes courtesy of the ever-popular &lt;a href="http://jmol.sf.net/"&gt;Jmol&lt;/a&gt; applet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/affindb.jpg"&gt;&lt;a href="http://pc1664.pharmazie.uni-marburg.de/affinity/index.php"&gt;AffinDB&lt;/a&gt;- Search affinity data for complexes found in the Protein Databank. Affinity data are cross-linked to the primary literature through PubMed. Small molecule searching is limited to IUPAC names provided in a pull-down menu. By registering, users can upload affinity data themselves. AffinDB is just one example of what might be possible as chemistry databases begin to combine multiple sources of data into easy-to-use packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070123/chemrefer.gif"&gt;&lt;a href="http://www.chemrefer.com"&gt;ChemRefer&lt;/a&gt;- It doesn't get any simpler. Type in your keywords and get links to the matching full-text PDFs from the primary literature. As &lt;a href="http://depth-first.com/articles/2007/01/15/chemrefer-free-direct-access-to-the-primary-literature"&gt;mentioned before&lt;/a&gt;, the legality of some of ChemRefer's holdings, for example its articles from ACS journals, is not clear. But as more chemistry journals go &lt;a href="http://depth-first.com/articles/2006/11/16/electric-cars-and-open-access"&gt;Open Access&lt;/a&gt;, look to services like ChemRefer to play an increasing role in the way scientists navigate the primary literature.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;</description>
      <pubDate>Wed, 24 Jan 2007 02:32:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:a381e40e-af42-4ab7-a2ba-32d409711a41</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases</link>
      <category>Databases</category>
      <category>internet</category>
      <category>free</category>
      <category>database</category>
    </item>
  </channel>
</rss>
