<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Create Your Own PubChem Datasets: Exporting Results As SD Files</title>
    <link>http://depth-first.com/articles/2007/11/13/create-your-own-pubchem-datasets-exporting-results-as-sd-files</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Create Your Own PubChem Datasets: Exporting Results As SD Files</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif"  align="right"&gt;&lt;/img&gt;&lt;/a&gt;Recently, &lt;a href="http://depth-first.com/articles/2007/11/12/parsing-sd-files-with-ruby-and-rubidium"&gt;I needed to create a subset&lt;/a&gt; of the PubChem database in Structure Data File (SD File) format. Although it's far from obvious how to do this, the capability does exist. In this article, I'll give a step-by-step procedure for creating custom datasets in SD File format from arbitrary PubChem structure queries.&lt;/p&gt;

&lt;h4&gt;Create and Execute the Query&lt;/h4&gt;

&lt;p&gt;Let's say we want to create a dataset in SD File format containing all N-Boc-protected piperidines registered in PubChem.&lt;/p&gt;

&lt;p&gt;From the main &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem site&lt;/a&gt;, choose the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/search/"&gt;Structure Search&lt;/a&gt; link. Then click the "Sketch" button.&lt;/p&gt;

&lt;p&gt;Next, draw your molecule in the 2D structure editor:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071113/draw.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Then click the "Done" button.&lt;/p&gt;

&lt;p&gt;Before starting the query (by clicking the "Search" button), be sure to select the "Substructure" option under "Search Type."&lt;/p&gt;

&lt;h4&gt;Exporting the Results&lt;/h4&gt;

&lt;p&gt;You should now be looking at a page showing the first few of more than 7,700 hits (as of this writing). But how do we export these results in SD File format?&lt;/p&gt;

&lt;p&gt;Next to a field labeled "Display", you'll see a drop-down box containing several different options. Choose the one labeled "PubChem Download."&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071113/export.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;You'll be redirected to a download page where you can choose an output format, including SDF (SD File), and a compression type. Compression matters here: even a 2,000-record dataset can be quite large uncompressed. For this example, we'll select SDF format with GZip compression.&lt;/p&gt;
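
&lt;p&gt;Choosing GZip costs nothing later on, because many tools can read the compressed file directly. As a minimal sketch in Python (not part of PubChem's own tooling), the function below counts the records in a downloaded .sdf.gz file without unpacking it. It relies only on the SD File rule that every record ends with a line containing just $$$$. The function name and the file name used with it are hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
import gzip


def count_sd_records(path):
    """Count the records in a gzip-compressed SD File.

    Every SD record ends with a line containing only "$$$$", so counting
    those delimiter lines gives the number of structures. The file is
    decompressed on the fly; nothing is written to disk.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.rstrip("\r\n") == "$$$$":
                count += 1
    return count
```
&lt;/pre&gt;

&lt;p&gt;Once the download below is complete, calling count_sd_records("boc_piperidines.sdf.gz") gives a number you can compare with the hit count shown on the results page.&lt;/p&gt;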

&lt;p&gt;Clicking on the "Download" button takes us to a status page that eventually informs us when our download has been processed. You should then get a "Save File" dialog or something similar. If not, you should see a link to the compressed SD file.&lt;/p&gt;

&lt;p&gt;Downloading the results file completes the process.&lt;/p&gt;</description>
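
&lt;p&gt;If you need more than the structures themselves, the data items attached to each record can be read in much the same way. One example is a list of the compound IDs in the subset. In an SD File, a data item is a header line starting with "&gt;" that names the field in angle brackets, followed by one or more value lines and a blank line. The minimal Python sketch below collects those items into one dictionary per record. It assumes PubChem's downloads include a PUBCHEM_COMPOUND_CID field. That matches the files I've seen, but check it against your own download. The function name read_sd_data and the file name are hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
import gzip
import re

# A data-item header starts with ">" and carries the field name in angle
# brackets. "\x3c" in the pattern is the opening angle bracket.
HEADER = re.compile(r"^>.*?\x3c([^>]+)>")


def read_sd_data(path):
    """Yield one {field name: value} dict per record in a gzipped SD File.

    Multi-line values are joined with newlines. Molblock lines (before the
    first data-item header) are ignored.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        fields, name, values = {}, None, []
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line == "$$$$":
                # End of record: keep a value left open by a missing blank line.
                if name is not None:
                    fields[name] = "\n".join(values)
                yield fields
                fields, name, values = {}, None, []
                continue
            match = HEADER.match(line)
            if match:
                name, values = match.group(1), []
            elif name is not None:
                if line == "":
                    # A blank line ends the current data item's value.
                    fields[name] = "\n".join(values)
                    name = None
                else:
                    values.append(line)
```
&lt;/pre&gt;

&lt;p&gt;For example, [rec.get("PUBCHEM_COMPOUND_CID") for rec in read_sd_data("boc_piperidines.sdf.gz")] would collect the CIDs of every compound in the subset. Because the function is a generator, even a large download is processed one record at a time.&lt;/p&gt;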
      <pubDate>Tue, 13 Nov 2007 16:43:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:978ad5ab-d385-4905-abc6-2d9025a601d0</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/13/create-your-own-pubchem-datasets-exporting-results-as-sd-files</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>sdfile</category>
      <category>dataset</category>
    </item>
  </channel>
</rss>
