<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag dataset</title>
    <link>http://depth-first.com/articles/tag/dataset</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Forty-Eight Free QSAR Datasets (and More)</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/brianlewandowski/45385584/"&gt;&lt;img src="http://depth-first.com/demo/20071206/zakim.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Whether you're a medicinal chemist or an informatician, &lt;acronym title="Quantitative Structure Activity Relationship"&gt;QSAR&lt;/acronym&gt; datasets can be very helpful in understanding complex biological phenomena. These datasets typically consist of a hundred or fewer compounds associated with a specific parameter such as intestinal absorption, volume of distribution, blood-brain barrier penetration, or activity at one or more biological targets. Most of them are published as part of a paper appearing in a peer-reviewed journal.&lt;/p&gt;

&lt;p&gt;Unlike &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;chemistry databases&lt;/a&gt;, which typically combine a search engine and a user interface with a dataset of thousands or millions of compounds, a QSAR dataset is much more focused and raw. You need to supply your own data viewer, report generator, and query tool.&lt;/p&gt;

&lt;p&gt;The Internet hosts a bewildering assortment of QSAR datasets tucked into various nooks and crannies. The problem is finding them. One useful resource is &lt;a href="http://cheminformatics.org"&gt;cheminformatics.org&lt;/a&gt;, which hosts a page linking to &lt;a href="http://cheminformatics.org/datasets/index.shtml"&gt;forty-four datasets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Recently, Shaillay Kumar Dogra, Scientific Editor of &lt;a href="http://www.qsarworld.com/index.php"&gt;QSARWorld&lt;/a&gt;, wrote in to let me know about the site's offering of &lt;a href="http://www.qsarworld.com/qsar-datasets.php"&gt;forty-eight free QSAR datasets&lt;/a&gt;. Each dataset is linked to the primary literature and is available in four formats, including SD File. In contrast to many datasets, those at QSARWorld are manually curated. QSARWorld is also actively seeking new datasets to convert into machine-readable form; if you find one, write to them to have it added to the collection.&lt;/p&gt;

&lt;p&gt;Systematic efforts to collect, curate, and distribute raw data from the primary literature are long overdue. QSARWorld offers an intriguing model for doing so. Although QSARWorld doesn't appear to have addressed some non-scientific issues yet, such as intellectual property rights, the site's machine-readable raw data provides plenty of food for thought for anyone working with QSAR.&lt;/p&gt;

&lt;p&gt;What's your favorite dataset resource?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/brianlewandowski/"&gt;B.G. Lewandowski&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Thu, 06 Dec 2007 10:20:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:cfd85703-dc1c-49d8-a2ac-578a1f1e196e</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/12/06/forty-eight-free-qsar-datasets-and-more</link>
      <category>Tools</category>
      <category>qsar</category>
      <category>qsarworld</category>
      <category>dataset</category>
      <category>opendata</category>
    </item>
    <item>
      <title>Create Your Own PubChem Datasets: Exporting Results As SD Files</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif"  align="right"&gt;&lt;/img&gt;&lt;/a&gt;Recently, &lt;a href="http://depth-first.com/articles/2007/11/12/parsing-sd-files-with-ruby-and-rubidium"&gt;I needed to create a subset&lt;/a&gt; of the PubChem database in Structure Data File (SD File) format. Although it's far from obvious how to do this, the capability does exist. In this article, I'll give a step-by-step procedure for creating custom datasets in SD File format from arbitrary PubChem structure queries.&lt;/p&gt;

&lt;h4&gt;Create and Execute the Query&lt;/h4&gt;

&lt;p&gt;Let's say we want to create a dataset in SD File format containing all N-Boc-protected piperidines registered in PubChem.&lt;/p&gt;

&lt;p&gt;From the main &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem site&lt;/a&gt;, choose the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/search/"&gt;Structure Search&lt;/a&gt; link. Then click the "Sketch" button.&lt;/p&gt;

&lt;p&gt;Next, draw your molecule in the 2D structure editor:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071113/draw.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Then click the "Done" button.&lt;/p&gt;

&lt;p&gt;Before starting the query (by clicking the "Search" button), be sure to select the "Substructure" option under "Search Type."&lt;/p&gt;

&lt;h4&gt;Export the Results&lt;/h4&gt;

&lt;p&gt;You should now be looking at a screen containing the first few hits of a 7700+ hitset. But how do we export these results in SD File format?&lt;/p&gt;

&lt;p&gt;Next to a field labeled "Display", you'll see a drop-down box containing several different options. Choose the one labeled "PubChem Download."&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071113/export.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;You'll be redirected to a download page from which you can select output formats, including SDF, or SD File. You can also select a compression type (datasets of even 2000 records can be quite large uncompressed). For this example, we'll select SDF format with GZip compression.&lt;/p&gt;

&lt;p&gt;Clicking the "Download" button takes you to a status page that reports when the download has been processed. You should then get a "Save File" dialog or something similar. If not, you should see a link to the compressed SD file.&lt;/p&gt;
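
&lt;p&gt;As a quick sanity check after downloading, you can compare the number of records in the compressed file with the hit count PubChem reported. The following is a minimal Python sketch and not part of the PubChem workflow itself. It assumes the usual SD File convention that every record ends with a line containing only $$$$, and the function name count_sd_records is made up for illustration.&lt;/p&gt;

```python
import gzip

# Line that terminates each record in an SD File (assumed convention).
SD_RECORD_DELIMITER = "$$$$"


def count_sd_records(path):
    """Count records in a gzip-compressed SD File by counting delimiter lines."""
    count = 0
    # Mode "rt" decompresses on the fly and yields text lines, so the
    # whole file is never held in memory at once.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as sd_file:
        for line in sd_file:
            if line.strip() == SD_RECORD_DELIMITER:
                count += 1
    return count
```

&lt;p&gt;Calling count_sd_records("pubchem_results.sdf.gz"), where the file name is hypothetical and stands for whatever name the download was saved under, should return a number matching the size of the hitset. Reading line by line keeps memory use low even when the uncompressed file is large.&lt;/p&gt;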

&lt;p&gt;Downloading the results file completes the process.&lt;/p&gt;</description>
      <pubDate>Tue, 13 Nov 2007 16:43:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:978ad5ab-d385-4905-abc6-2d9025a601d0</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/13/create-your-own-pubchem-datasets-exporting-results-as-sd-files</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>sdfile</category>
      <category>dataset</category>
    </item>
  </channel>
</rss>
