<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Raiding Chemistry's Data Tombs</title>
    <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Raiding Chemistry's Data Tombs</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising"&gt;&lt;img src="http://depth-first.com/demo/20080204/chart.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.nodalpoint.org/blog/duncan"&gt;Duncan Hull&lt;/a&gt; offers an &lt;a href="http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising"&gt;interesting commentary&lt;/a&gt; on the rapid increase in the number of biologically-oriented databases. He asks whether all of this abundance is leading to nothing more than a bad case of data indigestion, in which data is dumped into write-only "data tombs," never to be seen again.&lt;/p&gt;

&lt;p&gt;A data tomb is created whenever the ability to generate data outstrips the ability to do useful things with it. Like the burial tombs of ancient civilizations, data tombs are created for many reasons and take many forms.&lt;/p&gt;

&lt;p&gt;Where are chemistry's data tombs and what do they look like? Given that &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;the number of free chemistry databases&lt;/a&gt; pales in comparison to the number of free biological databases, the question may seem irrelevant.&lt;/p&gt;

&lt;p&gt;Nevertheless, data tombs in chemistry are ubiquitous. The most obvious examples are the supplementary data sections of major chemical journals. These write-only databases suffer from the dual afflictions of &lt;a href="http://pubs.acs.org/subscribe/journals/joceah/supmat/index.html"&gt;copyright restriction&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/2007/12/18/if-you-want-to-change-the-world-build-the-tool-first-part-1"&gt;electronic degradation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Taken together, the experimental sections of the world's chemical literature are, in effect, a vast catacomb of jealously guarded but poorly catalogued treasures.&lt;/p&gt;

&lt;p&gt;Data silos are an especially prevalent kind of data tomb. They result when data is created for a single use and, for either technical or political reasons, never placed in a real database. SD files containing SAR data, PowerPoint slides containing tables of synthetic yields, and Word documents containing experimental procedures are some of the forms these chemical data silos take.&lt;/p&gt;

&lt;p&gt;What chemical data tombs have you run into, and what methods did you use to raid them?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising"&gt;Duncan Hull&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 04 Feb 2008 11:46:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:3eb20440-f70b-451b-9937-ec4284ec04f6</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs</link>
      <category>Meta</category>
      <category>datatomb</category>
      <category>experimentalsection</category>
      <category>supplementarydata</category>
      <category>hamburger</category>
    </item>
    <item>
      <title>"Raiding Chemistry's Data Tombs" by Wolfgang Robien</title>
      <description>&lt;p&gt;Rich: Multiplicities are necessary for this search strategy - you need at least ODD/EVEN.&lt;/p&gt;

&lt;p&gt;The behaviour of this server is comparable to that of the Tanimoto index when searching for similar structures - 'featureless' spectra (all lines within one narrow range) give uncharacteristic results. Spectra having their lines scattered over the usual C-range give quite good results.&lt;/p&gt;</description>
      <pubDate>Wed, 06 Feb 2008 14:47:18 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:24c19d18-1fc2-4660-b60b-a0382f8dbdda</guid>
      <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs#comment-372</link>
    </item>
    <item>
      <title>"Raiding Chemistry's Data Tombs" by Rich Apodaca</title>
      <description>&lt;p&gt;Wolfgang, I've had a look at &lt;a href="http://nmrpredict.orc.univie.ac.at/case/propose.php" rel="nofollow"&gt;nmrpredict&lt;/a&gt; and it's quite impressive.&lt;/p&gt;

&lt;p&gt;I was wondering - do I need to specify peak multiplicity, or can it be left out (or specified as a wildcard)? Most 13C spectra I've come across aren't proton-coupled or APT.&lt;/p&gt;</description>
      <pubDate>Wed, 06 Feb 2008 13:42:06 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:7c27bfd3-2efc-4e10-9ab2-151ee90f1819</guid>
      <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs#comment-371</link>
    </item>
    <item>
      <title>"Raiding Chemistry's Data Tombs" by nmrpredict</title>
      <description>&lt;p&gt;Geoffrey:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://nmrpredict.orc.univie.ac.at" rel="nofollow"&gt;http://nmrpredict.orc.univie.ac.at&lt;/a&gt;  allows access to approx. 150,000 experimental NMR-data (CSEARCH-collection). Search is possible by name, functional group and (partial) structure.
Furthermore you have access to predicted CNMR-spectra for the PUBCHEM-structures - search by  peaklist is possible.
This all comes for free ......&lt;/p&gt;

&lt;p&gt;Furthermore, a few printed versions from the 'good, old days' of NMR, like Bruker's blue book, have been hacked into the computer - the system is named 'NMRShiftDB' (a few other NMR textbooks have also been hacked into NMRShiftDB, e.g. Braun, Berger, Kalinowski). SDBS is around as well, which has some 13,000 CNMR-data.&lt;/p&gt;

&lt;p&gt;Commercial systems are around - a few in alphabetical order:&lt;/p&gt;

&lt;p&gt;ACD, BIORAD, CHEMGATE, MESTREC, NMRPredict, SPECINFO&lt;/p&gt;

&lt;p&gt;Accessing e.g. NMRPredict Online Full with 425,000 NMR-spectra as knowledge base is Euro 155.- per year or 1,7 Eurocent per prediction - That's exactly YOUR contribution that people like me dig out (and check/correct !) the data and develop search strategies !&lt;/p&gt;

&lt;p&gt;Have a look into: &lt;/p&gt;

&lt;p&gt;&lt;a href="http://nmrpredict.orc.univie.ac.at/csearchlite/Annulenes_or_Pyridines.html" rel="nofollow"&gt;http://nmrpredict.orc.univie.ac.at/csearchlite/Annulenes_or_Pyridines.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and compare the costs (=manpower !) of the 2 involved papers against the costs of a few spectral similarity searches .......&lt;/p&gt;

&lt;p&gt;As far as NMR is concerned your statements are strongly 'biased' - just my opinion !&lt;/p&gt;

&lt;p&gt;Wolfgang Robien&lt;/p&gt;</description>
      <pubDate>Wed, 06 Feb 2008 06:39:14 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1491ee9d-4b10-485c-b2a8-49d2f8a37611</guid>
      <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs#comment-370</link>
    </item>
    <item>
      <title>"Raiding Chemistry's Data Tombs" by Rich Apodaca</title>
      <description>&lt;p&gt;Sebastian, a data tomb can take many forms, and I agree with you that a restricted, fee-only system is one of them. I didn't want to dwell on that though since so much of what's been written on Depth-First relates to that problem. And I would certainly rather have the problem of data tombs being freely-available electronic database than the much worse situation in which chemistry finds itself.&lt;/p&gt;

&lt;p&gt;In this case, I thought it was worth pointing out that data tombs can be made out of many different kinds of materials: paper, email, SD Files, and Office documents, to name a few.&lt;/p&gt;</description>
      <pubDate>Tue, 05 Feb 2008 11:29:33 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b9e930e3-6ef6-49f4-bd29-9204eca66868</guid>
      <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs#comment-364</link>
    </item>
    <item>
      <title>"Raiding Chemistry's Data Tombs" by Sebastian Rohrer</title>
      <description>&lt;p&gt;I must totally object to some of the statements in that post. &lt;/p&gt;

&lt;p&gt;[quote]
Where are chemistry's data tombs and what do they look like? Given that the number of free chemistry databases pales in comparison to the number of free biological databases, the question may seem irrelevant. [/quote]&lt;/p&gt;

&lt;p&gt;Some biological databases might be useful only to a tiny group of researchers. But in contrast to chemistry, biology has a culture of free databases, so that anyone who compiles data for himself makes it available to everyone else. 
Consider just the major biological databases: PDB, UniProt, EMBL/GenBank. They contain highly useful data and make it freely available. They are all cross-referenced and available in non-redundant form. &lt;/p&gt;

&lt;p&gt;Now take a look at chemistry: What about the MDDR, WOMBAT and the like? For my part, I would be quite happy if the only thing we needed to "pay" for freely accessible data were a growing number of "niche" databases...&lt;/p&gt;</description>
      <pubDate>Tue, 05 Feb 2008 03:59:08 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:38c203f5-d716-4baa-90cc-f459c49ec113</guid>
      <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs#comment-361</link>
    </item>
    <item>
      <title>"Raiding Chemistry's Data Tombs" by Geoffrey Hutchison</title>
      <description>&lt;p&gt;Let's not forget all of the printed IR and NMR and other spectroscopic databases. These are really modern-day "tombs" since they're extremely difficult to search by computer.&lt;/p&gt;

&lt;p&gt;Some of these are being upgraded to computer-based databases, but for me, I don't have access to them -- I have to use the printed version in the library.&lt;/p&gt;</description>
      <pubDate>Mon, 04 Feb 2008 13:00:35 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:ca34e0e6-57f3-480d-a5f3-ea247bafbf24</guid>
      <link>http://depth-first.com/articles/2008/02/04/raiding-chemistrys-data-tombs#comment-353</link>
    </item>
  </channel>
</rss>
