If You Want to Change the World, Build the Tool First - Part 1

Posted by Rich Apodaca Tue, 18 Dec 2007 12:19:00 GMT

Breakthroughs in technologies for managing and exchanging information always precede explosions in information exchange. From a safe distance, this principle seems completely obvious. Yet, like most obvious things, it's all too easy to forget in the heat of battle.

Recently, Peter Murray-Rust discussed the appalling state of data capture, dissemination, preservation and curation. His comments were prompted by an article written by Nico Adams. In it, Nico describes his initial excitement at the publication of a large spectroscopic dataset, followed by his frustration on finding that the "data" really consisted of nothing more than flat images stored in PDF format.

The article in question is titled "Preparation and Infrared/Raman Classification of 630 Spectroscopically Encoded Styrene Copolymers". Not having a subscription to the ASAP contents of this particular journal, I can only go by what appears in the abstract. From the abstract and title, it's clear that the dataset is the centerpiece of this article:

The barcoded resins (BCRs) were introduced recently as a platform for encoded combinatorial chemistry. One of the main challenges yet to be overcome is the demonstration that a large number of BCRs could be generated and classified with high confidence. Here, we describe the synthesis and classification of 630 polystyrene-based copolymers prepared from the combinatorial association of 15 spectroscopically active styrene monomers. Each of the 630 copolymers displayed a unique vibrational fingerprint (infrared and Raman), which was converted into a spectral vector. ...

Apparently, the technique enables polymer beads to be encoded with a spectroscopically readable tag that identifies the attached compounds at the end of a split-pool synthesis. Yet the supplementary material for the article consists of nothing more than static images like the one below:

For researchers hoping to build on the experiments described in the paper, and for those hoping to model or compile the results, static images like the one shown above are practically useless.

Why did this happen, and why do incidents like it play out with bewildering regularity in chemistry?

Nico looks to scientists and publishers, whereas Peter focuses on the publishers as the root cause.

I understand the reasoning and share their concern about the problem, but I disagree about the cause.

The cause of this problem is neither the policies of publishers nor scientists' lack of understanding of the problem - those are just symptoms. The root cause is a failure of cheminformatics itself. Simply put, cheminformatics has failed to deliver an inexpensive, robust, and truly usable solution to the problem of compiling, managing, and sharing spectral data for scientists with average computer skills.

The tool hasn't been built yet. No tool means that both scientists and publishers will continue to use the only tools they have any faith in, despite their obvious flaws. No tool leads to more of the same, from both scientists and publishers. No tool also means an enormous opportunity for the group that develops it.

Read Part 2 to find out why.

Image Credit: Neil T

Comments

  1. Geoffrey Hutchison Tue, 18 Dec 2007 23:55:02 GMT

    It's funny that you mentioned this problem with spectra. When we save to graphics, we usually end up killing the actual scientific data. So it's a real problem.

    But it's not as if there aren't some tools to do this sort of conversion. Sometimes the solution is on SourceForge already:

    http://spectrascan.sourceforge.net/

    Note, I haven't actually tried this program and it's not mine. But it looks like one tool for the job. Other commercial products also exist.

    Now perhaps in Part 2, you're going to explain your ideas for sharing spectral data. That's even better -- it doesn't require us to estimate data from the graphics. :-)

  2. Rich Apodaca Wed, 19 Dec 2007 09:43:28 GMT

    Geoff, you're right - raster graphics formats are also a problem when used with, for example, 2D chemical structures.

    Many fine pieces of software, open source and otherwise, have been written which handle various aspects of the problem I outlined for spectral data. They might be suitable for incorporating into the tool I'm describing, but they, by themselves, are not the tool.

  3. Anthony Lewis Wed, 09 Jan 2008 09:57:03 GMT

    The idea of SpectraScan was (is) appealing so I attempted to install it... unfortunately, so far as I can tell, it requires the "QT4" environment and installing that is beyond my capabilities. Oh well...

  4. Rich Apodaca Thu, 10 Jan 2008 07:23:22 GMT

    Anthony, good point. It just goes to show the many forms 'usability' can take.
