A New Beginning or More of the Same?
As discussed by Peter Suber, Peter Murray-Rust and others, President Bush signed H.R. 2764 into law yesterday. Among the many items in this bill is one that proponents argue could change the nature of the Open Access debate. Does this new law represent a fundamentally changed game, or just the next inning of the old one?
The text of the new law spells out what is now required:
SEC. 218. The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine's PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.
IANAL, but the provision requiring the policy to be implemented "in a manner consistent with copyright law" offers publishers (and scientists) all the flexibility they need to continue business as usual.
The reason is simple. Transfer of copyright from the author of a scientific paper to the publisher is usually one of the first things to happen "upon acceptance" of a manuscript for publication. And the new law makes it perfectly clear that copyright law takes precedence over deposition into PubMed Central.
Most of the journals in question will be hostile to the idea of having their copyrighted material deposited into PubMed Central and so understandably won't allow it to be done by the authors of papers or anyone else.
Take this hypothetical scenario for example: Professor Gross at California University gets his manuscript approved for publication in the Journal of Nanoscale Devices (JND). Professor Gross is fully aware both of HR 2764 and JND's refusal to deposit manuscripts into PubMed Central - the reasons why Professor Gross would choose JND anyway are interesting, but not relevant here. Along with the acceptance letter, JND requests prompt return of a signed copyright transfer agreement. Professor Gross sends in the signed form and from that point on, all rights to his article belong to JND. As is their policy, JND refuses Professor Gross permission to deposit a copy of his paper into PubMed Central within 12 months after publication.
Unless I'm missing something, neither Professor Gross nor JND has violated any laws. The assumption made by proponents of the new law seems to be that to implement the new policy, the Director of NIH will forbid publication by grant recipients in journals that don't allow deposition of articles into PubMed Central.
How many influential scientists do you know of who would tolerate the government telling them which journals they can and can't publish in? The minute such a misguided policy were put in place, the national scientific outcry would more than overwhelm anything Open Access proponents could muster.
Neither HR 2764 nor any form of government intervention will bring widespread Open Access into being. The only things that will change the status quo are: (1) the availability of tools for making it happen; and (2) the realization by individual investigators that continuing to give away their hard-earned copyright makes them far less competitive than their peers who don't.
Open Access proponents should forget about getting the Federal Government to fix the mess that modern scientific publication has become. Instead, they should focus on making Open Access-like options more attractive to scientists.
Image Credit: mayr
If You Want to Change the World, Build the Tool First - Part 2
Let's face it - real change is painful for most people. Think back, for example, to your last big change at work, and chances are pretty good that the experience was not entirely enjoyable - especially if the change was imposed on you.
As a designer of tools, it's easy to forget just how unpleasant change is for your users. Being closely involved and invested in the development of your tool only makes it harder to empathize with the people whose routines you'll be interrupting.
When innovations fail to catch on, it may be tempting to explain the situation in terms of users not "getting it," or through the intervention of outside forces with their own agenda. But more often than not, the real problem results from the innovation failing to offer a reasonable promise of compensation for the inconvenience that change brings.
The previous article in this series suggested that the same dynamic applies to the compilation, management, and sharing of spectral data by chemists. More to the point:
... cheminformatics has failed to deliver an inexpensive, robust, and truly usable solution to the problem of compiling, managing, and sharing spectral data for scientists of average computer skills. ...
To be sure, there are tools that address parts of the problem. But no solution addresses them all and that's why scientists and publishers resort to using obviously inferior solutions like PDFs. Let's take each of the requirements one at a time:
Inexpensive. One of the chronic problems in vertical markets like chemistry software is the lack of ubiquitous tools. Lack of ubiquity is a recipe for balkanization. Because chemistry software tends to be highly specialized and expensive to develop, suppliers must and do pass these costs on to customers. Change linked to money is especially hard to accept. The key, therefore, to developing the ideal tool is to relentlessly focus on keeping development costs low so as to deliver a low-cost (or free) tool. It's all but guaranteed that the ideal tool will take advantage of multiple pieces of Open Source software.
Robust. Few things are more difficult than trying to convince a skeptic to try a new, unreliable technology. Getting the last 20% in reliability is orders of magnitude more difficult than getting the first 80%. Part-way simply won't cut it.
Usable. A steep learning curve is a surefire deterrent to adoption. Chemistry has a long history of software with poor usability. Who could blame jaded users for turning away from "yet another piece of software"? Make it obvious or don't make it at all. Tying the tool to a specific operating system or browser is an especially bad idea; "usable" means usable by everyone.
The ideal solution must also address the three key needs chemists have with respect to using their spectra:
Compile Spectra. Contrary to an apparently popular belief among non-experimental chemists, most experimental chemists create their own spectra. There may be a "spectroscopist" who handles unusual cases, but the vast majority of spectra are created and interpreted by the chemist. They need a tool that requires no thought or planning to get a spectrum from the instrument into a database and ultimately onto their desktop.
Manage Spectra. During any given year, an organic chemist of average productivity can generate hundreds of spectra. It's a safe assumption today that these will be in digital format. The volume of data creates its own set of problems: where to store the spectra, how to store them, how to find them again, and how to manipulate them once they are found. Tagging the spectra in such a way that the sample history can be reconstructed is critical.
Share Spectra. One of the primary channels for sharing spectral data is through scientific publication. The tool must offer an obvious solution for scientists to compile their data into packages that publishers can work with and readers can do something with.
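To make the three needs a little more concrete, here is a rough Python sketch of what a tagged, machine-readable spectrum record might look like. It is only an illustration under my own assumptions: every name in it (SpectrumRecord, parent_sample_id, sample_history, and so on) is hypothetical and does not come from any existing tool or standard. The point is simply that a spectrum kept as numbers plus a few tags can be stored, found again, traced back through its sample history, and packaged for sharing.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional

# All names below are hypothetical, chosen for illustration only.


@dataclass
class SpectrumRecord:
    """One spectrum stored as numbers, plus the tags needed to find it again."""
    spectrum_id: str
    sample_id: str
    technique: str                           # e.g. "IR", "Raman", "1H NMR"
    x_units: str
    y_units: str
    x: List[float]
    y: List[float]
    parent_sample_id: Optional[str] = None   # sample this one was prepared from
    tags: List[str] = field(default_factory=list)


def to_json(record: SpectrumRecord) -> str:
    """Share: serialize the record, data points included, as plain JSON text."""
    return json.dumps(asdict(record))


def from_json(text: str) -> SpectrumRecord:
    """Compile: rebuild a record from JSON text produced by to_json."""
    return SpectrumRecord(**json.loads(text))


def find_by_tag(records: List[SpectrumRecord], tag: str) -> List[SpectrumRecord]:
    """Manage: return every record carrying the given tag."""
    return [r for r in records if tag in r.tags]


def sample_history(records: List[SpectrumRecord], sample_id: str) -> List[str]:
    """Follow parent links backwards and return sample ids, oldest first."""
    parents: Dict[str, Optional[str]] = {
        r.sample_id: r.parent_sample_id for r in records
    }
    chain: List[str] = []
    seen = set()
    current: Optional[str] = sample_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return list(reversed(chain))
```

Plain JSON is used here only because it needs nothing beyond the standard library. A real tool would have to settle on units, metadata, and a file format that instruments and publishers can both work with, and that is exactly the hard part.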
The analogy that springs to mind is blogging. As early as 1994, blogging was technically possible - all the pieces were in place and the demand for online content was mushrooming. So why didn't it happen? There was no tool that actually made it cheap and easy to blog. Starting in 2000-2001, those tools began to appear. Today, we take it for granted that anyone who wants to publish their own writing can do so almost immediately.
The availability of the tool did what years of discussion failed to do; it changed behavior. It succeeded by offering a reward that more than compensated for the pain of change.
The development of a ubiquitous tool for spectral data compilation, management, and sharing is an opportunity with a potentially big reward for the group that gets it right. It's one of those uninteresting, widespread problems that create a natural scarcity of good solutions and of people willing to develop them. Most players in the field have concluded (prematurely) that a solution already exists, and so are reluctant to get involved.
What more could you ask for as a developer?
Image Credit: Daniel Morris
If You Want to Change the World, Build the Tool First - Part 1
Breakthroughs in technologies for managing and exchanging information always precede explosions in information exchange. From a safe distance, this principle seems completely obvious. Yet, like most obvious things, it's all too easy to forget in the heat of battle.
Recently, Peter Murray-Rust discussed the appalling state of data capture, dissemination, preservation and curation. His comments were prompted by an article written by Nico Adams. In it, Nico discusses his initial excitement at the publication of a large spectroscopic dataset, followed by his frustration on finding that the "data" really consisted of nothing more than flat images stored in PDF format.
The article in question is titled Preparation and Infrared/Raman Classification of 630 Spectroscopically Encoded Styrene Copolymers. Not having a subscription to the ASAP contents of this particular journal, I can only go by what appears in the abstract. From the abstract and title, it's clear that the dataset is the centerpiece of this article:
The barcoded resins (BCRs) were introduced recently as a platform for encoded combinatorial chemistry. One of the main challenges yet to be overcome is the demonstration that a large number of BCRs could be generated and classified with high confidence. Here, we describe the synthesis and classification of 630 polystyrene-based copolymers prepared from the combinatorial association of 15 spectroscopically active styrene monomers. Each of the 630 copolymers displayed a unique vibrational fingerprint (infrared and Raman), which was converted into a spectral vector. ...
Apparently, the technique enables polymer beads to be encoded with a spectroscopically-readable tag for use in identifying attached compounds at the end of a split-pool synthesis. Yet the supplementary material for the article consists of nothing more than static images like the one below:

For researchers hoping to build on the experiments described in the paper, and for those hoping to model or compile the results, static images like the one shown above are practically useless.
Why did this happen and why do incidents like it play out with bewildering regularity in chemistry?
Nico looks to scientists and publishers, whereas Peter focuses on the publishers as the root cause.
I understand the reasoning and share their concern about the problem, but I disagree about the cause.
The cause of this problem is neither the policies of publishers nor the lack of understanding of the problem by scientists - those are just symptoms. The root cause is a failure of cheminformatics itself. Simply put, cheminformatics has failed to deliver an inexpensive, robust, and truly usable solution to the problem of compiling, managing, and sharing spectral data for scientists of average computer skills.
The tool hasn't been built yet. No tool means that both scientists and publishers will continue to use the only tools they have any faith in, despite their obvious flaws. No tool leads to more of the same, from both scientists and publishers. No tool also means an enormous opportunity for the group that develops it.
Read Part 2 to find out why.
Image Credit: Neil T

