1908 and All That: The Long Tail and Chemistry

Posted by Rich Apodaca Wed, 07 May 2008 14:37:00 GMT

Quite a few American Chemical Society (ACS) divisions are celebrating their 100th anniversaries this year. While this fact may at first glance seem like just a piece of nerdy trivia, Rudy Baum, Editor-in-chief of C&E News, decided to dig deeper. And what he found was the Long Tail of chemistry, alive and well - in 1908.

In his editorial, Baum describes how he looked for the causes of the sudden appearance of so many ACS divisions in 1908. At its core, he found a growing realization on the part of influential chemists at the time that ACS membership was becoming too diverse in its interests and areas of specialization:

Specialization in subdisciplines of chemistry was also much on ACS members' minds in these years. Some members felt strongly that subdivisions of some sort should be created in the society to provide a venue for chemists from these areas to meet separate from the society as a whole. It was noted that chemists were going off and forming their own specialized organizations in areas like electrochemistry, biological chemistry, and agricultural chemistry.

As early as 1903, ACS established a committee of five distinguished members to look into this issue, with Massachusetts Institute of Technology's Arthur A. Noyes as the chairman. (Throughout its history, ACS has responded to challenges by creating committees!) The committee reported to the ACS Council at its June 1, 1903, meeting, and strongly recommended that "Divisions of the Society be established representing different important branches of chemistry."

For those familiar with the work of Chris Anderson, what's being described is nothing other than the Long Tail:

The theory of the Long Tail is that our culture and economy is increasingly shifting away from a focus on a relatively small number of "hits" (mainstream products and markets) at the head of the demand curve and toward a huge number of niches in the tail. As the costs of production and distribution fall, especially online, there is now less need to lump products and consumers into one-size-fits-all containers. In an era without the constraints of physical shelf space and other bottlenecks of distribution, narrowly-targeted goods and services can be as economically attractive as mainstream fare.

How much money does it cost to set up a new ACS division? Probably not that much. How big is the field of chemistry? Vast. Put the two together, and you have a recipe for today's ACS. A recent Depth-First article described this phenomenon. And C&E News itself maintains a (static?) blog on the Long Tail as it applies to chemical employment.

What does any of this have to do with chemical informatics? Although it may be tempting to think of chemists as a homogeneous group sharing a great deal of experience and knowledge, the proliferation of ACS divisions suggests otherwise. It seems reasonable to think that successful chemical information systems would do well to take this into account in their design and implementation.

ACS and the NIH Public Access Policy: Clarification at Last

Posted by Rich Apodaca Thu, 10 Apr 2008 14:27:00 GMT

An alert Depth-First reader pointed me to the new ACS policy for authors receiving NIH funding. The details are contained in a document outlining two ways authors can choose to comply with the new law requiring recipients of NIH funds to deposit a copy of their peer-reviewed manuscripts into PubMed Central. The choices are:

  1. Publish the article under ACS Author Choice by paying a fee. The ACS will then automatically deposit the article on behalf of the author.

  2. Publish the article using the standard procedure, but with the ACS granting authors the right (and responsibility) to deposit their manuscripts in compliance with the NIH Public Access Policy.

Under Option 2, copyright remains with the ACS - authors are simply granted an exception to enable them to comply with federal law. This means, among other things, that ACS retains the right to prevent third parties (including authors themselves) from creating derivative works of deposited manuscripts, and from redistributing them.

For better or worse, the federal government is now in the scientific publishing business. What remains to be seen is the extent to which this new publisher has the power and ability to deliver on the high expectations of many in the scientific community.

Wikipedia for Cheminformatics: A Simple Web API for Finding CAS Numbers in Compound Monographs

Posted by Rich Apodaca Wed, 02 Apr 2008 21:29:00 GMT

Good news for cheminformatics: Chemical Abstracts Service (CAS) has agreed to help Wikipedia users curate its collection of CAS numbers. As a result of the diligence of some hard-working volunteers, chemistry's most universal system for referring to chemicals can now be used far more effectively by the world's biggest open repository of knowledge.

Wouldn't it be great to be able to pull these CAS numbers from Wikipedia programmatically?

Perspective

Estimates place the number of Wikipedia pages dealing with individual inorganic and organic substances in the thousands. (I'll use the term "compound monographs" to describe them.) One factor acting to keep this number low is poor visibility of these entries. Unlike most chemical databases, Wikipedia can't, by itself, be easily searched by structure. As chemically-aware tools for indexing Wikipedia begin to emerge, look for six things to happen:

  1. The number of Wikipedia compound monographs will increase significantly.
  2. The quality of monographs for intermediate- to well-known compounds will increase substantially.
  3. Demand for user-friendly interfaces to Wikipedia's chemical content will increase.
  4. Wikipedia users will become interested in storing and finding ever more diverse kinds of information about each compound.
  5. Bench chemists will start to include Wikipedia as one of their preferred literature search techniques, leading to...
  6. More creative tools for using the chemical content of Wikipedia.

As noted previously, it wasn't too long ago that indexing of the chemical literature was done solely by volunteers. Wikipedia offers an intriguing way to channel the innate drive of chemists to combine their own work and experience with that of others to build useful information tools for the community.

But for now we are left with the question of how to index the chemical content of Wikipedia. Although a few systems have been proposed, the only practical method is through the use of CAS numbers. Which brings us to the subject of today's tutorial.

A Quick CAS Number API for Wikipedia

The Ruby program below will accept the title of any Wikipedia compound monograph and return the CAS number for the compound being discussed, or an error message if none was found:

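require 'rubygems'
require 'hpricot'
require 'open-uri'

class Wikikemi
  # Matches a CAS number such as 102-54-5.
  CAS_PATTERN = /([0-9]{2,7}?\-[0-9]{2}\-[0-9])/

  attr_reader :cas

  def initialize title
    # URI.escape and Kernel#open on URLs were removed in Ruby 3.0;
    # URI::DEFAULT_PARSER.escape and URI.open are used in their place.
    uri = URI::DEFAULT_PARSER.escape("http://en.wikipedia.org/wiki/#{title}")
    puts "loading... #{uri}"
    doc = Hpricot(URI.open(uri))
    table = (doc/"table")[0]

    # The CAS number is expected in the first table on the page
    # (the Chembox or Drugbox). @cas stays nil if nothing matches.
    match = table && table.inner_html.match(CAS_PATTERN)
    @cas = match && match[1]
  end
end

# Prompts repeatedly for the title of a Wikipedia monograph and prints
# the CAS number found there, or an error message if none is found.
# Try, for example, "benzene". Stops at end of input.
loop do
  puts "Enter the title of the Wikipedia page, for example: 'benzene'"
  line = gets
  break unless line
  w = Wikikemi.new line.chomp
  puts w.cas ? "[#{w.cas}]" : "CAS number not found"
end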

This program makes use of the excellent Ruby HTML parser, Hpricot.
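The pattern-matching step can be tried on its own, without a network connection or Hpricot. The sketch below applies the same regular expression to a hand-written HTML fragment. The fragment and the helper name extract_cas are invented for illustration; they are not part of wikikemi.rb, and real Wikipedia markup will look different.

```ruby
# Same pattern as in wikikemi.rb: 2-7 digits, 2 digits, 1 digit.
CAS_PATTERN = /([0-9]{2,7}?\-[0-9]{2}\-[0-9])/

# Hypothetical helper: returns the first CAS-like string in html, or nil.
def extract_cas(html)
  match = html.match(CAS_PATTERN)
  match && match[1]
end

# Invented fragment standing in for the first table of a monograph.
sample = '<table><tr><td>CAS number</td><td>102-54-5</td></tr></table>'

puts extract_cas(sample).inspect
puts extract_cas('<table><tr><td>no identifier</td></tr></table>').inspect
```

The first call should find the ferrocene number used in the session below, and the second should come back empty, which is the case the main program reports as "CAS number not found".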

Saving the above code to a file called wikikemi.rb, we can run it with:

$ ruby wikikemi.rb

For example, we can look up the CAS numbers for Ferrocene, Lipitor, or 1,2,3,4,4a,5,6,7,8,8a-Decahydronaphthalene:

$ ruby wikikemi.rb
Enter the title of the Wikipedia page, for example: 'benzene'
ferrocene
loading... http://en.wikipedia.org/wiki/ferrocene
[102-54-5]
Enter the title of the Wikipedia page, for example: 'benzene'
lipitor
loading... http://en.wikipedia.org/wiki/lipitor
[134523-00-5]
Enter the title of the Wikipedia page, for example: 'benzene'
1,2,3,4,4a,5,6,7,8,8a-Decahydronaphthalene
loading... http://en.wikipedia.org/wiki/1,2,3,4,4a,5,6,7,8,8a-Decahydronaphthalene
[91-17-8]

All this method requires is that the Wikipedia page lists the correct CAS number in its Drugbox or Chembox template. Fortunately, CAS has agreed to help make this happen.
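A regular expression only checks the shape of a CAS number. The last digit is a check digit: reading the remaining digits from right to left and multiplying them by 1, 2, 3 and so on, the sum modulo 10 must equal it. The sketch below uses that rule to screen scraped values. The helper name valid_cas? is hypothetical and not part of wikikemi.rb.

```ruby
# Hypothetical helper: true if str has the CAS number format and its
# check digit agrees with the weighted sum of the preceding digits.
def valid_cas?(str)
  match = /\A(\d{2,7})-(\d{2})-(\d)\z/.match(str)
  return false unless match

  digits = (match[1] + match[2]).chars.map(&:to_i)
  # Weight the rightmost digit by 1, the next one by 2, and so on.
  sum = digits.reverse.each_with_index.map { |d, i| d * (i + 1) }.sum
  sum % 10 == match[3].to_i
end

# The three numbers from the session above, plus one with an altered
# final digit.
%w[102-54-5 134523-00-5 91-17-8 102-54-6].each do |cas|
  puts "#{cas}: #{valid_cas?(cas) ? 'ok' : 'bad check digit'}"
end
```

A passing check digit does not show that the number belongs to the compound on the page, only that it is not an obvious typo, so curation of the kind CAS has offered is still needed.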

Conclusions

A little Ruby code is all it takes to build a working CAS number lookup system using Wikipedia. Although this may be useful as a standalone tool, it becomes much more powerful when made part of a larger cheminformatics system. But that's a story for another time.

See also Antony Williams' announcement on CAS and Wikipedia.

NIH Hears Publisher Feedback on Open Access Mandate

Posted by Rich Apodaca Fri, 21 Mar 2008 22:17:00 GMT

The NIH heard public comments yesterday on its plans for implementing PL 110-161 Section 218, a new law that grants the agency broad powers to intervene in the scientific publication system.

Scientific publishers were out in force. According to The Scientist, Jack Ochs of the American Chemical Society (ACS) was first in line to offer comments:

He started out by saying that a brief meeting was no substitute for the formal comments on rulemaking process like the one the NIH held when they were implementing the voluntary submission program in 2005. He was the first of several to call a halt to implementing the mandate so the details could be worked out.

A lot is riding on the outcome. The new law requires NIH grant recipients to deposit peer-reviewed manuscripts of their publications into PubMed Central, in apparent opposition to the policies of many leading scientific publishers - including the ACS.

NIH has given its grant recipients until April 7 before compliance will become mandatory. It remains unclear what steps, if any, ACS will take to enable authors to comply.

Unless ACS policy changes, NIH grant recipients face the possibility of losing one of the most prestigious publication options in chemistry.

Also see Peter Suber's comments.

Crunch Time: Can NIH Grant Recipients Still Publish in ACS Journals?

Posted by Rich Apodaca Tue, 18 Mar 2008 14:34:00 GMT

A new law that introduces major changes in the way many U.S. scientific papers are published and redistributed is about to go into effect. Late last year, President Bush signed into law H.R. 2764 (now Public Law 110-161), part of which gives a broad new mandate to the NIH to intervene in the scientific publication system:

SEC. 218. The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine's PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

A new NIH Public Access Website describes how the agency intends to implement the law. Recipients of NIH funds have two options for complying:

  1. If you choose to publish your article in certain journals, you need do nothing further to comply with the submission requirement of the Policy. See http://publicaccess.nih.gov/submit_process_journals.htm for a list of these journals.

  2. For any journal other than one of those in this list, the author must:

    a. Inform the journal that the article is subject to the Public Access Policy when submitting it for publication.

    b. Make sure that any copyright transfer or other publication agreement allows the article to be submitted to NIH in accordance with the Policy. For more information, see the FAQ Whose approval do I need to submit my article to PubMed Central? and consult with your Institution.

    c. Submit the article to NIH, upon acceptance for publication. See the Submission Process for more information.

The new policy becomes effective April 7, 2008.

In other words, all recipients of NIH funds will soon have an obligation under Federal Law to disclose to journals not on the NIH's list that their work is subject to PL 110-161.

The question is: what will the journals, some of which represent the most prestigious in their field, do with this information?

What Will the ACS Do?

For an organization making a lot of noise recently about its new Web site and focus on communication with its members, the ACS has been very quiet on what position, if any, it will take regarding the new law.

In fact, from the ACS Homepage, one might get the impression nothing has changed. Looking at the home pages for flagship journals with a large amount of NIH-funded content provided no insights, either; J. Med. Chem., J. Org. Chem., and Org. Lett. have nothing to say on PL 110-161 that I could find.

The ACS author copyright release form doesn't appear to have changed. In other words, when you agree to publish your article in an ACS journal, you're still handing over copyright in your work to the ACS, which has the right under Copyright Law (and presumably PL 110-161) to prevent NIH grant recipients from depositing their manuscripts into PubMed Central.

Even the ACS Office of Policy and Legislative & Government Affairs has zero guidance, as of this writing, to offer prospective authors who may have questions about complying with PL 110-161.

Misplaced Burden of Compliance

One of the many problems with PL 110-161 Section 218 is that it places the burden of compliance on authors themselves, not publishers. The law states very clearly that implementations must be "consistent with copyright law." As I wrote previously, this provision gives all the latitude needed to continue business as usual, which is exactly what we're seeing so far.

Two critical questions remain unanswered:

  • What obligation, if any, does the ACS have to reject manuscripts from NIH-funded authors, given that it remains ACS policy to take copyright from its authors and with it the right to deposit the accepted manuscript into PubMed Central?

  • What obligation, if any, do NIH-funded authors have to avoid publication in journals that strip copyright from them and thereby prevent their ability to comply with PL 110-161?

In partial answer to the second question, the NIH offers this FAQ:

Whose approval do I need to submit my article to PubMed Central?

Authors own the original copyrights to materials they write. Consistent with individual arrangements with authors' employing institutions, authors often transfer some or all of these rights to the publisher when the journal agrees to publish their article. Some publishers may ask authors to transfer copyrights for a manuscript when it is first submitted to a journal for review.

Authors should work with the publisher before any rights are transferred to ensure that all conditions of the NIH Public Access Policy can be met. Authors should avoid signing any agreements with publishers that do not allow the author to comply with the NIH Public Access Policy.

Federal employees always may submit their final peer-reviewed manuscript to PubMed Central, because government works are not subject to copyright protection in the United States.

But even here the language is garbled. Saying that an author "should avoid signing any agreements with publishers that do not allow the author to comply with the NIH Public Access Policy" is not the same as saying authors "shall not sign any agreements with publishers that do not allow the author to comply with the NIH Public Access Policy."

The former describes a suggestion; the latter describes a punishable offense.

Regardless of whether or not PL 110-161 is good public policy, far greater clarity will be needed from both the NIH and scientific publishers if the new law is to be enforced effectively.

Image Credit: wili_hybrid

Disclaimer: I am not a lawyer.
