1908 and All That: The Long Tail and Chemistry

Posted by Rich Apodaca Wed, 07 May 2008 10:37:00 GMT

Quite a few American Chemical Society (ACS) divisions are celebrating their 100th anniversaries this year. While this fact may at first glance seem like just a piece of nerdy trivia, Rudy Baum, Editor-in-chief of C&E News, decided to dig deeper. And what he found was the Long Tail of chemistry, alive and well - in 1908.

In his editorial, Baum describes how he looked for the causes of the sudden appearance of so many ACS divisions in 1908. At its core, he found a growing realization on the part of influential chemists at the time that the ACS membership was becoming too diverse in its interests and areas of specialization:

Specialization in subdisciplines of chemistry was also much on ACS members' minds in these years. Some members felt strongly that subdivisions of some sort should be created in the society to provide a venue for chemists from these areas to meet separate from the society as a whole. It was noted that chemists were going off and forming their own specialized organizations in areas like electrochemistry, biological chemistry, and agricultural chemistry.

As early as 1903, ACS established a committee of five distinguished members to look into this issue, with Massachusetts Institute of Technology's Arthur A. Noyes as the chairman. (Throughout its history, ACS has responded to challenges by creating committees!) The committee reported to the ACS Council at its June 1, 1903, meeting, and strongly recommended that "Divisions of the Society be established representing different important branches of chemistry."

For those familiar with the work of Chris Anderson, what's being described is nothing other than the Long Tail:

The theory of the Long Tail is that our culture and economy is increasingly shifting away from a focus on a relatively small number of "hits" (mainstream products and markets) at the head of the demand curve and toward a huge number of niches in the tail. As the costs of production and distribution fall, especially online, there is now less need to lump products and consumers into one-size-fits-all containers. In an era without the constraints of physical shelf space and other bottlenecks of distribution, narrowly-targeted goods and services can be as economically attractive as mainstream fare.

How much money does it cost to set up a new ACS division? Probably not that much. How big is the field of chemistry? Vast. Put the two together, and you have a recipe for today's ACS. A recent Depth-First article described this phenomenon. And C&E News itself maintains a (static?) blog on the Long Tail as it applies to chemical employment.

What does any of this have to do with chemical informatics? Although it may be tempting to think of chemists as a homogeneous group sharing a great deal of experience and knowledge, the proliferation of ACS divisions suggests otherwise. It seems reasonable to think that successful chemical information systems would do well to take this into account in their design and implementation.

The Quiet Revolution in Scientific Peer-Review: An Introduction to Research Blogging

Posted by Rich Apodaca Wed, 30 Jan 2008 10:37:00 GMT

A quiet revolution is taking place in the way the primary research literature gets reviewed. Like all revolutions in their infancy, this one looks hungry, raggedy and generally not respectable. But that could change rather quickly given the right technology.

Research Blogging is a brand new service that aggregates commentary about the peer-reviewed literature appearing on blogs.

Let's say Mary the Chemist finds a procedure in a paper on reductive amination that solves a problem she's been having in isolating her products. After using the procedure for a while, she notices that one class of substrate not described in the original paper gives much lower yields than those reported. Not having the resources to create a complete paper around her observation, she decides to write about what she found and post it to her blog.

If that were the end of the story, it's very unlikely Mary's posting would be of much use. Although Mary's blog is read by a couple of hundred people daily, few of the readers on the day her posting appeared had an interest in reductive amination or the paper she discussed. And none of her readers on that day were able to follow up on her observation.

Mary continues to post to her blog and eventually her observation, of potentially great value to the right chemist, gets buried in the archives (and on page 3 or 4 of most Google searches).

Enter Research Blogging, a Web-based database associating blog entries with references to the peer-reviewed scientific literature. Some time before writing about her observations, Mary signed up for a Research Blogging account and registered her blog with the service. At the time she wrote her observations on the reductive amination reaction, Mary applied special markup to the posting to make it readable by Research Blogging's automated system.

Instead of disappearing into the digital abyss, Mary's observation becomes permanently associated with the original paper.
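The association described above can be pictured as a simple lookup from a paper to the posts that discuss it. The following is only a minimal sketch of that idea, not Research Blogging's actual design: the class name `CommentaryIndex`, its methods, the identifier format, and the URL are all hypothetical.

```python
from collections import defaultdict


class CommentaryIndex:
    """Hypothetical index mapping a paper identifier to blog posts about it."""

    def __init__(self):
        # paper identifier -> list of post URLs, in registration order
        self._posts_by_paper = defaultdict(list)

    def register(self, paper_id, post_url):
        """Associate a blog post with the paper it discusses (no duplicates)."""
        if post_url not in self._posts_by_paper[paper_id]:
            self._posts_by_paper[paper_id].append(post_url)

    def commentary_for(self, paper_id):
        """Return every registered post about a paper, oldest first."""
        return list(self._posts_by_paper.get(paper_id, []))


# Placeholder identifier and URL, invented for illustration only.
index = CommentaryIndex()
index.register("doi:10.0000/example-paper",
               "https://example.org/mary/low-yield-substrates")
print(index.commentary_for("doi:10.0000/example-paper"))
```

Keying on the paper rather than on the blog is what keeps Mary's observation findable long after it has scrolled off her front page.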

Although Research Blogging's user interface is currently primitive, it's unlikely to remain so for long. The founders of the service appear both motivated and committed, recently forming a non-profit corporation to support their work.

In the future it's not inconceivable that Barry the Chemist, having finished his CAS search on reductive amination methods, would next turn to Research Blogging to make sure he really knows everything written about the three most promising peer-reviewed papers he's considering using.

Research Blogging is a wonderful idea with great potential to fill a significant need. As with any new technology, though, there are some issues to work out. The next article in this series will offer some ideas.

The Long Tail and Chemistry: Why So Many ACS Meeting Talks are "Uninteresting"

Posted by Rich Apodaca Mon, 27 Aug 2007 11:32:00 GMT

The Boston ACS provided yet another opportunity to look at chemistry as a social networking phenomenon. Having attended several talks inside my areas of expertise (organic chemistry, medicinal chemistry, and chemical informatics), I was struck by two things:

  1. Most talks were laser-focused on one tiny aspect of chemistry that is of little interest to the average chemist, but of great interest to a few chemists.

  2. Those talks that were not as focused on details drew the biggest crowds.

These statements have nothing to do with the quality of the presentations. In fact, one of the best talks focused on the clinical trial data for a single molecule, the Type II diabetes treatment dapagliflozin (below). Although the members of the audience for this talk seemed interested as well, they represented only a tiny fraction of the ACS attendees.

Roald Hoffmann's talk (and the symposium of which it was a part) drew a larger audience. Having won a Nobel Prize surely can't hurt. An association with recent controversy is also a plus. Of course, being a good storyteller and genuinely likable also helps. On the other hand, I wonder what the turnout would have been like if instead of telling his scientific life story Hoffmann had presented the details of a recent theoretical study.

The Boston ACS, and just about any analysis of printed chemical research, reveal The Long Tail at every turn. Although usually applied to mass markets such as DVD rentals through Netflix, The Long Tail also provides valuable insights into scientific fields such as chemistry.

To use Long Tail terminology, Roald Hoffmann and E.J. Corey are at the head of the curve - the blockbusters. They and their work are widely recognized and discussed. Almost everyone else's work, regardless of how ground-breaking or clever, lies in the long tail of relative obscurity. It is of great interest to a handful of people but essentially invisible to most chemists.

In a few ACS sessions, I counted as few as four or five audience members. The large number of ACS divisions and the astonishingly small audiences at some of their presentations are nothing more than a concrete demonstration of the Long Tail at work.

Each ACS division is a microcosm of the ACS itself, complete with its own curve containing a few blockbusters (who are essentially unknown outside of the division) and everyone else in the Long Tail.

Not surprisingly, the collection and distribution of chemical information reflects the Long Tail character of chemistry itself. This simple but powerful principle has rather important consequences for chemists of all stripes, be they information consumers or information producers.

image credit: silver marquis

Everything is Miscellaneous

Posted by Rich Apodaca Wed, 20 Jun 2007 08:02:00 GMT

It turns out the world was a lot messier than it seemed - and it's about time. I couldn't help but be reminded of The Long Tail as I watched this presentation by David Weinberger.

Thanks to Richard Cameron of CiteULike for the link.

Chemical Nomenclature Translation

Posted by Rich Apodaca Sun, 10 Sep 2006 15:15:00 GMT

... We report here the development of a computer program for converting chemical names into connection tables, a process we call "nomenclature translation." ... this process provides an alternate method of structure registration by allowing a new substance to be input via a structurally descriptive systematic name instead of only as a connection table taken from a structural diagram.

-G.G.V. Stouw et al. J. Chem. Doc. 1974, 14, 185-193

Systematic nomenclature is one of the oldest forms of line notation. As a result, it can be found widely in papers, patents, spreadsheets, and other documents. Any software that can convert systematic nomenclature, such as IUPAC names, into a computer-based representational system, such as a connection table, has the potential to unlock vast amounts of legacy chemical information by making it structure-searchable.
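To make the target of nomenclature translation concrete, here is a minimal sketch of a connection table in Python. It is an illustration only and assumes a deliberately simplified representation (heavy atoms as element symbols, bonds as index pairs with an order, hydrogens implicit); the class `ConnectionTable` and its methods are hypothetical and do not reflect the format used by CAS or any particular toolkit. The table for ethanol is built by hand, standing in for what a name-to-structure program would produce from the name.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Simplified connection table: atoms plus (atom, atom, order) bonds."""
    atoms: list = field(default_factory=list)
    bonds: list = field(default_factory=list)

    def add_atom(self, symbol):
        """Append an atom and return its index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, a, b, order=1):
        """Record a bond between two atom indices."""
        self.bonds.append((a, b, order))

    def neighbors(self, index):
        """Return indices of atoms bonded to the given atom."""
        result = []
        for a, b, _ in self.bonds:
            if a == index:
                result.append(b)
            elif b == index:
                result.append(a)
        return result


# Hand-built table for ethanol: C-C-O, hydrogens left implicit.
ethanol = ConnectionTable()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)

print(ethanol.atoms)
print(ethanol.neighbors(c2))
```

Once a name has been reduced to a structure like this, questions such as "which atoms are bonded to this carbon?" become simple lookups, which is what makes the underlying documents structure-searchable.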

Stouw and his group at Chemical Abstracts Service (CAS) developed the first working system for name-to-structure conversion. Their interest in an automated process stemmed from the potential to greatly accelerate the rate at which the chemical literature could be indexed. Instead of a human creating a computer representation by manually parsing a systematic name from a paper, a computer could do it error-free at a fraction of the cost. These factors are still at work today, although the pool of raw chemical information material has increased exponentially since 1974.

Nomenclature translation has been more widely investigated than the related problem of 2-D raster image interpretation, although the driving forces in both cases are the same. There are, of course, several proprietary packages for nomenclature translation. An important disadvantage of all of them is a distinct lack of customizability.

Open source nomenclature translation options have been very limited. One of the first such packages was ChemNomParse by David Robinson, Bhupinder Sandhu, and Stephen Tomkinson at the University of Manchester. ChemNomParse has since been made part of the Chemistry Development Kit (CDK). Although its capabilities are relatively limited, ChemNomParse is very useful for the design it embodies.

More recently, Peter Corbett at Cambridge has developed a package called OPSIN. Egon Willighagen wrote about integrating OPSIN into the desktop software package Bioclipse. OPSIN's source can be found in the project's SVN repository.

The most exciting potential for chemical nomenclature translation is realized when this capability is blended with other chemical informatics technologies. Future articles in this series will show how ChemNomParse and OPSIN can be used with other open source tools to create rich chemical informatics systems.
