Cheminformatics in the Popular Press: The Long Tail of Structural Scaffolds
A recent issue of Wired is running a story about a Chemical Abstracts Service (CAS) study on the distribution of scaffold frequencies in the CAS Registry database.
Cheminformatics doesn't often make it into the popular press (or any other kind of press for that matter), so the Wired article is remarkable for that reason alone.
From the original work (free PDF here):
It seems plausible to expect that the more often a framework has been used as the basis for a compound, the more likely it is to be used in another compound. If many compounds derived from a framework have already been synthesized, these derivatives can serve as a pool of potential starting materials for further syntheses. The availability of published schemes for making these derivatives, or the existence of these derivatives as commercial chemicals, would then facilitate the construction of more compounds based on the same framework. Of course, not all frameworks are equally likely to become the focus of a high degree of synthetic activity. Some frameworks are intrinsically more interesting than others due to their functional importance (e.g., as building blocks in drug design), and this interest will stimulate the synthesis of derivatives. Once this synthetic activity is initiated, it may be amplified over time by a rich-get-richer process.
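The rich-get-richer process described in this passage can be illustrated with a toy simulation in the spirit of Herbert Simon's preferential-growth model. This is a sketch under stated assumptions, not the CAS authors' method: each new compound either introduces a brand-new framework with a fixed probability `p_new` (a hypothetical parameter), or reuses an existing framework chosen in proportion to how many compounds already use it. The function name `simulate_scaffold_growth` is likewise hypothetical.

```python
import random
from collections import Counter


def simulate_scaffold_growth(n_compounds: int, p_new: float, seed: int = 0) -> list[int]:
    """Toy rich-get-richer model of scaffold usage (illustrative assumption only).

    Returns the number of compounds per scaffold, largest first.
    """
    rng = random.Random(seed)
    assignments: list[int] = []  # scaffold id of each compound, in order of "synthesis"
    n_scaffolds = 0
    for _ in range(n_compounds):
        if not assignments or rng.random() < p_new:
            # The compound is built on a framework never used before.
            assignments.append(n_scaffolds)
            n_scaffolds += 1
        else:
            # Picking a random earlier compound and reusing its scaffold selects
            # each scaffold with probability proportional to its current usage.
            assignments.append(rng.choice(assignments))
    return sorted(Counter(assignments).values(), reverse=True)


sizes = simulate_scaffold_growth(100_000, 0.1)
print("most-used scaffolds:", sizes[:5])
print("scaffolds used only once:", sum(1 for s in sizes if s == 1), "of", len(sizes))
```

A run like this typically shows a handful of heavily used frameworks at the head and a large number of frameworks used only once or twice in the tail; the exact numbers depend on `p_new` and the random seed.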
With the appearance of dozens of chemical databases and services on the Web in the last couple of years, the opportunities for analyses like this (and many others) can only increase. Who knows what we'll find.
Credit: thanks to Steve W for the Wired reference.
1908 and All That: The Long Tail and Chemistry
Quite a few American Chemical Society (ACS) divisions are celebrating their 100th anniversaries this year. While this fact may at first glance seem like just a piece of nerdy trivia, Rudy Baum, editor-in-chief of C&E News, decided to dig deeper. And what he found was the Long Tail of chemistry, alive and well - in 1908.
In his editorial, Baum describes how he looked for the causes of the sudden appearance of so many ACS divisions in 1908. At the root, he found a growing realization among influential chemists of the time that the ACS membership was becoming too diverse in its interests and areas of specialization:
Specialization in subdisciplines of chemistry was also much on ACS members' minds in these years. Some members felt strongly that subdivisions of some sort should be created in the society to provide a venue for chemists from these areas to meet separate from the society as a whole. It was noted that chemists were going off and forming their own specialized organizations in areas like electrochemistry, biological chemistry, and agricultural chemistry.
As early as 1903, ACS established a committee of five distinguished members to look into this issue, with Massachusetts Institute of Technology's Arthur A. Noyes as the chairman. (Throughout its history, ACS has responded to challenges by creating committees!) The committee reported to the ACS Council at its June 1, 1903, meeting, and strongly recommended that "Divisions of the Society be established representing different important branches of chemistry."
For those familiar with the work of Chris Anderson, what's being described is nothing other than the Long Tail:
The theory of the Long Tail is that our culture and economy is increasingly shifting away from a focus on a relatively small number of "hits" (mainstream products and markets) at the head of the demand curve and toward a huge number of niches in the tail. As the costs of production and distribution fall, especially online, there is now less need to lump products and consumers into one-size-fits-all containers. In an era without the constraints of physical shelf space and other bottlenecks of distribution, narrowly-targeted goods and services can be as economically attractive as mainstream fare.
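To put a number on the head-versus-tail idea, the sketch below assumes a Zipf-like demand curve, in which the item of rank r attracts demand proportional to 1/r^s. Both the curve and the helper name `zipf_shares` are illustrative assumptions rather than Anderson's own model. The function computes what share of total demand the top `head_size` items (the "hits") capture, and what share is left for the tail.

```python
def zipf_shares(n_items: int, head_size: int, s: float = 1.0) -> tuple[float, float]:
    """Return (head share, tail share) of total demand under a Zipf-like curve.

    Assumes demand for the item of rank r is proportional to 1 / r**s.
    """
    weights = [1.0 / r**s for r in range(1, n_items + 1)]
    total = sum(weights)
    head = sum(weights[:head_size])
    return head / total, (total - head) / total


# Same 100 "hits", but a catalog that is no longer limited by shelf space.
for catalog_size in (1_000, 100_000):
    head, tail = zipf_shares(catalog_size, head_size=100)
    print(f"{catalog_size:>7} items: head {head:.0%}, tail {tail:.0%}")
```

With `s = 1` and a head of 100 items, the tail holds about 31% of total demand in a 1,000-item catalog but about 57% in a 100,000-item catalog: once the shelf-space constraint disappears, the niches together can outweigh the hits.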
How much money does it cost to set up a new ACS division? Probably not that much. How big is the field of chemistry? Vast. Put the two together, and you have a recipe for today's ACS. A recent Depth-First article described this phenomenon. And C&E News itself maintains a (static?) blog on the Long Tail as it applies to chemical employment.
What does any of this have to do with chemical informatics? Although it may be tempting to think of chemists as a homogeneous group sharing a great deal of experience and knowledge, the proliferation of ACS divisions suggests otherwise. It seems reasonable to think that successful chemical information systems would do well to take this into account in their design and implementation.
The Quiet Revolution in Scientific Peer-Review: An Introduction to Research Blogging
A quiet revolution is taking place in the way the primary research literature gets reviewed. Like all revolutions in their infancy, this one looks hungry, raggedy and generally not respectable. But that could change rather quickly given the right technology.
Research Blogging is a brand-new service that aggregates blog commentary about the peer-reviewed literature.
Let's say Mary the Chemist finds a procedure in a paper on reductive amination that solves a problem she's been having in isolating her products. After using the procedure for a while, she notices that one class of substrate not described in the original paper gives much lower yields than those reported. Not having the resources to build a complete paper around her observation, she decides to write about what she found and post it to her blog.
If that were the end of the story, it's very unlikely Mary's posting would be of much use. Although Mary's blog is read by a couple of hundred people daily, few of the readers on the day her posting appeared had an interest in reductive amination or the paper she discussed. And none of her readers on that day were able to follow up on her observation.
Mary continues to post to her blog and eventually her observation, of potentially great value to the right chemist, gets buried in the archives (and on page 3 or 4 of most Google searches).
Enter Research Blogging, a Web-based database associating blog entries with references to the peer-reviewed scientific literature. Some time before writing about her observations, Mary signed up for a Research Blogging account and registered her blog with the service. When she wrote up her observations on the reductive amination reaction, Mary applied special markup to the posting so that Research Blogging's automated system could read it.
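One widely used convention for machine-readable citations in HTML is COinS (ContextObjects in Spans), which embeds an OpenURL description of a reference in the `title` attribute of an empty `span` with class `Z3988`. The sketch below assumes citation markup of that general kind; the exact markup Research Blogging expects may differ. The helper name `coins_span` and all bibliographic values in the example (journal, title, year, DOI) are hypothetical placeholders.

```python
from html import escape
from urllib.parse import quote, urlencode


def coins_span(journal: str, article_title: str, year: str, doi: str) -> str:
    """Build a COinS element describing one journal article (hypothetical helper)."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.jtitle", journal),
        ("rft.atitle", article_title),
        ("rft.date", year),
        ("rft_id", f"info:doi/{doi}"),
    ]
    # Percent-encode each key and value (spaces become %20, ':' and '/' are escaped),
    # then HTML-escape the result so each '&' is written as '&amp;' inside the attribute.
    kev = urlencode(fields, quote_via=quote)
    return f'<span class="Z3988" title="{escape(kev, quote=True)}"></span>'


print(coins_span("Example Journal", "A Hypothetical Paper", "2008", "10.0000/example"))
```

An aggregator that recognizes this kind of markup can pull out the DOI and other fields from the posting and link the blog entry to the paper it discusses, which is what lets Mary's observation stay attached to the original work.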
Instead of disappearing into the digital abyss, Mary's observation becomes permanently associated with the original paper.
Although Research Blogging's user interface is currently primitive, it's unlikely to remain so for long. The founders of the service appear both motivated and committed, recently forming a non-profit corporation to support their work.
In the future it's not inconceivable that Barry the Chemist, after finishing his CAS search on reductive amination methods, would next turn to Research Blogging to make sure he really knows everything written about the three most promising peer-reviewed papers he's considering using.
Research Blogging is a wonderful idea with great potential to fill a significant need. Like any new technology, though, there are some issues to work out. The next article in this series will offer some ideas.
The Long Tail and Chemistry: Why So Many ACS Meeting Talks are "Uninteresting"
The Boston ACS meeting provided yet another opportunity to look at chemistry as a social networking phenomenon. Having attended several talks within my areas of expertise (organic chemistry, medicinal chemistry, and chemical informatics), I was struck by two things:
Most talks were laser-focused on one tiny aspect of chemistry that is of little interest to the average chemist but of great interest to a few chemists.
Talks that were less focused on details drew the biggest crowds.
These observations have nothing to do with the quality of the presentations. In fact, one of the best talks focused on the clinical trial data for a single molecule, the Type II diabetes treatment dapagliflozin (below). Although the audience for this talk seemed engaged, it represented only a tiny fraction of ACS attendees.

Roald Hoffmann's talk (and the symposium of which it was a part) drew a larger audience. Having won a Nobel Prize surely can't hurt. An association with recent controversy is also a plus. Of course, being a good storyteller and genuinely likable also helps. On the other hand, I wonder what the turnout would have been if, instead of telling his scientific life story, Hoffmann had presented the details of a recent theoretical study.
The Boston ACS meeting, like just about any analysis of the printed chemical research, reveals the Long Tail at every turn. Although usually applied to mass markets such as DVD rentals through Netflix, the Long Tail also provides valuable insights into scientific fields such as chemistry.
To use Long Tail terminology, Roald Hoffmann and E.J. Corey are at the head of the curve: the blockbusters. They and their work are widely recognized and discussed. Almost everyone else's work, regardless of how groundbreaking or clever, lies in the long tail of relative obscurity. It is of great interest to a handful of people but essentially invisible to most chemists.
In a few ACS sessions, I counted as few as four or five audience members. The large number of ACS divisions and the astonishingly small audiences at some of their presentations are nothing more than a concrete demonstration of the Long Tail at work.
Each ACS division is a microcosm of the ACS itself, complete with its own curve containing a few blockbusters (who are essentially unknown outside of the division) and everyone else in the Long Tail.
Not surprisingly, the collection and distribution of chemical information reflects the Long Tail character of chemistry itself. This simple but powerful principle has rather important consequences for chemists of all stripes, be they information consumers or information producers.
image credit: silver marquis
Everything is Miscellaneous
It turns out the world was a lot messier than it seemed - and it's about time. I couldn't help but be reminded of The Long Tail as I watched this presentation by David Weinberger.
Thanks to Richard Cameron of CiteULike for the link.

