PubChem is a Platform
Two recent J. Chem. Inf. Model. articles support the idea that PubChem is rapidly evolving into a cheminformatics platform:
Large-Scale Annotation of Small-Molecule Libraries Using Public Databases. Using PubChem and other databases, the authors categorize the level of annotation (data, metadata, and links) in free chemical databases, with PubChem as the centerpiece. The work is part of a larger effort to integrate this free resource into the workflow of the Genomics Institute of the Novartis Research Foundation (GNF).
Web Service Infrastructure for Chemoinformatics. Among other interesting initiatives, the article describes a desktop application front-end for PubChem. (As a bonus, the authors also make the case).
Platforms are essential because they focus the attention and effort of self-interested third parties around a common goal. They become so integrated into society that they eventually become invisible, and there is outrage when they stop working. Think of highways, sewers, phone lines, communications satellites, the patent system, and the Internet, among others. We don't just use these services; we build on top of them.
Chemical Abstracts Service (CAS) is an important tool for many chemists, but it is not a platform. By charging high prices for access and severely restricting how its data can be used, the ACS, which operates CAS, has effectively shut out anyone wanting to build another service on top of it. Clearly this was part of the plan. Small and large third-party developers alike are locked out, with the inevitable chilling effect on innovation.
Contrast this situation with PubChem. The public is free to download and re-use the entire database of molecules and associated data. PubChem has recently unveiled a new Web API called PUG that will make it even easier to layer on additional functionality. These kinds of capabilities create an entirely different dynamic: witness both eMolecules and ChemSpider, two services that unashamedly exploit the PubChem resource. Expect to see more of this in the months ahead.
Remember the Apple II? This product became so successful that it played a major role in undermining dozens of highly profitable, well-established businesses. Why was it so successful? One of the key reasons was its open architecture, compared with the machines that preceded it. Within a very short time, third parties had developed a large number of innovative products that exploited the underlying platform, both with and without Apple's encouragement. One of those products, VisiCalc, was so successful that at one point many buyers purchased Apple's machine for no other purpose than to run it.
Whether PubChem itself ends up becoming the standard cheminformatics platform is hard to say. Perhaps this role will be filled by a system not yet built, or by one that evolves from PubChem. Whatever the outcome, PubChem has unmasked a deep need (and opportunity) for an open cheminformatics platform. As Apple's experience demonstrates, you often get more in the end by giving something up.
Simple CAS Number Lookup with PubChem
CAS Registry Numbers simplify the thorny problem of referring to chemical substances. These short numerical sequences are arguably the most widely used form of molecular identifier, appearing on reagent bottles, in publications, in patents and patent applications, and on material safety data sheets (MSDS).
During my time as a synthetic organic chemist, I would sometimes run into the problem of finding the structure of a molecule represented by a CAS number. A common case was when an ambiguous, incomprehensible, or blurred IUPAC name was printed on a reagent bottle along with a CAS number. By looking up the CAS number, I could confirm the bottle's contents.
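Because a single misread digit on a blurred label can point to a completely different substance, it helps to know that the last digit of a CAS number is a check digit. The sketch below is a minimal Python validator, assuming the commonly published scheme: working right to left from the digit just before the check digit, multiply each digit by 1, 2, 3, and so on, sum the products, and take the result modulo 10. The function names are hypothetical, chosen for illustration.

```python
import re

# A CAS Registry Number has the form NNNNNNN-NN-N: two to seven digits,
# a hyphen, two digits, a hyphen, and a single check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit(body: str) -> int:
    """Compute the check digit from the digits that precede it.

    Working from the right, the first digit is weighted 1, the next 2,
    and so on; the check digit is the weighted sum modulo 10.
    """
    total = sum(int(d) * weight for weight, d in enumerate(reversed(body), start=1))
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    first, second, check = match.groups()
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    # Water (7732-18-5) and caffeine (58-08-2) are valid; the others are not.
    for candidate in ["7732-18-5", "58-08-2", "7732-18-6", "not-a-cas"]:
        print(candidate, is_valid_cas(candidate))
```

A passing check doesn't prove you've read the label correctly (some digit errors leave the sum unchanged modulo 10), but a failing check tells you immediately that something is wrong before you go looking up the wrong compound.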
Your first impulse when looking up a CAS number might be to fire up SciFinder. For years this was the only option. Those days are quickly starting to seem as quaint as when people actually wrote on pieces of paper and dropped them in mailboxes (dropping DVDs in a mailbox is a different matter).
A little-publicized feature of PubChem makes it an ideal way to quickly find the structure associated with a CAS Number. To use it, you need nothing more than a computer, a browser, and an internet connection.
Browse over to the PubChem welcome page. At the top you'll find a search box. Enter your CAS number and press "Go." For this example, I'm using the CAS number for 2,5-pyrazinedicarboxylic acid dihydrate:

If all goes well, you should see a results screen containing the structure of your compound and a link to its summary page:

Does this seem a little too good to be true? Try it for yourself. Pick up a copy of the Aldrich catalog, the Merck Index, or anything else that lists lots of CAS numbers. Choose several structures at random and see how PubChem performs.
There are limitations to this method. PubChem generally doesn't index large molecules such as polymers and peptides, so their CAS numbers won't return a match. Similarly, if a CAS number doesn't point to a distinct molecular entity (e.g., "mineral oil"), PubChem won't find it either. But in the vast majority of cases, these limitations hardly matter.
With the recent addition of Sigma-Aldrich as a PubChem compound supplier, it won't be long before smaller companies follow suit. What we're seeing with PubChem is a classic example of a network effect: every supplier that deposits its catalog makes PubChem more useful to chemists, which in turn makes it more attractive for the next supplier to join. The end result should come as a surprise to nobody.
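If you'd rather test a whole batch of CAS numbers than type them in one at a time, PubChem can also be queried programmatically. The sketch below assumes PubChem's PUG-REST interface, a URL-based service that PubChem added after its original PUG API, in which a CAS number is looked up as a compound name (synonym). Treat the endpoint path as an assumption and check it against PubChem's current documentation. The helper names `cid_lookup_url` and `lookup_cids` are hypothetical, and the fetch requires network access.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Base URL of PubChem's PUG-REST interface (assumed; see PubChem's docs).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(cas: str) -> str:
    """Build a PUG-REST URL that resolves a CAS number, treated as a
    compound name/synonym, to PubChem Compound IDs returned as JSON."""
    return f"{PUG_REST}/compound/name/{urllib.parse.quote(cas.strip())}/cids/JSON"


def lookup_cids(cas: str, timeout: float = 10.0) -> list:
    """Fetch the Compound IDs PubChem associates with one CAS number.

    Returns an empty list when PubChem answers with an HTTP error status,
    which is how it reports a name it cannot match. Other network failures
    (no connection, timeouts) are allowed to propagate.
    """
    try:
        with urllib.request.urlopen(cid_lookup_url(cas), timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError:
        return []
    return payload.get("IdentifierList", {}).get("CID", [])
```

For example, looping `lookup_cids` over a handful of CAS numbers copied from a catalog gives a quick tally of how many PubChem recognizes; an empty list flags a number worth checking by hand, such as a polymer or mixture.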
Update: Chempedia offers a more detailed CAS Number Lookup service.
SciLink: Science Meets Facebook
Whether you're in academics or industry, a big part of being a scientist is getting your work noticed by other scientists. In years past, scientists relied on subscription-only services such as Chemical Abstracts Service and Science Citation Index. But these services are starting to show their limitations, particularly in their ability to offer their products at an affordable price. Can you afford to trust this important part of your scientific career to companies with broken business models?
Enter SciLink, a service that could end up doing for science what Facebook has done for college campuses. Search for a scientist's name and get a (partial) list of their publications along with a list of the co-authors they've worked with. SciLink is no small database: it already contains information on 5.8 million scientists. Although you can create your own user profile after registering, you're probably already in SciLink, courtesy of its creative use of PubMed bibliographic information.
It's not clear what the future holds for SciLink. One thing is certain: free services like SciLink can be expected to proliferate over the next few years as cost squeezes and technological advances continue to take their toll on the established scientific information industry.
From Famine to Feast: A Bumper Crop of Free Chemistry Databases
"Until PubChem came on the scene, the state of chemoinformatics compared to bioinformatics was 20 years behind," says Christopher Lipinski, who formulated the eponymous rule-of-five criteria for drug bioavailability.
-Monya Baker, Nature Reviews Drug Discovery
The number of free chemistry databases on the Web just keeps growing. A recent Depth-First article discussed twelve of them. It turns out that Chembiogrid from Indiana University maintains a list of forty free chemistry databases, most of which are alive and well.
As this trend continues, the need for database standards will become painfully obvious. Not only will interoperable infrastructure and user interface standards need to be developed, but thorny intellectual property issues, including access, chain of title, and digital rights, will need to be resolved. However, the most immediate need is much more down-to-earth: getting chemists involved with the growing number of free alternatives to the chemical information monopoly they've come to rely on.
Chemical Reviews on Wikipedia

Until 1966, Chemical Abstracts Service relied exclusively on volunteers to abstract the chemical literature. At the system's peak, thousands of scientists were willing, even enthusiastic, to perform this tedious, demanding work for very little pay. The volunteer system was eventually phased out in favor of a professional abstracting staff.
What motivated these volunteer abstracters? Enlightened self-interest probably played a role. After all, preparing a set of abstracts in a field you do research in can pay off in your own increased productivity. It's also a good way to stay current with the literature, something you would do anyway. If your abstracts help your fellow scientists at the same time, so much the better. Another motivation could have been a simple desire to create order out of chaos, not unlike the many social networking activities flourishing on the internet today. Christoph Steinbeck will be giving a talk touching on this theme at the Fall 2006 ACS National Meeting, and others will likely do the same as the field gathers momentum.
While browsing Dylan Stiles' blog, I came across an entry on the aldehyde-to-alkyne homologation. In it, Stiles cited a brief but informative Wikipedia review of this reaction.
Surely this couldn't be the only volunteer-written chemistry review on Wikipedia. A quick search turned up numerous examples:
- Wittig Reaction
- Grignard Reaction
- Sharpless Dihydroxylation
- Diels-Alder Reaction
- Thermite Reaction
- Danishefsky Taxol Total Synthesis
- Olefin Metathesis
- McMurry Coupling
- Robinson Annulation
- Swern Oxidation
- Cholesterol
The proliferation of this kind of volunteer, peer-reviewed chemical documentation is similar in spirit to the system CAS used in earlier times, although the technology couldn't be more different. Of course, this approach is not without its limitations and potential pitfalls, but it is remarkably self-correcting. This emerging system offers something that CAS will never be able to provide: involvement in, and ownership of, the documentation process itself.
Unfortunately, chemical informatics technologies have not kept pace with internet technologies and the people currently using them. The reliance on raster images of 2-D structures and the lack of a reliable web-enabled chemical indexing system both loom especially large as problems to be addressed. What tools does this new kind of chemical publishing need to become more effective and efficient? How can these tools be made as invisible as possible?

