BioRails
BioRails is an Open Source Biological Information Management System (BIMS) based on Ruby on Rails. The BioRails team expects the first release to be available sometime in December 2006. From the BioRails FAQ:
A BIMS is a Biological Information Management System. BIMS are used to support the process of discovery research. The objective of a BIMS is to persist the biological results in a searchable database that will support both quality control of the data being generated and, when integrated with chemistry systems, support the process of compound progression decision-making.
BioRails has a great deal of potential, and it will be interesting to see the system up close when it is released. Many areas of science, particularly chemistry, have been slow to embrace the Web as the powerful application development platform that it is. BioRails and projects like it have the potential to change not only the way that scientific software is deployed, but the nature of scientific collaboration itself.
Mashups for Fun and Profit
ProgrammableWeb offers one-stop shopping for all things mashup-related. If you've ever wanted to try your hand at Web programming, this site makes an excellent first stop. Be sure to check out the listing of over 1,000 mashup sites indexed by category and API.
The move toward open, Web-based chemical information resources is fully underway. The genie has been let out of the bottle, and there's no putting him back. This is bad news for large, established chemical information players. Their business models based on restricting information flow will be irreversibly disrupted. It's good news for tens of thousands of researchers who will be able to exploit chemical information in ways unimaginable today. Leading the way will be mashups that creatively tie diverse Web resources together, and dynamic programming languages like Ruby that make doing so easy.
Are you ready for the future?
Hacking PubChem: Entrez Programming Utilities
A recent article poses the question of how to balance the rights of owners of open chemical information resources against those of their users, while promoting an innovative environment for third-party developers. Although PubChem was the focus, the discussion could apply to any other chemical information resource. A reasonable approach would be to provide two separate entry points: one for Web browsers and another for various types of semi-autonomous software used in hacking and mashups.
Peter Corbett writes to point out that the Entrez Programming Utilities can be used to query PubChem and other databases under the NIH/NCI/NCBI umbrella. A separate developer server processes requests, and the terms of its use are fairly well stated. Future articles will explore the possibility of building some simple Ruby APIs for this developer PubChem entry point.
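As a rough idea of what such an entry point looks like from code, here is a minimal sketch in Python. It assumes the documented E-utilities ESearch endpoint and the `pccompound` database name for PubChem Compound; the function names, the `tool` and `email` values, and the sample response are all illustrative assumptions rather than anything supplied by NCBI. Nothing is sent over the network: the code only builds the request URL and parses a hand-written response of the expected shape.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: the E-utilities base URL as currently documented; it may change.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="pccompound", retmax=20, tool=None, email=None):
    """Build an ESearch URL. No request is made here."""
    params = {"db": db, "term": term, "retmax": retmax}
    # NCBI asks automated clients to identify themselves with tool and email.
    if tool:
        params["tool"] = tool
    if email:
        params["email"] = email
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_id_list(esearch_xml):
    """Return the record identifiers found in an ESearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [id_node.text for id_node in root.findall("./IdList/Id")]


# Hand-written sample in the shape of an ESearch response; the IDs are made up.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""

# Hypothetical tool name and email address, for illustration only.
url = build_esearch_url("alprazolam", tool="depth_first_demo", email="you@example.org")
ids = parse_id_list(SAMPLE_RESPONSE)
```

Keeping URL construction separate from response parsing makes it easy to honor the server's terms of use in one place, for example by adding rate limiting around whatever code actually fetches `url`.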
The Chemically-Aware Web: Are We There Yet?
Recently, I wrote a tutorial on embedding 2-D molecular renderings into webpages as Scalable Vector Graphics (SVG). This tutorial also contained a small experiment on the current chemical informatics capabilities of the Web.
Here is a scenario from the near future: Joe is writing a review on Cephalosporin C that he wants to publish the modern way - directly to the Web. An entirely new concept in scientific publishing has started to take hold. Rather than submitting scientific articles to publishers, who then make hamburger out of them and strip authors of their rights to reproduce their own work, a new system in which journals simply aggregate content already on the Web is gaining momentum. Some journals specialize in only including the very best scientific Web content available, and so enjoy a prestige factor. It's still a peer review system, but with inversion of control. The trick for scientists is getting their work indexed, and so noticed, in the first place.
Joe just downloaded a new 2-D structure editor, FooChemPaint, that he heard can make the structure drawings in his review structure-searchable. Every chemist he knows is talking about a new free search engine called Haystac (Haystac Ain't Chmoogle) that lets them substructure-search the web. For some reason, you need to create your structures using FooChemPaint if you want your own documents to be included in the search results.
After Joe finishes drawing Cephalosporin C with FooChemPaint, he chooses the File->Save As... menu item. Instead of saving as a JPG or PNG like he's done with other software, he saves the image as SVG. He then embeds the SVG into his review using a procedure similar to the one I outlined previously.
From Joe's perspective, he hasn't done anything very new. But unknown to Joe, FooChemPaint has automatically inserted the InChI identifier of Cephalosporin C as metadata into his SVG document. This enables ordinary search engines such as Google to associate the InChI with his SVG. The best part is that the entire process is essentially invisible to Joe.
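What FooChemPaint does behind the scenes could be quite small. The following is a minimal sketch in Python, assuming the InChI is stored as a Dublin Core `dc:title` attribute on an `rdf:Description` element, which mirrors the RDF fragment shown later in this article. Wrapping the RDF in an SVG `metadata` element and the name `embed_inchi` are my own assumptions, and because FooChemPaint is fictional this is only one way it might work. The sample uses the Alprazolam InChI from the experiment below.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Keep readable prefixes when the document is serialized again.
ET.register_namespace("", SVG_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def embed_inchi(svg_text, inchi):
    """Return svg_text with an RDF metadata block carrying the InChI."""
    root = ET.fromstring(svg_text)
    metadata = ET.Element(f"{{{SVG_NS}}}metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(rdf, f"{{{RDF_NS}}}Description")
    # Assumption: the InChI goes into dc:title, as in the hand-made fragment.
    description.set(f"{{{DC_NS}}}title", inchi)
    description.set(f"{{{DC_NS}}}format", "image/svg+xml")
    # Put the metadata first so it precedes the drawing elements.
    root.insert(0, metadata)
    return ET.tostring(root, encoding="unicode")


# A trivial stand-in for a real structure drawing.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<rect width="10" height="10"/></svg>'
)
ALPRAZOLAM_INCHI = (
    "InChI=1/C17H13ClN4/c1-11-20-21-16-10-19-17(12-5-3-2-4-6-12)"
    "14-9-13(18)7-8-15(14)22(11)16/h2-9H,10H2,1H3"
)

annotated = embed_inchi(SAMPLE_SVG, ALPRAZOLAM_INCHI)
```

The drawing itself is untouched; only a metadata element is added, which is why the step can stay invisible to the author.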
Haystac is a web application that presents users with an online structure editor for preparing molecular queries. When a structure query is submitted, Haystac searches its molecular database for matches. This database, in turn, was built by a web spider specifically designed to look for InChI identifiers, maybe with the help of Google's Web API. One of Haystac's records for the structure of Cephalosporin C points to Joe's review article.
Science fiction? Maybe. This is where the experiment comes in. Before I submitted the article on SVG, I manually annotated the SVG of Alprazolam with the corresponding InChI. The XML source can be viewed in Firefox by right-clicking on the SVG image and choosing This Frame->View Frame Source, or alternatively here. Below is a fragment of the XML:
<svg ...>
<rdf:RDF
xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
xmlns:dc = "http://purl.org/dc/elements/1.1/" >
<rdf:Description about="http://depth-first.com"
dc:title="InChI=1/C17H13ClN4/c1-11-20-21-16-10-19-17(12-5-3-2-4-6-12)14-9-13(18)7-8-15(14)22(11)16/h2-9H,10H2,1H3"
dc:format="image/svg+xml"
dc:language="en" >
<dc:creator>
<rdf:Bag>
<rdf:li>Richard L. Apodaca</rdf:li>
</rdf:Bag>
</dc:creator>
</rdf:Description>
</rdf:RDF>
<!-- etc. -->
</svg>
Today I searched for the title of my article in Google and found it. I then searched for the InChI in the SVG metadata and did not find it. Currently, a search of this InChI shows only one hit from the DrugBank database.
The experiment failed in its stated goal of getting the InChI of Alprazolam indexed by Google via the metadata in its SVG rendering. Was it the formatting of my RDF tags? Is metadata just indexed more slowly than other content? Does Google just ignore metadata to avoid keyword stuffing by Search Engine Optimization tricksters? Are embedded SVG documents ignored by Google altogether? Whatever the reason, the technical barriers to a system like this working today are very low and dropping rapidly.
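To give a sense of how low those barriers are, here is a minimal sketch in Python of the indexing half of a Haystac-like spider. It assumes the InChI sits in a `dc:title` attribute on `rdf:Description`, as in the fragment above. The function names and the page URL are hypothetical, and the sample SVG is a trimmed copy of that fragment with the namespace declarations filled in. Fetching pages and substructure matching are deliberately left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_DESCRIPTION = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description"
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def extract_inchis(svg_text):
    """Return every InChI stored as a dc:title attribute on rdf:Description."""
    root = ET.fromstring(svg_text)
    found = []
    for description in root.iter(RDF_DESCRIPTION):
        title = description.get(DC_TITLE, "")
        if title.startswith("InChI="):
            found.append(title)
    return found


def build_index(pages):
    """Map each InChI to the URLs of the documents that carry it."""
    index = defaultdict(set)
    for url, svg_text in pages.items():
        for inchi in extract_inchis(svg_text):
            index[inchi].add(url)
    return index


# Trimmed copy of the annotated Alprazolam SVG metadata.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description about="http://depth-first.com"
        dc:title="InChI=1/C17H13ClN4/c1-11-20-21-16-10-19-17(12-5-3-2-4-6-12)14-9-13(18)7-8-15(14)22(11)16/h2-9H,10H2,1H3"
        dc:format="image/svg+xml" />
  </rdf:RDF>
</svg>"""

# Hypothetical URL standing in for wherever the spider found the SVG.
PAGE_URL = "http://example.org/alprazolam.svg"
index = build_index({PAGE_URL: SAMPLE_SVG})
```

An exact-match lookup is then a dictionary access on the InChI string. Real substructure search would need a cheminformatics toolkit on top of an index like this.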
Chemical Reviews on Wikipedia

Until 1966, Chemical Abstracts Service used volunteers exclusively to abstract the chemical literature. At the system's peak, thousands of scientists were willing and even enthusiastic to perform this tedious, demanding work for very little pay. The system was eventually phased out in favor of a professional abstracting service.
What motivated these volunteer abstracters? Enlightened self-interest probably played a role. After all, preparing a set of abstracts in a field you do research in can pay off in your own increased productivity. It's also a good way to stay current with the literature, something you would do anyway. If your abstracts help your fellow scientists at the same time, so much the better. Another motivation could have been a simple desire to create order out of chaos, not unlike the many social networking activities flourishing on the internet today. Christoph Steinbeck will be giving a talk at the Fall 2006 ACS touching on this theme, and it's likely others will too as the field gathers momentum.
While browsing Dylan Stiles' blog, I came across an entry on the aldehyde->alkyne homologation. In it, Stiles cited a brief but informative Wikipedia review of this reaction.
Surely this couldn't be the only example of online volunteer-created reviews in chemistry on Wikipedia. A quick search resulted in numerous examples:
- Wittig Reaction
- Grignard Reaction
- Sharpless Dihydroxylation
- Diels-Alder Reaction
- Thermite Reaction
- Danishefsky Taxol Total Synthesis
- Olefin Metathesis
- McMurry Coupling
- Robinson Annulation
- Swern Oxidation
- Cholesterol
The proliferation of this kind of volunteer, peer-reviewed chemical documentation is similar in spirit to the volunteer abstracting system CAS used in earlier times, although the technology couldn't be more different. Of course, this approach is not without its limitations and potential pitfalls, but it is remarkably self-correcting. This emerging system offers something that CAS will never be able to provide - involvement in, and ownership of, the documentation process itself.
Unfortunately, chemical informatics technologies have not kept up with internet technologies and the people currently using them. The reliance on raster images of 2-D structures and the lack of a reliable web-enabled chemical indexing system both loom especially large as problems to be addressed. What tools does this new kind of chemical publishing need to become more effective and efficient? How can these tools be made as invisible as possible?

