The Daily Molecule: The Wonders of Chemistry - One Molecule at a Time
Chemistry is a big field by any standard, including the proliferation of American Chemical Society (ACS) divisions. Each subdiscipline in chemistry is in turn so big that once a chemist becomes 'differentiated' it's easy to lose touch even with neighboring subdisciplines. It doesn't have to be that way. This article introduces a new service, The Daily Molecule, designed to make it just a little bit easier (and hopefully fun) to stay in the chemical loop.
What Is It?
The idea is simple: every weekday, a new molecule will be featured on The Daily Molecule with a short write-up and some leading references. Although molecules in the news will get first priority, any molecule is fair game.
The material for The Daily Molecule will be drawn from Chempedia, which in turn gets some of its content from Wikipedia. In other words, the entries on The Daily Molecule will be largely written by my fellow chemists.
The process of creating a Daily Molecule entry is not time-consuming, but much of what is being done manually now could be automated in the future. The technology platform lends itself well to many forms of chemistry-specific modification (see below).
I hesitate to use the term 'blog' to describe The Daily Molecule, but the description may be helpful to an extent.
The Daily Molecule is unlike a blog in that most content will be generated by others, selected by some criteria, reformatted for consistency, and published. In that sense, The Daily Molecule is something like a mini scientific journal, but it turns the process of acquiring content on its head.
If chemistry ever evolves beyond the current model of publication, which seems inevitable at this point, the journals of the future may resemble The Daily Molecule in one or more ways.
Technology
The software running The Daily Molecule is a modified version of SimpleLog, a Web application based on Ruby on Rails. Unlike most blogging engines, SimpleLog focuses on implementing only the most basic publication features, and doing them to perfection. If you know a little Ruby and can work with Rails, you can do a lot with SimpleLog.
One of the first items of business will be to implement reCAPTCHA support and activate comments on articles.
Some ideas for chemically-enabling The Daily Molecule include a graphical abstract sidebar and (sub)structure search. Currently, the 2D chemical structure images posted to The Daily Molecule have complete connection tables embedded as metadata, a feature with some interesting possibilities.
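The article doesn't say how the connection tables are attached to the images. One common way to carry text inside a PNG is a tEXt chunk, so the following is a minimal sketch assuming that approach, using only the Python standard library. The keyword "molfile" and all function names are hypothetical; The Daily Molecule's actual format may differ.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk after the 8-byte PNG signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        yield chunk_type, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + data + CRC


def embed_text(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of png with a tEXt chunk (keyword NUL text) inserted before IEND.

    Every chunk is rebuilt with make_chunk, so CRCs are recomputed, not verified.
    """
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    out = bytearray(PNG_SIGNATURE)
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":
            out += make_chunk(b"tEXt", payload)
        out += make_chunk(chunk_type, data)
    return bytes(out)


def read_text(png: bytes, keyword: str):
    """Return the text stored under keyword in a tEXt chunk, or None if absent."""
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            key, _, value = data.partition(b"\x00")
            if key.decode("latin-1") == keyword:
                return value.decode("latin-1")
    return None


def tiny_png() -> bytes:
    """A valid 1x1 8-bit grayscale PNG, standing in for a rendered structure image."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 + one pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


# A one-atom V2000 molfile used as example metadata.
METHANE_MOLFILE = (
    "methane\n\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
)
```

For example, `read_text(embed_text(tiny_png(), "molfile", METHANE_MOLFILE), "molfile")` returns the molfile unchanged, while image viewers simply ignore the extra chunk.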
The Molecule of the Day/Week/Month
The basic idea behind The Daily Molecule is not new. Many other services have sprung up over the last ten years that operate, at least on the surface, similarly. Some examples:
- Molecule of the Day
- ACS Molecule of the Week
- Drugs and Poisons
- Saturday Night Synthesis
- The Molecule of the Month (may be the oldest continuously-operated MOTM site in existence)
- 3dchem.com Molecule of the Month
- Protein Spotlight
- PDB Molecule of the Month
- Prous Molecule of the Month
Quite a few others don't appear on this list.
The idea that sets The Daily Molecule apart is that chemical content already exists on the Web in machine-readable format with licenses that permit its re-use; all that's needed is a way to aggregate, format, and package that information in a form suitable for once-daily scanning and cheminformatics manipulation.
Conclusions
Like no other medium, the Web blurs artificial distinctions: between work and play; between private and public; between on-topic and off-topic; between fame and obscurity; between mine and yours; between big and small; and between profit and non-profit. Chemistry may be late to the party, but it is not immune to its call.
Streamlining Cheminformatics on the Web: Let InChI Do the Heavy Lifting and Get Some REST
A recent Depth-First article discussed the advantages of minimal Web APIs in Cheminformatics. Recently, Antony Williams unveiled some simplified ChemSpider URL schemes, mainly from the perspective of enabling Google indexing. However, it's possible to take this scheme much, much further. Here I present a proposal for radically simplifying (and unifying) the development of cheminformatics Web APIs and the software that interacts with them.
The New ChemSpider URLs
ChemSpider now has several new kinds of URLs. For the purposes of this article, the most interesting of these are of the format:
These URLs may seem unremarkable, but there's much more than meets the eye. They let anonymous developers query ChemSpider about specific substances - without needing to know much at all about how ChemSpider itself works. Goodbye API. Goodbye API support. Goodbye API documentation. Goodbye angle brackets. Hello to getting stuff done. It's all very RESTful. Well, at least it could be that way with some minor modification.
Some Recommendations
ChemSpider hasn't quite reached that place where the API just disappears. The problem is that the ChemSpider URLs listed above point to query results pages, not compound summary pages. Were these URLs to redirect to a summary page, we could construct the following URLs to extract ChemSpider resources (I've replaced the '=' sign with a '/' for simplicity):
- .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ: Get all resources for the molecule identified by the given InChIKey, i.e., the compound summary page.
- .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/molfile.mol: Get the molfile for the molecule identified by the given InChIKey.
- .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/small_image.png: Get the small image for the molecule identified by the given InChIKey.
- .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/large_image.png: Get the large image for the molecule identified by the given InChIKey.
- .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/citations.xml: Get the list of citations for the molecule identified by the given InChIKey, in XML format.
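Because every resource hangs off the same InChIKey path, a client needs almost no service-specific code. Here is a minimal Python sketch of building these URLs; the host and path prefix are elided in the proposal ("..."), so `BASE_URL` below is a hypothetical placeholder, and the resource names are simply the ones listed above.

```python
from urllib.parse import quote

# Hypothetical base URL: the proposal elides the real host and prefix ("...").
BASE_URL = "https://example.org/InChIKey"

# Resource file names from the proposed scheme; None means the summary page.
RESOURCES = {
    "summary": None,
    "molfile": "molfile.mol",
    "small_image": "small_image.png",
    "large_image": "large_image.png",
    "citations": "citations.xml",
}


def resource_url(inchikey: str, resource: str = "summary", base: str = BASE_URL) -> str:
    """Build the proposed URL for one resource of the molecule with this InChIKey."""
    if resource not in RESOURCES:
        raise KeyError(f"unknown resource: {resource}")
    # safe="" makes sure nothing in the key can introduce an extra path segment.
    url = f"{base.rstrip('/')}/{quote(inchikey, safe='')}"
    filename = RESOURCES[resource]
    return url if filename is None else f"{url}/{filename}"
```

Swapping in a different `base` is all it would take to point the same code at another service following the same conventions.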
Jane, a developer building Web applications on top of this new ChemSpider API, would immediately notice that things just work. Let's say her online database stores IC50s at the dopamine D2 receptor. On the summary page for each molecule, she wants to link out to the ChemSpider compound summary page, if available. She would simply construct the InChIKey on her server, build the needed ChemSpider URL and GET it. An HTTP 404 would indicate no molecule with that Key exists on ChemSpider and so no link would be shown. An HTTP 200 would indicate ChemSpider has the molecule, and so the link would appear.
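Jane's check could look something like the following minimal Python sketch. It assumes the InChIKey has already been computed on her server (for example with the InChI software) and uses a hypothetical base URL in place of the elided prefix. The status lookup is passed in as a parameter so the 200/404 decision can be tested without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical base URL standing in for the elided ".../InChIKey" prefix.
BASE_URL = "https://example.org/InChIKey"


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET url and return the final HTTP status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError


def link_out(inchikey: str,
             fetch_status: Callable[[str], int] = http_status,
             base: str = BASE_URL) -> Optional[str]:
    """Return the summary-page URL if the service has the molecule, else None."""
    url = f"{base.rstrip('/')}/{inchikey}"
    status = fetch_status(url)
    if status == 200:
        return url   # molecule found: show the link
    if status == 404:
        return None  # no molecule with this key: show no link
    # Anything else (e.g. a 5xx outage) is surfaced rather than hidden.
    raise RuntimeError(f"unexpected HTTP status {status} for {url}")
```

Treating statuses other than 200 and 404 as errors is a choice made for this sketch; the proposal only describes those two cases. Network failures (`urllib.error.URLError`) also propagate to the caller.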
Conclusions
It would be interesting enough if ChemSpider adopted a system like that described here. But the real power of this approach would emerge if multiple Web services were to adopt it. By following a simple set of conventions, these services would enable third-party developers to elegantly mash up all manner of cheminformatics resources into applications unimaginable today.
Technically, there's nothing that prevents this system from being implemented on every free chemistry database in existence today. However, doing so would transfer a significant degree of control from service operators to third-party developers. Not all providers will be comfortable with that idea.
Cheminformatics Web service providers need to carefully consider whether they're trying to develop a platform or an integrated service. As history has shown, the strategies, and upside potential, for each approach can differ dramatically.
Free Chemistry Databases on the Web: Creating a Comprehensive Guide
One of Depth-First's more popular articles is a summary of free databases titled Thirty-Two Free Chemistry Databases. Clearly there is a need to link the producers of free chemical databases (developers) with the potential users of these services (chemists). Chemistry is slowly emerging from a decades-long period of over-reliance on a single supplier of information. As new players enter, they'll need some way to have their message heard.
The Problem
As evidence of this need, I'm getting more requests to list additional services on the Thirty-Two Databases article - or to provide an updated review of a service already there. This is wonderful!
One approach would be for me to simply research and write an updated article reviewing the new additions myself. The problem is that thirty-two is already a very large number to deal with. My guess is that there must now be well over sixty or seventy free chemistry databases. That's far too many for one person to research properly on their own.
On the other hand, the Web is all about collaboration, so why not try to use it that way?
An Idea
Here's the idea: if you run a free database or other online chemistry service and would like to promote it, post a comment to this article containing a link and brief description of what makes your service different/useful. If you've used a free chemistry database, feel free to provide your thoughts on it. If there's a free database you wish existed but doesn't yet, feel free to write about that. Unlike the other articles on this site for which comments are closed after two weeks, this article's comments will remain open indefinitely.
After some period of time, I'll use these comments to write a new article highlighting the new material.
Notice the use of the word "free". A free database can be used by any member of the general public without fees or a lengthy registration process. This includes both free speech and free beer services. There are more restrictive definitions that could be applied, but let's not worry about those just yet. Free beer is better than no beer at all.
Links can either be in HTML or Markdown. Here's one example of each:
<a href="http://megamolecules.com">MegaMolecules</a> (HTML)
[MegaMolecules](http://megamolecules.com) (Markdown)
The Outcome
I have no idea what kind of response this experiment will generate. But if past experience is any guide, large numbers of chemists are keenly interested in free chemistry databases. All they need is a link.
Image Credit: Kate and Dave Hugh
Update: Four Free 2-D Structure Editors for Web Applications
A previous article discussing the deployment of four free 2D structure editors has been fixed. The sample pages demonstrating how to obtain a molfile from each have also been restored.
A 2D Chemical Structure Editor for the Web: Embracing Constraints in Firefly
A previous article outlined the major concepts behind Firefly, a new 2D structure editor for the Web. Structure editors are unique in that they require solutions to problems in so many different areas: graphical user interface design; chemical structure aesthetics; 2D graphics and geometry; molecular representation; and graph theory, to name a few.
These are important considerations, but they pale in comparison to the biggest design challenge: Firefly must be deployed and run in a heterogeneous network environment. This one constraint frames all of the basic design questions involving platform, programming language, footprint, and mode of deployment. And this is a Good Thing.
Platform, Programming Language, and Deployment
Considering the diverse array of hardware and operating systems on the Internet, it's amazing that anything works at all. Although standards save the day in most cases, there is no all-purpose solution when it comes to interactive Web technologies. Today, there are essentially three options for Firefly:
- Flash: To my knowledge, no 2D structure editor has ever been implemented in Flash. (This alone sounds like a good reason to try it sometime.) As a platform, Flash has a lot going for it, including a large installed base and the OpenLaszlo abstraction layer. On the downside, Flash support on platforms such as Linux has not been what it could be. As Flash continues to mature, this problem may become less important.
- Ajax: Essentially nothing more than HTML and JavaScript, with no plug-ins of any kind. At least two editors have been written in Ajax: WebME and PubChem's editor. Ajax works in a lot of situations, but in my opinion it still isn't up to the job of creating a responsive, interactive structure editor.
- Java Applet: A massive installed base, support for just about every kind of hardware in use, a robust object-oriented language with powerful 2D graphics and GUI libraries, and it's even Open Source. Although Flash and Ajax have their advantages, Java is the best option for the foreseeable future. There are several Java structure editors to choose from, and for good reason. (Future articles will discuss what makes Firefly different.)
What about a Java application? Unfortunately, not even a user's ability to install software on their own machine can be taken for granted in the context in which Firefly will be running. For example, many companies prohibit employees from installing non-approved software. Machines found in libraries or other public places may face similar limitations. Given these real possibilities, an applet is the only option that makes sense.
As to the version of Java that Firefly is built on, JDK 1.4.2 seems to have become the unwritten standard. It's mature enough to have most of what Java has today, but old enough to be on most machines. Even with this compromise, Firefly will have an impressive array of functionality to work with: Java2D, Swing, and many performance optimizations relative to earlier Java versions.
Footprint
High-speed Internet access is ubiquitous. Bandwidth is now a commodity. Memory is practically free and processors - don't get me started on processors these days. But developers who take massive bandwidth and staggering processor power for granted are setting themselves and their users up for many unpleasant interactions. Firefly must be lightweight, responsive, and fast to deploy under any network load and on 80% of the hardware on the Web. Limiting its footprint to 150K will increase the chances of success.
With such a small footprint, none of the currently-available open source Java cheminformatics libraries (e.g. CDK, JOELib, Octet) will be of much use. They are simply too big. Even the smallest of them (Octet) weighs in at about 300K. Of course, it's possible to strip out unnecessary features from these libraries. But even then, a great deal of unused functionality would remain. This is the unavoidable downside of general-purpose libraries. By developing a small cheminformatics library specifically designed to power a 2D structure editor, Firefly should be able to meet the 150K target.
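To give a sense of how little an editor-specific core might need, here is a language-neutral sketch written in Python (Firefly itself is Java, and its actual data model isn't described here). It holds atoms with 2D drawing coordinates, bonds with an order, an adjacency list for graph queries, and a nearest-atom lookup of the kind mouse hit-testing needs. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Atom:
    symbol: str
    x: float  # 2D drawing coordinates
    y: float


@dataclass
class Molecule:
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)  # (atom a, atom b, order)
    adjacency: Dict[int, List[int]] = field(default_factory=dict, init=False, repr=False)

    def add_atom(self, symbol: str, x: float, y: float) -> int:
        """Append an atom and return its index."""
        self.atoms.append(Atom(symbol, x, y))
        index = len(self.atoms) - 1
        self.adjacency[index] = []
        return index

    def add_bond(self, a: int, b: int, order: int = 1) -> None:
        """Connect two existing atoms, rejecting self-bonds and duplicates."""
        if a == b or b in self.adjacency[a]:
            raise ValueError("self-bond or duplicate bond")
        self.bonds.append((a, b, order))
        self.adjacency[a].append(b)
        self.adjacency[b].append(a)

    def neighbors(self, index: int) -> List[int]:
        """Indices of atoms bonded to the given atom, in bonding order."""
        return list(self.adjacency[index])

    def nearest_atom(self, x: float, y: float) -> int:
        """Index of the atom closest to (x, y), e.g. under the mouse pointer."""
        return min(range(len(self.atoms)),
                   key=lambda i: (self.atoms[i].x - x) ** 2 + (self.atoms[i].y - y) ** 2)
```

Everything beyond this (layout, clean-up, canonicalization) is a candidate either for careful, editor-specific additions or for offloading to a server.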
Is 150K a realistic goal? My previous experience in developing Structure and Structure-CDK taught me that with Java2D, very lightweight structure-rendering code is well within reach. For example, the Structure-CDK jarfile is an unoptimized 39K. (But even Structure-CDK faces the same problem as other general-purpose libraries - it's much bigger than it needs to be. In other words, some Structure-CDK concepts may be applicable to Firefly, but the code itself must be completely re-written.)
When combined with the fact that a lot of cheminformatics functionality, such as structure clean-up, can be offloaded to a server, 150K begins to look like a very reasonable target.
As a final encouragement, consider that everything Java Molecular Editor does, including SMILES canonicalization, is contained in a jarfile no larger than 40K.
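A footprint target is easiest to hold if it is checked on every build. Since a jarfile is a ZIP archive, Python's `zipfile` module can inspect it. This minimal sketch assumes "150K" means 150 × 1024 bytes of jarfile on disk, which is what users actually download; the function names are hypothetical.

```python
import os
import zipfile

# Assumption: "150K" is read as 150 * 1024 bytes of jarfile on disk.
BUDGET_BYTES = 150 * 1024


def jar_footprint(path: str) -> dict:
    """Report download size (file on disk), uncompressed size, and entry count."""
    with zipfile.ZipFile(path) as jar:
        infos = jar.infolist()
        uncompressed = sum(info.file_size for info in infos)
    return {
        "download_bytes": os.path.getsize(path),
        "uncompressed_bytes": uncompressed,
        "entries": len(infos),
    }


def within_budget(path: str, budget: int = BUDGET_BYTES) -> bool:
    """True if the jarfile users must download fits within the budget."""
    return jar_footprint(path)["download_bytes"] <= budget
```

Measuring the compressed jar reflects network cost; the uncompressed total is reported alongside it as a rough indication of how much class data the applet loads.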
Wrap-Up
As you can see, even before a single line of code was written, Firefly's design was constrained in some very important ways. Identifying and embracing these constraints doesn't ensure success, but it greatly increases its chances. In articles to follow, I'll show how Firefly was designed to thrive within these constraints.
Image Credit: frogmuseum2

