Streamlining Cheminformatics on the Web: Let InChI Do the Heavy Lifting and Get Some REST

Posted by Rich Apodaca Mon, 01 Oct 2007 14:53:00 GMT

A recent Depth-First article discussed the advantages of minimal Web APIs in cheminformatics. Since then, Antony Williams has unveiled some simplified ChemSpider URL schemes, mainly to enable Google indexing. However, it's possible to take this scheme much, much further. Here I present a proposal for radically simplifying (and unifying) the development of cheminformatics Web APIs and the software that interacts with them.

The New ChemSpider URLs

ChemSpider now has several new kinds of URLs. For the purposes of this article, the most interesting of these are of the format:

These URLs may seem unremarkable, but there's much more than meets the eye. They let anonymous developers query ChemSpider about specific substances - without needing to know much at all about how ChemSpider itself works. Goodbye API. Goodbye API support. Goodbye API documentation. Goodbye angle brackets. Hello to getting stuff done. It's all very RESTful. Well, at least it could be that way with some minor modification.

Some Recommendations

ChemSpider hasn't quite reached that place where the API just disappears. The problem is that the ChemSpider URLs listed above point to query results pages, not compound summary pages. Were these URLs to redirect to a summary page, we could construct the following URLs to extract ChemSpider resources (I've replaced the '=' sign with a '/' for simplicity):

  • .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ Get all resources for the molecule identified by the given InChIKey - i.e., "Compound summary page"

  • .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/molfile.mol Get the molfile for the molecule identified by the given InChIKey

  • .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/small_image.png Get the small image for the molecule identified by the given InChIKey.

  • .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/large_image.png Get the large image for the molecule identified by the given InChIKey.

  • .../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/citations.xml Get the list of citations for the molecule identified by the given InChIKey, in XML format.
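
The URL scheme in the list above can be sketched as a small helper. This is only an illustration of the proposal: the base URL is a placeholder passed in by the caller (the real prefix is left elided here, as in the list), and the function name `resource_url` and the `RESOURCES` tuple are hypothetical names, not part of any ChemSpider API.

```python
from urllib.parse import quote

# Resource names copied from the proposed URL list; a real service
# could choose different names (this is an assumption, not an API).
RESOURCES = ("molfile.mol", "small_image.png", "large_image.png", "citations.xml")


def resource_url(base, inchikey, resource=None):
    """Build <base>/InChIKey/<key>[/<resource>] for a caller-supplied base URL."""
    url = f"{base.rstrip('/')}/InChIKey/{quote(inchikey, safe='')}"
    if resource is not None:
        if resource not in RESOURCES:
            raise ValueError(f"unknown resource: {resource}")
        url += "/" + resource
    return url
```

Calling `resource_url(base, key)` with no resource gives the compound summary URL, while `resource_url(base, key, "molfile.mol")` gives the molfile URL, mirroring the first two bullets.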

Jane, a developer building Web applications on top of this new ChemSpider API, would immediately notice that things just work. Let's say her online database stores IC50s at the dopamine D2 receptor. On the summary page for each molecule, she wants to link out to the ChemSpider compound summary page, if available. She would simply construct the InChIKey on her server, build the needed ChemSpider URL and GET it. An HTTP 404 would indicate no molecule with that Key exists on ChemSpider and so no link would be shown. An HTTP 200 would indicate ChemSpider has the molecule, and so the link would appear.
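
Jane's check might look roughly like the following sketch. It assumes the service behaves as proposed here (200 for a known InChIKey, 404 for an unknown one), which ChemSpider does not necessarily do. The function names and the `base` parameter are hypothetical, and computing the InChIKey itself is left to whatever toolkit Jane already uses.

```python
import urllib.error
import urllib.request


def http_status(url):
    """GET the URL and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib reports 4xx/5xx responses as exceptions; the code is on the error.
        return err.code


def summary_link(inchikey, base, get_status=http_status):
    """Return the summary-page URL if the service knows the key, else None."""
    url = f"{base.rstrip('/')}/InChIKey/{inchikey}"
    status = get_status(url)
    if status == 200:
        return url   # molecule exists: show the link
    if status == 404:
        return None  # no molecule with that key: show nothing
    raise RuntimeError(f"unexpected HTTP status {status} for {url}")
```

The `get_status` parameter exists only so the lookup can be swapped out in tests or replaced with a cached client; the status code alone tells Jane whether to show the link, with no query step and no response parsing.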

Conclusions

It would be interesting enough if ChemSpider adopted a system like that described here. But the real power of this approach would emerge if multiple Web services were to adopt it. By following a simple set of conventions, these services would enable third-party developers to elegantly mash up all manner of cheminformatics resources into applications unimaginable today.

Technically, there's nothing that prevents this system from being implemented on every free chemistry database in existence today. However, doing so would transfer a significant degree of control from service operators to third-party developers. Not all providers will be comfortable with that idea.

Cheminformatics Web service providers need to carefully consider whether they're trying to develop a platform or an integrated service. As history has shown, the strategies, and upside potential, for each approach can differ dramatically.

Comments

  1. JamesM Mon, 01 Oct 2007 19:49:11 GMT

    The O'Reilly RESTful book is still somewhere towards the bottom of my unread pile, so perhaps I'm missing something obvious, but how does Jane know that the small image is at

    .../InChIKey/XYZZY/small_image.png

    and not

    .../InChIKey/XYZZY/images/small.png

    for example?

  2. Rich Apodaca Mon, 01 Oct 2007 21:00:31 GMT

    James,

    Good point. Jane doesn't know, unless that were a convention all cheminformatics servers followed. I actually like your version better.

    But what Jane gets out of it is not having to perform a separate query to find a molecular identifier, parse the result, and then locate the resources.

  3. ChemSpiderMan Wed, 03 Oct 2007 02:27:44 GMT

    Rich...some more useful URL exposure at http://www.chemspider.com/blog/?p=188

  4. Rich Apodaca Wed, 03 Oct 2007 04:51:57 GMT

    Antony,

    Wow, that's fast turnaround! :-)

  5. Egon Willighagen Wed, 03 Oct 2007 18:08:55 GMT

    Such URLs are not new; this kind of setup has been around since at least 1995, when most websites were using it. The URL might have been a bit more complicated. Websites with chemical content treasured the content, so these URLs were never really announced; I often had to work them out from HTML form requests, sometimes even using an HTTP proxy that dumped the full GET/POST request as plain text.

    I believe it really is the open nature of ChemSpider that allows these URLs to be so open too. Cheers for that!

  6. Rich Apodaca Wed, 03 Oct 2007 18:58:00 GMT

    Egon,

    True enough about RESTful URLs not being a new idea.

    The new part is that we can now unambiguously encode molecular structures into these RESTful URLs. It's one of those obvious, but very useful things that isn't yet being done.

    I'll go out on a limb and suggest that ChemSpider may have been the first to do it on a public, production system.

    I also agree that this only works on systems designed to be open. The approach only makes sense if you agree that all traffic to your site, human or robot, is good.

  7. Jim Downing Thu, 04 Oct 2007 10:33:51 GMT

    JamesM, Rich,

    A more 'connected' RESTful way for Jane to discover small.png is to follow a link to the image that carries pre-arranged rel and/or title attributes.

  8. Rich Apodaca Thu, 04 Oct 2007 14:20:59 GMT

    Jim,

    Interesting. Could you give an example of that approach?

  9. Jim Downing Thu, 04 Oct 2007 16:02:44 GMT

    Best example I can find atm is del.icio.us which uses the class attribute to do the same thing. If you look at the source for some del.icio.us bookmarks (e.g. http://del.icio.us/ojd20) you'll see a ul with class 'posts', each entry in which has some links that a machine could follow to manipulate the entry: the edit links have class 'edit' and the delete links 'rm'. A microformat of sorts, I suppose, and easily GRDDLable into RDF if that's your poison.

    So in our case you could have a list of links to various images of a compound and a controlled vocabulary of image types that get used as class values.
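
A minimal sketch of that idea, using only the Python standard library: scan a page for `a` elements whose class value belongs to an agreed vocabulary and collect their targets. The class names in the usage note below are invented for illustration; no such vocabulary has been defined by anyone in this thread.

```python
from html.parser import HTMLParser


class ClassLinkFinder(HTMLParser):
    """Collect the href of the first <a> found for each class in a vocabulary."""

    def __init__(self, vocabulary):
        super().__init__()
        self.vocabulary = set(vocabulary)
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # A class attribute may hold several space-separated values, or be absent.
        for name in (attributes.get("class") or "").split():
            if href and name in self.vocabulary:
                self.links.setdefault(name, href)


def find_links(html, vocabulary):
    """Map each vocabulary class found in the page to its link target."""
    finder = ClassLinkFinder(vocabulary)
    finder.feed(html)
    return finder.links
```

Given a summary page containing `<a class="small-image" href="/img/1.png">`, the call `find_links(page, ["small-image", "large-image"])` would map `small-image` to `/img/1.png` without the client knowing anything about the site's URL layout.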

  10. Rich Apodaca Fri, 05 Oct 2007 20:26:07 GMT

    Jim,

    Thx for the example - I had a look at it.

    But doesn't this approach require its own convention as well? The convention can be encoded in the document or encoded in the URL - either way it's a convention that needs to be interpreted and followed.

    Unless I'm missing something...

  11. Jim Downing Mon, 08 Oct 2007 13:16:26 GMT

    Yes, it is a convention, but one that follows the principle of connectedness. As a first order advantage, this means that even someone who didn't understand the convention could at least be guaranteed to be able to get to it.

    The main advantage of conventions applied by links is that they're much easier to combine, and much easier to integrate into existing apps / sites since they don't presuppose control over the URI space. It's also more convenient to extend.

    As a side advantage you could also use GRDDL on decorated links to generate RDF.
