<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag chemspider</title>
    <link>http://depth-first.com/articles/tag/chemspider</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Streamlining Cheminformatics on the Web: Let InChI Do the Heavy Lifting and Get Some REST</title>
      <description>&lt;p&gt;&lt;a href="http://chemspider.com"&gt;&lt;img src="http://depth-first.com/demo/20070917/chemspider.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A recent Depth-First article discussed the advantages of &lt;a href="http://depth-first.com/articles/2007/08/13/the-best-api-may-be-no-api-at-all-pubchem-and-pdb"&gt;minimal Web APIs in Cheminformatics&lt;/a&gt;. Since then, Antony Williams has unveiled some &lt;a href="http://www.chemspider.com/blog/?p=179"&gt;simplified ChemSpider URL schemes&lt;/a&gt;, mainly to enable Google indexing. However, it's possible to take this scheme much, much further. Here I present a proposal for radically simplifying (and unifying) the development of cheminformatics Web APIs and the software that interacts with them.&lt;/p&gt;

&lt;h4&gt;The New ChemSpider URLs&lt;/h4&gt;

&lt;p&gt;ChemSpider now has several new kinds of URLs. For the purposes of this article, the most interesting of these are of the format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.chemspider.com/InChIKey=DEIYFTQMQPDXOT-RERXVCSDCZ"&gt;http://www.chemspider.com/InChIKey=DEIYFTQMQPDXOT-RERXVCSDCZ &lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.chemspider.com/InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"&gt;http://www.chemspider.com/InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These URLs may seem unremarkable, but there's much more than meets the eye. They let anonymous developers query ChemSpider about specific substances - without needing to know much at all about how ChemSpider itself works. Goodbye API. Goodbye API support. Goodbye API documentation. Goodbye angle brackets. Hello to getting stuff done. It's all very &lt;a href="http://depth-first.com/articles/2007/05/30/restful-cheminformatics"&gt;RESTful&lt;/a&gt;. Well, at least it could be that way with some minor modification.&lt;/p&gt;

&lt;h4&gt;Some Recommendations&lt;/h4&gt;

&lt;p&gt;ChemSpider hasn't quite reached that place where the API &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/downing/?p=128"&gt;just disappears&lt;/a&gt;. The problem is that the ChemSpider URLs listed above point to query results pages, not compound summary pages. Were these URLs to redirect to a summary page, we could construct the following URLs to extract ChemSpider resources (I've replaced the '=' sign with a '/' for simplicity):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;.../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ&lt;/strong&gt; Get all resources for the molecule identified by the given InChIKey, i.e., the "Compound summary page".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;.../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/molfile.mol&lt;/strong&gt; Get the molfile for the molecule identified by the given InChIKey.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;.../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/small_image.png&lt;/strong&gt; Get the small image for the molecule identified by the given InChIKey.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;.../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/large_image.png&lt;/strong&gt; Get the large image for the molecule identified by the given InChIKey.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;.../InChIKey/DEIYFTQMQPDXOT-RERXVCSDCZ/citations.xml&lt;/strong&gt; Get the list of citations for the molecule identified by the given InChIKey, in XML format.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
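
&lt;p&gt;As a rough sketch of how a client might assemble the URLs listed above, consider the small Ruby helper below. The base URL and the method name &lt;tt&gt;proposed_url&lt;/tt&gt; are assumptions made purely for illustration; ChemSpider doesn't currently serve resources at these paths.&lt;/p&gt;

```ruby
# Hypothetical base URL for the proposed scheme (not a live ChemSpider endpoint).
PROPOSED_BASE = 'http://www.chemspider.com/InChIKey'

# Build the URL for a molecule, or for one of its resources when a name is given.
def proposed_url(inchikey, resource = nil)
  [PROPOSED_BASE, inchikey, resource].compact.join('/')
end

key = 'DEIYFTQMQPDXOT-RERXVCSDCZ'
puts proposed_url(key)                   # the compound summary page
puts proposed_url(key, 'molfile.mol')    # the molfile
puts proposed_url(key, 'citations.xml')  # the citations, as XML
```

&lt;p&gt;Nothing beyond string concatenation is needed, which is the point: the InChIKey is the only thing a client has to know.&lt;/p&gt;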

&lt;p&gt;Jane, a developer building Web applications on top of this new ChemSpider API, would immediately notice that things just work. Let's say her online database stores IC&lt;sub&gt;50&lt;/sub&gt;s at the dopamine D&lt;sub&gt;2&lt;/sub&gt; receptor. On the summary page for each molecule, she wants to link out to the ChemSpider compound summary page, if available. She would simply construct the InChIKey on her server, build the needed ChemSpider URL and GET it. An HTTP 404 would indicate no molecule with that Key exists on ChemSpider and so no link would be shown. An HTTP 200 would indicate ChemSpider has the molecule, and so the link would appear.&lt;/p&gt;
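
&lt;p&gt;Jane's check might look something like the following sketch, which uses Ruby's standard &lt;tt&gt;Net::HTTP&lt;/tt&gt; library. The method names &lt;tt&gt;link_or_nil&lt;/tt&gt; and &lt;tt&gt;chemspider_link&lt;/tt&gt; are hypothetical, and the code assumes a server that answers with 200 or 404 as described; it doesn't follow redirects.&lt;/p&gt;

```ruby
require 'net/http'
require 'uri'

# Decide whether to show a link, given the HTTP status of the response.
# 200 means the molecule exists; anything else (such as 404) means no link.
def link_or_nil(url, status)
  status.to_i == 200 ? url : nil
end

# GET the URL and return it if the server reports the molecule, otherwise nil.
# Hypothetical helper: assumes the 200/404 behavior proposed in this article.
# Redirects are not followed in this sketch.
def chemspider_link(url)
  response = Net::HTTP.get_response(URI.parse(url))
  link_or_nil(url, response.code)
end
```

&lt;p&gt;Jane's page template would then render the link only when &lt;tt&gt;chemspider_link&lt;/tt&gt; returns something other than &lt;tt&gt;nil&lt;/tt&gt;.&lt;/p&gt;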

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;It would be interesting enough if ChemSpider adopted a system like that described here. But the real power of this approach would emerge if multiple Web services were to adopt it. By following a simple set of conventions, these services would enable third-party developers to elegantly &lt;a href="http://depth-first.com/articles/2006/09/23/mashups-for-fun-and-profit"&gt;mash up&lt;/a&gt; all manner of cheminformatics resources into applications unimaginable today.&lt;/p&gt;

&lt;p&gt;Technically, there's nothing that prevents this system from being implemented on every &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemistry database&lt;/a&gt; in existence today. However, doing so would transfer a significant degree of control from service operators to third-party developers. Not all providers will be comfortable with that idea.&lt;/p&gt;

&lt;p&gt;Cheminformatics Web service providers need to carefully consider whether they're trying to develop a &lt;a href="http://depth-first.com/articles/2007/07/04/pubchem-is-a-platform"&gt;platform or an integrated service&lt;/a&gt;. As history has shown, the strategies, and upside potential, for each approach can differ dramatically.&lt;/p&gt;</description>
      <pubDate>Mon, 01 Oct 2007 10:53:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:e3caeb1b-58a7-4a3a-b215-131825ee9f2e</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/01/streamlining-cheminformatics-on-the-web-let-inchi-do-the-heavy-lifting-and-get-some-rest</link>
      <category>Meta</category>
      <category>chemspider</category>
      <category>rest</category>
      <category>inchi</category>
      <category>inchikey</category>
      <category>web</category>
    </item>
    <item>
      <title>Hacking ChemSpider: Query by SMILES and InChI with Ruby</title>
      <description>&lt;p&gt;&lt;a href="http://chemspider.com"&gt;&lt;img src="http://depth-first.com/demo/20070917/chemspider.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Slowly but surely, cheminformatics Web APIs are starting to appear. What's the big deal, you may ask? By exposing Web APIs, service providers enable third parties to develop new applications that &lt;a href="http://depth-first.com/articles/2006/09/23/mashups-for-fun-and-profit"&gt;"mash up"&lt;/a&gt; functionality from two or more sites, or which take the original service in directions its founders never considered.&lt;/p&gt;

&lt;p&gt;By way of &lt;a href="http://www.chemspider.com/blog"&gt;Antony Williams' blog&lt;/a&gt;, I came across &lt;a href="http://www.chemspider.com/blog/?p=135"&gt;the announcement&lt;/a&gt; for the &lt;a href="http://www.chemspider.com/inchi.asmx"&gt;ChemSpider Web API&lt;/a&gt;. What can this API do for Web developers? To find out, let's write a small Ruby library.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;Our library will accept a SMILES string or InChI identifier and return a URL pointing to the corresponding ChemSpider compound summary page. Like &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;previous Web API demos&lt;/a&gt;, this one uses the powerful Ruby library &lt;a href="http://mechanize.rubyforge.org/"&gt;Mechanize&lt;/a&gt;, leading to very concise code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;ChemSpider&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;url_for_inchi&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/inchi.asmx/InChIToCSID?inchi=&lt;span class="expr"&gt;#{inchi}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;csid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;string&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;

    &lt;span class="ident"&gt;csid&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/RecordView.aspx?id=&lt;span class="expr"&gt;#{csid}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;url_for_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/inchi.asmx/SMILESToInChI?smiles=&lt;span class="expr"&gt;#{smiles}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;string&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Invalid SMILES: &lt;span class="expr"&gt;#{smiles}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="ident"&gt;url_for_inchi&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;url_for_inchi&lt;/tt&gt; method directly uses the ChemSpider API to query by InChI. The &lt;tt&gt;url_for_smiles&lt;/tt&gt; method first uses the ChemSpider API to convert a SMILES string to an InChI identifier, and then calls the &lt;tt&gt;url_for_inchi&lt;/tt&gt; method.&lt;/p&gt;

&lt;p&gt;Two points are worth noting. First, for convenience the InChI identifier isn't &lt;a href="http://www.aptana.com/docs/index.php/URL_Escape_Codes"&gt;escaped&lt;/a&gt; before being appended to the API URL, although strictly speaking it should be. Second, both methods use &lt;a href="http://code.whytheluckystiff.net/hpricot/"&gt;Hpricot&lt;/a&gt;, the parsing library underlying Mechanize, to parse the raw XML returned by ChemSpider.&lt;/p&gt;
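
&lt;p&gt;For the curious, escaping could be handled with Ruby's standard &lt;tt&gt;uri&lt;/tt&gt; library, roughly as sketched below. The method name &lt;tt&gt;inchi_query_url&lt;/tt&gt; is hypothetical and isn't part of the library above; the endpoint is the same one used by &lt;tt&gt;url_for_inchi&lt;/tt&gt;.&lt;/p&gt;

```ruby
require 'uri'

# Endpoint queried by url_for_inchi in the library above.
INCHI_TO_CSID = 'http://www.chemspider.com/inchi.asmx/InChIToCSID'

# Percent-encode the InChI so that characters such as '=' and '/' can't be
# mistaken for parts of the URL itself.
def inchi_query_url(inchi)
  "#{INCHI_TO_CSID}?inchi=#{URI.encode_www_form_component(inchi)}"
end

puts inchi_query_url('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H')
```

&lt;p&gt;The same treatment would apply to SMILES strings, which can contain characters such as '#' and '=' that have special meaning in URLs.&lt;/p&gt;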

&lt;h4&gt;Testing&lt;/h4&gt;

&lt;p&gt;Saving the above code to a file called &lt;strong&gt;chemspider.rb&lt;/strong&gt;, we can get the URL to ChemSpider's benzene page from its InChI identifier via interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'chemspider'
=&gt; true
irb(main):002:0&gt; include ChemSpider
=&gt; Object
irb(main):003:0&gt; url_for_inchi "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
=&gt; "http://www.chemspider.com/RecordView.aspx?id=236"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We can work with SMILES strings just as easily as with InChIs:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'chemspider'
=&gt; true
irb(main):002:0&gt; include ChemSpider
=&gt; Object
irb(main):003:0&gt; url_for_smiles 'c1ccccc1'
=&gt; "http://www.chemspider.com/RecordView.aspx?id=236"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Both the InChI and the SMILES string yield a URL pointing to the &lt;a href="http://www.chemspider.com/RecordView.aspx?id=236"&gt;same ChemSpider page for benzene&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Like most &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;chemical databases&lt;/a&gt;, ChemSpider uses a compound summary page as a way of organizing the available resources for a given molecule. With a method in hand for accessing these pages based on arbitrary SMILES or InChIs, we can begin to think of manipulating ChemSpider independently of its current user interface. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Mon, 17 Sep 2007 08:19:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9e6b90f7-590d-47d4-b2a8-bbac5a014c74</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/17/hacking-chemspider-query-by-smiles-and-inchi-with-ruby</link>
      <category>Tools</category>
      <category>chemspider</category>
      <category>hackingchemspider</category>
      <category>ruby</category>
      <category>webapi</category>
      <category>mashup</category>
      <category>mechanize</category>
      <category>hpricot</category>
    </item>
  </channel>
</rss>
