<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag mechanize</title>
    <link>http://depth-first.com/articles/tag/mechanize</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking ChemSpider: Query by SMILES and InChI with Ruby</title>
      <description>&lt;p&gt;&lt;a href="http://chemspider.com"&gt;&lt;img src="http://depth-first.com/demo/20070917/chemspider.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Slowly but surely, cheminformatics Web APIs are starting to appear. What's the big deal, you may ask? By exposing Web APIs, service providers enable third parties to develop new applications that &lt;a href="http://depth-first.com/articles/2006/09/23/mashups-for-fun-and-profit"&gt;"mash up"&lt;/a&gt; functionality from two or more sites, or which take the original service in directions its founders never considered.&lt;/p&gt;

&lt;p&gt;By way of &lt;a href="http://www.chemspider.com/blog"&gt;Antony Williams' blog&lt;/a&gt;, I came across &lt;a href="http://www.chemspider.com/blog/?p=135"&gt;the announcement&lt;/a&gt; for the &lt;a href="http://www.chemspider.com/inchi.asmx"&gt;ChemSpider Web API&lt;/a&gt;. What can this API do for Web developers? To find out, let's write a small Ruby library.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;Our library accepts a SMILES string or InChI identifier and returns a URL pointing to the corresponding ChemSpider compound summary page. Like &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;previous Web API demos&lt;/a&gt;, this one uses the powerful Ruby library &lt;a href="http://mechanize.rubyforge.org/"&gt;Mechanize&lt;/a&gt;, leading to very concise code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;ChemSpider&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;url_for_inchi&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/inchi.asmx/InChIToCSID?inchi=&lt;span class="expr"&gt;#{inchi}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;csid&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;string&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;

    &lt;span class="ident"&gt;csid&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/RecordView.aspx?id=&lt;span class="expr"&gt;#{csid}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;url_for_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.chemspider.com/inchi.asmx/SMILESToInChI?smiles=&lt;span class="expr"&gt;#{smiles}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;string&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Invalid SMILES: &lt;span class="expr"&gt;#{smiles}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="ident"&gt;url_for_inchi&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;url_for_inchi&lt;/tt&gt; method directly uses the ChemSpider API to query by InChI. The &lt;tt&gt;url_for_smiles&lt;/tt&gt; method first uses the ChemSpider API to convert a SMILES string to an InChI identifier, and then calls the &lt;tt&gt;url_for_inchi&lt;/tt&gt; method.&lt;/p&gt;

&lt;p&gt;Two points are worth noting. First, although for convenience neither the InChI identifier nor the SMILES string is &lt;a href="http://www.aptana.com/docs/index.php/URL_Escape_Codes"&gt;escaped&lt;/a&gt; before being appended to the API URL, strictly speaking both should be. Second, both methods use &lt;a href="http://code.whytheluckystiff.net/hpricot/"&gt;Hpricot&lt;/a&gt;, the parsing library that Mechanize itself relies on, to parse the raw XML returned by ChemSpider.&lt;/p&gt;
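
&lt;p&gt;As a minimal sketch of that escaping step, the snippet below uses &lt;tt&gt;URI.encode_www_form_component&lt;/tt&gt; from Ruby's standard library to percent-encode an identifier before it goes into a query string. The helper name &lt;tt&gt;escaped_query_url&lt;/tt&gt; is made up for illustration and isn't part of the library above; the snippet only builds the URL and sends no request.&lt;/p&gt;

```ruby
require 'uri'

# Hypothetical helper (not part of the ChemSpider module above):
# build a query URL whose single parameter value is percent-encoded.
def escaped_query_url(base, param, value)
  "#{base}?#{param}=#{URI.encode_www_form_component(value)}"
end

# The '=' and '/' characters in the InChI become %3D and %2F.
puts escaped_query_url(
  "http://www.chemspider.com/inchi.asmx/InChIToCSID",
  "inchi",
  "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
)
```

&lt;p&gt;The same treatment matters for SMILES strings, where characters such as &lt;tt&gt;#&lt;/tt&gt; (triple bond) and &lt;tt&gt;+&lt;/tt&gt; (charge) have their own meaning inside a URL.&lt;/p&gt;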

&lt;h4&gt;Testing&lt;/h4&gt;

&lt;p&gt;Saving the above code to a file called &lt;strong&gt;chemspider.rb&lt;/strong&gt;, we can get the URL to ChemSpider's benzene page from its InChI identifier via interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'chemspider'
=&gt; true
irb(main):002:0&gt; include ChemSpider
=&gt; Object
irb(main):003:0&gt; url_for_inchi "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
=&gt; "http://www.chemspider.com/RecordView.aspx?id=236"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We can work with SMILES strings just as easily as with InChIs:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'chemspider'
=&gt; true
irb(main):002:0&gt; include ChemSpider
=&gt; Object
irb(main):003:0&gt; url_for_smiles 'c1ccccc1'
=&gt; "http://www.chemspider.com/RecordView.aspx?id=236"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Both the InChI and the SMILES string yield a URL pointing to the &lt;a href="http://www.chemspider.com/RecordView.aspx?id=236"&gt;same ChemSpider page for benzene&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Like most &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;chemical databases&lt;/a&gt;, ChemSpider uses a compound summary page as a way of organizing the available resources for a given molecule. With a method in hand for accessing these pages based on arbitrary SMILES or InChIs, we can begin to think of manipulating ChemSpider independently of its current user interface. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Mon, 17 Sep 2007 08:19:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:9e6b90f7-590d-47d4-b2a8-bbac5a014c74</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/17/hacking-chemspider-query-by-smiles-and-inchi-with-ruby</link>
      <category>Tools</category>
      <category>chemspider</category>
      <category>hackingchemspider</category>
      <category>ruby</category>
      <category>webapi</category>
      <category>mashup</category>
      <category>mechanize</category>
      <category>hpricot</category>
    </item>
    <item>
      <title>Easily Convert Publisher URLs and DOIs to Bibliographical Citations: Synthesis, Synlett, Ruby, and Mechanize</title>
      <description>&lt;p&gt;&lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synthesis/index.shtml"&gt;&lt;img src="http://depth-first.com/demo/20070627/synthesis.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synlett/index.shtml"&gt;&lt;img src="http://depth-first.com/demo/20070627/synlett.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Just ten years ago, the thought of accessing all of the world's scientific literature online struck many as optimistic at best. Today, an increasing number of scientists use the Web as their &lt;em&gt;only&lt;/em&gt; means of reading the literature.&lt;/p&gt;

&lt;p&gt;This shift has brought with it a significant but rarely discussed problem: converting a publisher URL or DOI to a bibliographical citation (title, authors, journal, page, volume, etc.). This is a problem because bookmarking and linking URLs are the way we reference Web documents, but the bibliographical citation is still how we reference paper documents. We may well see the day when the need for bibliographical citations disappears, but until that happens there's a need for user-friendly tools that manage the conversion.&lt;/p&gt;

&lt;p&gt;This article discusses a remarkably simple and flexible solution to this problem using &lt;a href="http://www.ruby-lang.org/en/"&gt;Ruby&lt;/a&gt; and the outstanding &lt;a href="http://mechanize.rubyforge.org"&gt;Mechanize&lt;/a&gt; library. As test subjects, I'll use two of my favorite journals: &lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synthesis/index.shtml"&gt;Synthesis&lt;/a&gt; and &lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synlett/index.shtml"&gt;Synlett&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;What is Mechanize?&lt;/h4&gt;

&lt;p&gt;From the &lt;a href="http://mechanize.rubyforge.org/mechanize/"&gt;Mechanize documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, can follow links, and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of Mechanize as a programmable Web browser controlled by Ruby. This powerful idea offers possibilities that go far beyond the relatively simple example I'll describe here.&lt;/p&gt;

&lt;h4&gt;A Simple Library&lt;/h4&gt;

&lt;p&gt;Our library consists of the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Thieme&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_ris&lt;/span&gt; &lt;span class="ident"&gt;url&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt;  &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="ident"&gt;url&lt;/span&gt;
    &lt;span class="ident"&gt;ris_link&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;links&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;text&lt;/span&gt; &lt;span class="punct"&gt;/[&lt;/span&gt;&lt;span class="constant"&gt;Bb&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt;&lt;span class="ident"&gt;iblio&lt;/span&gt;&lt;span class="punct"&gt;/&lt;/span&gt;
    &lt;span class="ident"&gt;ris_url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;uri&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;host&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;ris_link&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;href&lt;/span&gt;

    &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get_file&lt;/span&gt; &lt;span class="ident"&gt;ris_url&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After saving this code in a file called &lt;strong&gt;thieme.rb&lt;/strong&gt;, we can test it on &lt;a href="http://dx.doi.org/10.1055/s-2007-966071"&gt;this &lt;i&gt;Synthesis&lt;/i&gt; article&lt;/a&gt; with interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'thieme'
=&gt; true
irb(main):002:0&gt; include Thieme
=&gt; Object
irb(main):003:0&gt; ris=get_ris 'http://www.thieme-connect.com/ejournals/abstract/synthesis/doi/10.1055/s-2007-966071'
=&gt; "\nTY  - JOUR\nID  - 101055S2007966071\nAU  - Gil,Mar\355a Victoria\nAU  - Ar\351valo,Mar\355a Jos\351\nAU  - L\363pez,\323scar\nT1  - Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond\nJO  - Synthesis\nPY  - 2007///\nIS  - 11\nSP  - 1589\nEP  - 1620\nER  - \n\n"
irb(main):004:0&gt; ris.match /T1  - (.*)/
=&gt; #&lt;MatchData:0xb77c9784&gt;
irb(main):005:0&gt; title = $1
=&gt; "Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Let's say that instead of a deep link to an article in the Thieme site we have a DOI. Can we still get the bibliographical citation?&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):006:0&gt; ris=get_ris 'http://dx.doi.org/10.1055/s-2007-966071'
=&gt; "\nTY  - JOUR\nID  - 101055S2007966071\nAU  - Gil,Mar\355a Victoria\nAU  - Ar\351valo,Mar\355a Jos\351\nAU  - L\363pez,\323scar\nT1  - Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond\nJO  - Synthesis\nPY  - 2007///\nIS  - 11\nSP  - 1589\nEP  - 1620\nER  - \n\n"
irb(main):007:0&gt; ris.match /T1  - (.*)/
=&gt; #&lt;MatchData:0xb77c6264&gt;
irb(main):008:0&gt; title = $1
=&gt; "Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It worked! Mechanize had no problem following the redirect from dx.doi.org. Similar results would be obtained with a &lt;em&gt;Synlett&lt;/em&gt; article or DOI.&lt;/p&gt;

&lt;p&gt;For this approach to be truly useful, our software would need to gracefully handle character encoding to avoid garbled strings such as "What?s".&lt;/p&gt;
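
&lt;p&gt;One possible approach is sketched below. It assumes the RIS data arrives as ISO-8859-1 (Latin-1) bytes, which the octal escapes such as &lt;tt&gt;\355&lt;/tt&gt; in the output above suggest but which isn't confirmed here, and it uses the string encoding API found in Ruby 1.9 and later. The helper name &lt;tt&gt;latin1_to_utf8&lt;/tt&gt; is hypothetical. A character that has already been replaced by a literal question mark upstream, as in "What?s", can't be recovered this way.&lt;/p&gt;

```ruby
# Hypothetical helper. Assumption: the raw bytes are ISO-8859-1;
# use a different source encoding if the server reports one.
def latin1_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# An author line from the irb output; the octal escape is one raw byte.
raw_author = "AU  - Gil,Mar\355a Victoria".b

# Prints the line with the accented character restored.
puts latin1_to_utf8(raw_author)
```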

&lt;h4&gt;How it Works&lt;/h4&gt;

&lt;p&gt;Our library relies on two important things being provided by the publisher: (1) a downloadable version of the &lt;a href="http://www.adeptscience.co.uk/kb/article/A626"&gt;RIS&lt;/a&gt; file for every article; and (2) a consistent way to access it across journals. By simply telling Mechanize to follow a link labeled as "Download bibliographical data", we can easily retrieve the full citation. Fortunately, nearly every scientific publisher follows this practice.&lt;/p&gt;
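
&lt;p&gt;To make the RIS format concrete, here is a small sketch that splits RIS text into tagged fields. It assumes every field line consists of a two-character tag, two spaces, a hyphen, a space, and a value, as in the output shown earlier. The &lt;tt&gt;parse_ris&lt;/tt&gt; name is hypothetical and isn't part of the library above, and the sample text is abbreviated from that output with accents dropped to keep it ASCII.&lt;/p&gt;

```ruby
# Hypothetical helper: map each RIS tag to the list of its values.
# Tags such as AU can repeat, so every tag maps to an array.
def parse_ris(text)
  fields = Hash.new { |hash, tag| hash[tag] = [] }
  text.each_line do |line|
    match = line.chomp.match(/\A([A-Z][A-Z0-9])  - ?(.*)\z/)
    next unless match # skip blank or malformed lines
    fields[match[1]].push(match[2].strip)
  end
  fields
end

sample = "\nTY  - JOUR\nAU  - Gil,Maria Victoria\nAU  - Arevalo,Maria Jose\n" \
         "JO  - Synthesis\nPY  - 2007///\nIS  - 11\nSP  - 1589\nEP  - 1620\nER  - \n\n"

citation = parse_ris(sample)
journal = citation["JO"].first
year = citation["PY"].first[0, 4]
pages = "#{citation["SP"].first}-#{citation["EP"].first}"

puts citation["AU"].join("; ")
puts "#{journal} #{year}, #{pages}"
```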

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Just a few lines of Ruby code have solved a significant scientific information management problem, at least for one journal. A complete solution to the problem would require code for every scientific journal, a task well underway at &lt;a href="http://depth-first.com/articles/2007/06/22/hacking-citeulike-metascripting-with-ruby-and-session"&gt;CiteULike&lt;/a&gt;. While nothing here can pretend to be an end-user application, it's not difficult to imagine how to build one (or a few) using these basic concepts. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Wed, 27 Jun 2007 08:45:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:06fa9a4a-ced8-4505-96e9-2c3dca00a912</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/27/easily-convert-publisher-urls-and-dois-to-bibliographical-citations-synthesis-synlett-ruby-and-mechanize</link>
      <category>Tools</category>
      <category>synthesis</category>
      <category>synlett</category>
      <category>ruby</category>
      <category>mechanize</category>
      <category>citeulike</category>
      <category>doi</category>
    </item>
  </channel>
</rss>
