<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag doi</title>
    <link>http://depth-first.com/articles/tag/doi</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Easily Convert Publisher URLs and DOIs to Bibliographical Citations: Synthesis, Synlett, Ruby, and Mechanize</title>
      <description>&lt;p&gt;&lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synthesis/index.shtml"&gt;&lt;img src="http://depth-first.com/demo/20070627/synthesis.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synlett/index.shtml"&gt;&lt;img src="http://depth-first.com/demo/20070627/synlett.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Just ten years ago, the thought of accessing all of the world's scientific literature online struck many as optimistic at best. Today, an increasing number of scientists use the Web as their &lt;em&gt;only&lt;/em&gt; means of reading the literature.&lt;/p&gt;

&lt;p&gt;This shift has brought with it a significant but rarely discussed problem: converting a publisher URL or DOI into a bibliographical citation (title, authors, journal, volume, pages, etc.). The conversion matters because we reference Web documents by bookmarking and linking their URLs, while we still reference paper documents by bibliographical citation. We may well see the day when bibliographical citations are no longer needed, but until then there's a need for user-friendly tools that manage the conversion.&lt;/p&gt;

&lt;p&gt;This article discusses a remarkably simple and flexible solution to this problem using &lt;a href="http://www.ruby-lang.org/en/"&gt;Ruby&lt;/a&gt; and the outstanding &lt;a href="http://mechanize.rubyforge.org"&gt;Mechanize&lt;/a&gt; library. As test subjects, I'll use two of my favorite journals: &lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synthesis/index.shtml"&gt;Synthesis&lt;/a&gt; and &lt;a href="http://www.thieme-chemistry.com/thieme-chemistry/journals/info/synlett/index.shtml"&gt;Synlett&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;What is Mechanize?&lt;/h4&gt;

&lt;p&gt;From the &lt;a href="http://mechanize.rubyforge.org/mechanize/"&gt;Mechanize documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, can follow links, and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of Mechanize as a programmable Web browser controlled by Ruby. This powerful idea offers possibilities that go far beyond the relatively simple example I'll describe here.&lt;/p&gt;

&lt;h4&gt;A Simple Library&lt;/h4&gt;

&lt;p&gt;Our library consists of the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Thieme&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_ris&lt;/span&gt; &lt;span class="ident"&gt;url&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt;  &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="ident"&gt;url&lt;/span&gt;
    &lt;span class="ident"&gt;ris_link&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;links&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;text&lt;/span&gt; &lt;span class="punct"&gt;/[&lt;/span&gt;&lt;span class="constant"&gt;Bb&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt;&lt;span class="ident"&gt;iblio&lt;/span&gt;&lt;span class="punct"&gt;/&lt;/span&gt;
    &lt;span class="ident"&gt;ris_url&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;uri&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;host&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;ris_link&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;href&lt;/span&gt;

    &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get_file&lt;/span&gt; &lt;span class="ident"&gt;ris_url&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After saving this code in a file called &lt;strong&gt;thieme.rb&lt;/strong&gt;, we can test it on &lt;a href="http://dx.doi.org/10.1055/s-2007-966071"&gt;this &lt;i&gt;Synthesis&lt;/i&gt; article&lt;/a&gt; with Interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'thieme'
=&gt; true
irb(main):002:0&gt; include Thieme
=&gt; Object
irb(main):003:0&gt; ris=get_ris 'http://www.thieme-connect.com/ejournals/abstract/synthesis/doi/10.1055/s-2007-966071'
=&gt; "\nTY  - JOUR\nID  - 101055S2007966071\nAU  - Gil,Mar\355a Victoria\nAU  - Ar\351valo,Mar\355a Jos\351\nAU  - L\363pez,\323scar\nT1  - Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond\nJO  - Synthesis\nPY  - 2007///\nIS  - 11\nSP  - 1589\nEP  - 1620\nER  - \n\n"
irb(main):004:0&gt; ris.match /T1  - (.*)/
=&gt; #&lt;MatchData:0xb77c9784&gt;
irb(main):005:0&gt; title = $1
=&gt; "Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond"
&lt;/pre&gt;
&lt;/div&gt;
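&lt;p&gt;The &lt;code&gt;match&lt;/code&gt;/&lt;code&gt;$1&lt;/code&gt; pair works, but it leans on Ruby's global match variables. A slightly tidier alternative is &lt;code&gt;String#[]&lt;/code&gt; with a regular expression and a capture index. The helper name &lt;code&gt;ris_field&lt;/code&gt; is hypothetical, the sketch assumes one "TAG  - value" field per line as in the output above, and the sample string is an abbreviated copy of that record:&lt;/p&gt;

```ruby
# Hypothetical helper: return the first value recorded for a two-letter RIS tag,
# or nil if the tag is absent. Assumes one "TAG  - value" field per line.
def ris_field(ris, tag)
  ris[/^#{Regexp.escape(tag)}  - (.*)$/, 1]
end

# Abbreviated copy of the record returned by get_ris above.
ris = "\nTY  - JOUR\nT1  - Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond\nJO  - Synthesis\nPY  - 2007///\nER  - \n\n"
puts ris_field(ris, "T1")  # prints Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond
puts ris_field(ris, "JO")  # prints Synthesis
```

&lt;p&gt;Because &lt;code&gt;^&lt;/code&gt; and &lt;code&gt;$&lt;/code&gt; match at line boundaries in Ruby, the same call works for any single-valued tag in the record.&lt;/p&gt;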

&lt;p&gt;Suppose that instead of a deep link to an article on the Thieme site, we have only its DOI. Can we still get the bibliographical citation?&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):006:0&gt; ris=get_ris 'http://dx.doi.org/10.1055/s-2007-966071'
=&gt; "\nTY  - JOUR\nID  - 101055S2007966071\nAU  - Gil,Mar\355a Victoria\nAU  - Ar\351valo,Mar\355a Jos\351\nAU  - L\363pez,\323scar\nT1  - Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond\nJO  - Synthesis\nPY  - 2007///\nIS  - 11\nSP  - 1589\nEP  - 1620\nER  - \n\n"
irb(main):007:0&gt; ris.match /T1  - (.*)/
=&gt; #&lt;MatchData:0xb77c6264&gt;
irb(main):008:0&gt; title = $1
=&gt; "Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It worked! Mechanize had no problem following the redirect from dx.doi.org. Similar results would be obtained with a &lt;em&gt;Synlett&lt;/em&gt; article or DOI.&lt;/p&gt;
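&lt;p&gt;Since &lt;code&gt;get_ris&lt;/code&gt; only needs a URL that eventually lands on the abstract page, a bare DOI can be handled by prefixing the dx.doi.org resolver used above. The helper &lt;code&gt;doi_to_url&lt;/code&gt; is a hypothetical sketch: it trims whitespace and an optional "doi:" prefix, but does not validate or escape the DOI:&lt;/p&gt;

```ruby
# Hypothetical helper: turn a bare DOI (optionally written as "doi:10.xxxx/...")
# into a resolver URL that get_ris can follow through its redirect.
DOI_RESOLVER = "http://dx.doi.org/"

def doi_to_url(doi)
  bare = doi.strip.sub(/\Adoi:\s*/i, "")  # drop an optional "doi:" prefix
  DOI_RESOLVER + bare
end

puts doi_to_url("10.1055/s-2007-966071")      # prints http://dx.doi.org/10.1055/s-2007-966071
puts doi_to_url("doi:10.1055/s-2007-966071")  # prints http://dx.doi.org/10.1055/s-2007-966071
```

&lt;p&gt;With &lt;strong&gt;thieme.rb&lt;/strong&gt; loaded, &lt;code&gt;get_ris doi_to_url('10.1055/s-2007-966071')&lt;/code&gt; should then behave like the second irb session.&lt;/p&gt;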

&lt;p&gt;For this approach to be truly useful, our software would need to gracefully handle character encoding to avoid garbled strings such as "What?s".&lt;/p&gt;
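&lt;p&gt;The octal escapes in the irb output hint at part of the fix: \355, \351, \363 and \323 are the ISO-8859-1 (Latin-1) bytes for í, é, ó and Ó. Assuming the RIS file really is Latin-1 (an inference from those bytes, not something the publisher states), Ruby 1.9 or later can relabel and transcode it to UTF-8, as in the sketch below (&lt;code&gt;ris_to_utf8&lt;/code&gt; is a hypothetical name). The "?" in "What?s" appears to be a literal question mark already present in the download, so transcoding alone cannot restore that apostrophe:&lt;/p&gt;

```ruby
# Sketch: reinterpret raw RIS bytes as ISO-8859-1 and transcode them to UTF-8.
# Assumes the publisher served Latin-1, as the \355 (i acute) bytes above suggest.
def ris_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "AU  - Gil,Mar\355a Victoria\nAU  - L\363pez,\323scar\n"
puts ris_to_utf8(raw)
# prints:
# AU  - Gil,María Victoria
# AU  - López,Óscar
```

&lt;p&gt;Calling &lt;code&gt;dup&lt;/code&gt; first leaves the original string untouched, since &lt;code&gt;force_encoding&lt;/code&gt; changes its receiver in place.&lt;/p&gt;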

&lt;h4&gt;How it Works&lt;/h4&gt;

&lt;p&gt;Our library relies on the publisher providing two things: (1) a downloadable &lt;a href="http://www.adeptscience.co.uk/kb/article/A626"&gt;RIS&lt;/a&gt; file for every article; and (2) a consistent way to reach it across journals. By telling Mechanize to follow the link whose text matches /[Bb]iblio/ (on Thieme pages, "Download bibliographical data"), we can retrieve the full citation. Fortunately, nearly every scientific publisher offers a similar download, even if the link text and its location vary from site to site.&lt;/p&gt;
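&lt;p&gt;Once the RIS text is in hand, turning it into a citation is mostly a matter of splitting tagged lines. The sketch below is an illustration, not part of the library above: &lt;code&gt;parse_ris&lt;/code&gt; and &lt;code&gt;format_citation&lt;/code&gt; are hypothetical names, the parser assumes the "TAG  - value" layout seen in the irb session, the citation style is an arbitrary choice, and the sample is the record above with its accented names already transcoded to UTF-8. Repeatable tags such as AU (author) are collected into arrays:&lt;/p&gt;

```ruby
# Sketch of an RIS parser: maps each two-character tag to an array of values,
# so repeatable tags like AU (author) keep every entry in order.
RIS_LINE = /\A([A-Z][A-Z0-9])  - ?(.*)\z/

def parse_ris(ris)
  fields = {}
  ris.each_line do |line|
    m = RIS_LINE.match(line.chomp)
    next unless m                      # skip blank or unrecognized lines
    (fields[m[1]] ||= []).push(m[2].strip)
  end
  fields
end

# Hypothetical formatter: one plain-text citation in an arbitrary style.
def format_citation(fields)
  first   = lambda { |tag| (fields[tag] || []).first.to_s }
  authors = (fields["AU"] || []).join("; ")
  year    = first.call("PY").split("/").first.to_s   # "2007///" becomes "2007"
  pages   = [first.call("SP"), first.call("EP")].reject { |p| p.empty? }.join("-")
  "#{authors}. #{first.call('T1')}. #{first.call('JO')} #{year}, #{first.call('IS')}, #{pages}."
end

ris = "TY  - JOUR\nAU  - Gil,María Victoria\nAU  - Arévalo,María José\nAU  - López,Óscar\n" \
      "T1  - Click Chemistry - What?s in a Name? Triazole Synthesis and Beyond\n" \
      "JO  - Synthesis\nPY  - 2007///\nIS  - 11\nSP  - 1589\nEP  - 1620\nER  - \n"
puts format_citation(parse_ris(ris))
```

&lt;p&gt;Keeping every tag as an array costs little and avoids silently dropping all but the last author.&lt;/p&gt;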

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Just a few lines of Ruby code have solved a significant scientific information management problem, at least for one publisher's journals. A complete solution to the problem would require code for every scientific journal, a task well underway at &lt;a href="http://depth-first.com/articles/2007/06/22/hacking-citeulike-metascripting-with-ruby-and-session"&gt;CiteULike&lt;/a&gt;. While nothing here can pretend to be an end-user application, it's not difficult to imagine how to build one (or a few) using these basic concepts. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Wed, 27 Jun 2007 08:45:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:06fa9a4a-ced8-4505-96e9-2c3dca00a912</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/27/easily-convert-publisher-urls-and-dois-to-bibliographical-citations-synthesis-synlett-ruby-and-mechanize</link>
      <category>Tools</category>
      <category>synthesis</category>
      <category>synlett</category>
      <category>ruby</category>
      <category>mechanize</category>
      <category>citeulike</category>
      <category>doi</category>
    </item>
    <item>
      <title>Buggotea: The Problem with Abundance</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/gottcha78/545271203/"&gt;&lt;img src="http://depth-first.com/demo/20070615/bug.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Although &lt;a href="http://depth-first.com/articles/2007/03/22/why-i-still-dont-use-connotea"&gt;I still don't use&lt;/a&gt; it yet, &lt;a href="http://connotea.org"&gt;Connotea&lt;/a&gt; is a very useful service for many scientists. Combining aspects of social networking and bibliography management, Connotea offers a glimpse at some of the vast potential for Web 2.0 in the sciences. But the service is not without its thorny technical problems, one of which is discussed in this article.&lt;/p&gt;

&lt;p&gt;For those unfamiliar with the service, Connotea lets you organize and share hyperlinks. This, in itself, is nothing remarkable. Many services, such as &lt;a href="http://digg.com"&gt;Digg&lt;/a&gt;, &lt;a href="http://del.icio.us/"&gt;del.icio.us&lt;/a&gt;, and &lt;a href="http://reddit.com/"&gt;Reddit&lt;/a&gt;, offer similar capabilities.&lt;/p&gt;

&lt;p&gt;What's unique about Connotea is its emphasis on bookmarking scientific and scholarly content. By taking advantage of the &lt;a href="http://www.crossref.org/"&gt;CrossRef&lt;/a&gt; service built on top of the &lt;a href="http://doi.org/"&gt;DOI&lt;/a&gt; system, Connotea makes creating a bibliographical reference to a paper as easy as entering a short alphanumeric sequence found on the document itself.&lt;/p&gt;

&lt;p&gt;As long as all Connotea users work with DOIs, there is no problem. The DOI organization ensures that every document with a DOI can be accessed via a single, immutable URL. For example, if a paper has a DOI of "10.1021/ol015948s", then the document can be accessed through &lt;a href="http://dx.doi.org/10.1021/ol015948s"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But what happens if a Connotea user either doesn't know about DOIs or, for some reason, prefers to work with a &lt;a href="http://pubs.acs.org/cgi-bin/abstract.cgi/orlef7/2001/3/i11/abs/ol015948s.html"&gt;publisher's URL&lt;/a&gt; directly? This is not as unlikely as it may seem at first. For example, Connotea fails to recognize the title of many ACS papers when they are entered as DOIs, but does recognize them when they are entered as direct abstract links.&lt;/p&gt;

&lt;p&gt;PubMed offers still more ways to refer to the same document. To name a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;amp;db=pubmed&amp;amp;dopt=Abstract&amp;amp;list_uids=11405701"&gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;amp;db=pubmed&amp;amp;dopt=AbstractPlus&amp;amp;list_uids=11405701"&gt;Abstract Plus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;amp;db=pubmed&amp;amp;dopt=Citation&amp;amp;list_uids=11405701"&gt;Citation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without really trying, we've found no fewer than five different URLs that all refer to the same scientific work. If you look under &lt;a href="http://www.connotea.org/user/rapodaca"&gt;my user profile&lt;/a&gt;, you'll see that Connotea is happy to add all of these references as separate entities. This means that each will receive its own set of tags and its own summary page. If my collection of links grows to a few hundred, I may not realize that I actually have two or three links to the same paper in my collection. And other Connotea users may fail to see my papers because they're using a URL that differs from mine.&lt;/p&gt;
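&lt;p&gt;To make the redundancy concrete, here is a hypothetical Ruby sketch of a naive "canonical key": use a DOI if one appears in the URL, a PubMed ID if the URL carries a list_uids parameter, and the raw URL otherwise. It illustrates the problem rather than fixing it. The three PubMed links collapse to one key, but the ACS abstract URL contains only the DOI suffix ("ol015948s") without its "10.1021/" prefix, and nothing in the URLs themselves ties the DOI to the PubMed ID, so the five links still yield three different keys:&lt;/p&gt;

```ruby
# Naive canonical key for a bookmarked URL (an illustration, not a general fix):
#   1. a DOI embedded anywhere in the URL -> "doi:..." (lowercased; DOI names are case-insensitive)
#   2. a PubMed list_uids query parameter  -> "pmid:..."
#   3. anything else                       -> the URL unchanged
DOI_IN_URL  = /(10\.\d{4,9}\/[^\s?#]+)/
PMID_IN_URL = /\blist_uids=(\d+)/

def canonical_key(url)
  if (m = DOI_IN_URL.match(url))
    "doi:" + m[1].downcase
  elsif (m = PMID_IN_URL.match(url))
    "pmid:" + m[1]
  else
    url
  end
end

puts canonical_key("http://dx.doi.org/10.1021/ol015948s")
puts canonical_key("http://pubs.acs.org/cgi-bin/abstract.cgi/orlef7/2001/3/i11/abs/ol015948s.html")
```

&lt;p&gt;The first call returns "doi:10.1021/ol015948s", the second returns the ACS URL unchanged, and each of the three PubMed links listed above gives "pmid:11405701".&lt;/p&gt;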

&lt;p&gt;After researching this problem a bit, I found that although it doesn't seem to have an immediate solution, at least it has a name: &lt;a href="http://www.nodalpoint.org/2006/12/15/buggotea_redundant_links_in_connotea"&gt;Buggotea&lt;/a&gt;. It bears a remarkable similarity to the "unique" SMILES problem (a single molecule can be written as many different valid SMILES strings), which was a major motivation for the development of &lt;a href="http://www.iupac.org/inchi/"&gt;InChI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It wasn't long ago that the ability to access the scientific literature online seemed far-fetched. Today, the Internet has become the only scientific publication medium that matters. This has created a variety of new problems - and &lt;a href="http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics"&gt;opportunities&lt;/a&gt; to solve them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image Credit: &lt;a href="http://flickr.com/photos/gottcha78/"&gt;gottcha78&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;</description>
      <pubDate>Fri, 15 Jun 2007 09:09:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:9947c46e-f47e-4bac-92f0-b6503cc6252a</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/15/buggotea-the-problem-with-abundance</link>
      <category>Meta</category>
      <category>connotea</category>
      <category>buggotea</category>
      <category>doi</category>
      <category>url</category>
    </item>
  </channel>
</rss>
