<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag sciencecitationindex</title>
    <link>http://depth-first.com/articles/tag/sciencecitationindex</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking DOI: Interconvert Bibliographic References and DOIs with CrossRef and OpenURL</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/ecstaticist/1340787730/"&gt;&lt;img src="http://depth-first.com/demo/20080506/web.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Science is in the middle of a transition from print to the internet as the primary medium of communication. This transition, although a boon for many scientists, creates a host of problems for those dealing with scientific information. For example, how would you interconvert a &lt;a href="http://www.doi.org/"&gt;DOI&lt;/a&gt; and its corresponding bibliographic reference?&lt;/p&gt;

&lt;p&gt;A previous Depth-First article discussed &lt;a href="http://depth-first.com/articles/2007/06/27/easily-convert-publisher-urls-and-dois-to-bibliographical-citations-synthesis-synlett-ruby-and-mechanize"&gt;a screen-scraping method&lt;/a&gt; as one solution. Unfortunately, this system relies on an intimate understanding of how individual publishers' Websites work, requires a different implementation for each publisher, and can break at any time without warning.&lt;/p&gt;

&lt;p&gt;This article discusses a far more robust solution to the problem of interconverting bibliographic references and DOIs.&lt;/p&gt;

&lt;h4&gt;Background: OpenURL and CrossRef&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://www.crossref.org/"&gt;CrossRef&lt;/a&gt; is the official &lt;a href="http://www.doi.org/"&gt;DOI&lt;/a&gt; link registration agency for scholarly and professional publications. One of the less well-known services offered by CrossRef is a free, Web-based &lt;a href="http://www.crossref.org/openurl_info.html"&gt;bidirectional DOI/bibliographic reference converter&lt;/a&gt; based on &lt;a href="http://en.wikipedia.org/wiki/OpenURL"&gt;OpenURL&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;A Simple Ruby Library&lt;/h4&gt;

&lt;p&gt;The following Ruby library is all we need to begin using CrossRef and OpenURL:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;hpricot&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;open-uri&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;DOI&lt;/span&gt;
  &lt;span class="comment"&gt;# Convert a doi into a bibliographic reference.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;biblio_for&lt;/span&gt; &lt;span class="ident"&gt;doi&lt;/span&gt;
    &lt;span class="ident"&gt;doc&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.crossref.org/openurl/?id=doi:&lt;span class="expr"&gt;#{doi}&lt;/span&gt;&amp;amp;noredirect=true&amp;amp;pid=ourl_sample:sample&amp;amp;format=unixref&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;))&lt;/span&gt;

    &lt;span class="ident"&gt;journal&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;doc&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;abbrev_title&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;inner_html&lt;/span&gt;
    &lt;span class="ident"&gt;year&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;doc&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;journal_issue/publication_date/year&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;inner_html&lt;/span&gt;
    &lt;span class="ident"&gt;volume&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;doc&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;journal_issue/journal_volume/volume&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;inner_html&lt;/span&gt;
    &lt;span class="ident"&gt;number&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;doc&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;journal_issue/issue&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;inner_html&lt;/span&gt;
    &lt;span class="ident"&gt;first_page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;doc&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;pages/first_page&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;inner_html&lt;/span&gt;
    &lt;span class="ident"&gt;last_page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;doc&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;pages/last_page&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;inner_html&lt;/span&gt;

    &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{journal}&lt;/span&gt; &lt;span class="expr"&gt;#{year}&lt;/span&gt;, &lt;span class="expr"&gt;#{volume}&lt;/span&gt;(&lt;span class="expr"&gt;#{number}&lt;/span&gt;) &lt;span class="expr"&gt;#{first_page}&lt;/span&gt;-&lt;span class="expr"&gt;#{last_page}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Convert a bibliographic reference into a DOI.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;doi_for&lt;/span&gt; &lt;span class="ident"&gt;journal&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;year&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;volume&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;issue&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;page&lt;/span&gt;
    &lt;span class="ident"&gt;doc&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Hpricot&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.crossref.org/openurl/?title=&lt;span class="expr"&gt;#{journal.gsub(/ /, '%20')}&lt;/span&gt;&amp;amp;volume=&lt;span class="expr"&gt;#{volume}&lt;/span&gt;&amp;amp;issue=&lt;span class="expr"&gt;#{issue}&lt;/span&gt;&amp;amp;spage=&lt;span class="expr"&gt;#{page}&lt;/span&gt;&amp;amp;date=&lt;span class="expr"&gt;#{year}&lt;/span&gt;&amp;amp;pid=ourl_sample:sample&amp;amp;redirect=false&amp;amp;format=unixref&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;))&lt;/span&gt;

   &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;doc&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;doi&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;inner_html&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

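&lt;p&gt;This code makes use of the excellent Ruby HTML parser library &lt;a href="http://code.whytheluckystiff.net/hpricot"&gt;Hpricot&lt;/a&gt;, whose &lt;tt&gt;/&lt;/tt&gt; operator searches the XML returned by CrossRef for the named elements.&lt;/p&gt;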
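
&lt;p&gt;The &lt;tt&gt;doi_for&lt;/tt&gt; method only escapes spaces in the journal title. As a minimal sketch of a more general approach, the query string could be assembled with Ruby's standard &lt;tt&gt;URI.encode_www_form&lt;/tt&gt;, which escapes every value. The helper name &lt;tt&gt;openurl_query_for&lt;/tt&gt; is hypothetical, and this sketch assumes CrossRef accepts standard form encoding (spaces as plus signs); that has not been verified here.&lt;/p&gt;

```ruby
require 'uri'

# Hypothetical helper (not part of the original library): builds the CrossRef
# OpenURL lookup URL for a bibliographic reference, escaping every value.
# The parameter names and the sample pid are copied from doi_for above.
def openurl_query_for(journal, year, volume, issue, page)
  params = {
    title: journal,
    volume: volume,
    issue: issue,
    spage: page,
    date: year,
    pid: 'ourl_sample:sample',
    redirect: 'false',
    format: 'unixref'
  }
  "http://www.crossref.org/openurl/?#{URI.encode_www_form(params)}"
end

# Prints the URL only; no network request is made.
puts openurl_query_for('Chem. Rev.', 1994, 94, 8, 2483)
```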

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;Saving the Ruby code to a file named &lt;strong&gt;doi.rb&lt;/strong&gt;, we can test it using the interactive Ruby shell:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'doi'
=&gt; true
irb(main):002:0&gt; include DOI
=&gt; Object
irb(main):003:0&gt; biblio_for "10.1021/cr00032a009"
=&gt; "Chem. Rev. 1994, 94(8) 2483-2547"
irb(main):004:0&gt; doi_for "Chem. Rev.", 1994, 94, 8, 2483
=&gt; "10.1021/cr00032a009"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Notice how the journal abbreviation &lt;em&gt;Chem. Rev.&lt;/em&gt; was used; we'd get the same result if we used &lt;em&gt;Chemical Reviews&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Of course, the implementation described here could be refined in several ways. With a DOI, it's trivial to &lt;a href="http://dx.doi.org/10.1021/cr00032a009"&gt;construct a URL to the example paper&lt;/a&gt;. But we could take it further than that. With some carefully crafted regular expressions, our &lt;tt&gt;doi_for&lt;/tt&gt; method could accept a free-form bibliographic citation rather than separately identified fragments. From there we might start to think about creating living HTML and/or wikis from old PDFs and Word documents.&lt;/p&gt;
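
&lt;p&gt;As a rough sketch of the regular-expression idea, the following parses citations written in the same shape that &lt;tt&gt;biblio_for&lt;/tt&gt; produces and builds a resolver URL from a DOI. The names &lt;tt&gt;CITATION_PATTERN&lt;/tt&gt;, &lt;tt&gt;parse_citation&lt;/tt&gt; and &lt;tt&gt;doi_url&lt;/tt&gt; are hypothetical, and the pattern assumes one citation style only; real-world citations vary far more than this.&lt;/p&gt;

```ruby
# Hypothetical pattern (an assumption, not part of the original library) for
# citations shaped like "Chem. Rev. 1994, 94(8) 2483-2547":
# journal, year, volume(issue), first page and an optional last page.
CITATION_PATTERN = /\A(.+?)\s+(\d{4}),\s*(\d+)\((\d+)\)\s*(\d+)(?:-(\d+))?\z/

# Split a citation string into its parts; returns nil if it does not match.
def parse_citation(text)
  match = CITATION_PATTERN.match(text.strip)
  return nil unless match

  journal, year, volume, issue, first_page, last_page = match.captures
  {
    journal: journal,
    year: year.to_i,
    volume: volume.to_i,
    issue: issue.to_i,
    first_page: first_page.to_i,
    last_page: last_page ? last_page.to_i : nil
  }
end

# Build a resolver URL for a DOI, using the dx.doi.org form linked above.
def doi_url(doi)
  "http://dx.doi.org/#{doi}"
end

p parse_citation('Chem. Rev. 1994, 94(8) 2483-2547')
puts doi_url('10.1021/cr00032a009')
```

&lt;p&gt;The parsed fields line up with the arguments of &lt;tt&gt;doi_for&lt;/tt&gt;, so the two could be chained to go from a citation string to a DOI.&lt;/p&gt;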

&lt;p&gt;With a little creative thought, other possibilities are well within reach.&lt;/p&gt;

&lt;h4&gt;Caveat&lt;/h4&gt;

&lt;p&gt;Before extensively experimenting with CrossRef's OpenURL system, you might want to &lt;a href="http://www.crossref.org/requestaccount/"&gt;sign up for a free account&lt;/a&gt;. CrossRef is understandably interested in tracking usage and this is their way to do it.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;DOIs and traditional bibliographic citations now coexist in a variety of settings, from literature citation managers to journals themselves. Using CrossRef, OpenURL and a little bit of code, it's now possible to convert between the two programmatically.&lt;/p&gt;

&lt;p&gt;Harvesting bibliographical citations must be one of the least sexy topics in cheminformatics. But as Google demonstrated (building on the approach taken by &lt;a href="http://scientific.thomson.com/products/sci/"&gt;&lt;em&gt;Science Citation Index&lt;/em&gt;&lt;/a&gt;), cataloging citation behavior leads to a unique and highly productive way to view many tough problems. Future articles will discuss how this might apply to cheminformatics.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/ecstaticist/"&gt;ecstaticist&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Tue, 06 May 2008 15:50:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:8eafadd6-bf10-4e65-ac43-2a3bf37de457</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/05/06/hacking-doi-interconvert-bibliographic-references-and-dois-with-crossref-and-openurl</link>
      <category>Tools</category>
      <category>openurl</category>
      <category>crossref</category>
      <category>ruby</category>
      <category>hpricot</category>
      <category>sciencecitationindex</category>
      <category>citations</category>
    </item>
  </channel>
</rss>
