Hacking DOI: Interconvert Bibliographic References and DOIs with CrossRef and OpenURL

Posted by Rich Apodaca Tue, 06 May 2008 19:50:00 GMT

Science is in the middle of a transition from print to the internet as the primary medium of communication. This transition, although a boon for many scientists, creates a host of problems for those dealing with scientific information. For example, how would you interconvert a DOI and its corresponding bibliographic reference?

A previous Depth-First article discussed a screen-scraping method as one solution. Unfortunately, that approach relies on an intimate understanding of how individual publishers' Websites work, requires a different implementation for each publisher, and can break at any time without warning.

This article discusses a far more robust solution to the problem of interconverting bibliographic references and DOIs.

Background: OpenURL and CrossRef

CrossRef is the official DOI link registration agency for scholarly and professional publications. One of its lesser-known services is a free, Web-based converter, built on OpenURL, that translates in both directions between DOIs and bibliographic references.

A Simple Ruby Library

The following Ruby library is all we need to begin using CrossRef and OpenURL:

require 'rubygems'
require 'hpricot'
require 'open-uri'

module DOI
  # Convert a doi into a bibliographic reference.
  def biblio_for doi
    doc = Hpricot(open("http://www.crossref.org/openurl/?id=doi:#{doi}&noredirect=true&pid=ourl_sample:sample&format=unixref"))

    journal = (doc/"abbrev_title").inner_html
    year = (doc/"journal_issue/publication_date/year").inner_html
    volume = (doc/"journal_issue/journal_volume/volume").inner_html
    number = (doc/"journal_issue/issue").inner_html
    first_page = (doc/"pages/first_page").inner_html
    last_page = (doc/"pages/last_page").inner_html

    "#{journal} #{year}, #{volume}(#{number}) #{first_page}-#{last_page}"
  end

  # Convert a bibliographic reference into a DOI.
  def doi_for journal, year, volume, issue, page
    doc = Hpricot(open("http://www.crossref.org/openurl/?title=#{journal.gsub(/ /, '%20')}&volume=#{volume}&issue=#{issue}&spage=#{page}&date=#{year}&pid=ourl_sample:sample&redirect=false&format=unixref"))

    (doc/"doi").inner_html
  end
end

This code makes use of the excellent Ruby HTML parser library Hpricot.
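The doi_for method above escapes only the spaces in the journal title, so a title containing an ampersand or a non-ASCII character would produce a malformed query. Here is a minimal sketch of one way around that, assuming the same query parameters used above. The helper name openurl_for is hypothetical and not part of the library, and ERB::Util.url_encode from Ruby's standard library is swapped in for the gsub call. The pid value is simply the sample one from the code above, exposed as an argument so a different one can be substituted.

```ruby
require 'erb'

# Hypothetical helper (not part of the library above): builds the same
# CrossRef OpenURL query that doi_for uses, but percent-encodes every
# bibliographic value instead of replacing spaces only.
def openurl_for(journal, year, volume, issue, page, pid = 'ourl_sample:sample')
  params = [
    ['title',  journal],
    ['volume', volume],
    ['issue',  issue],
    ['spage',  page],
    ['date',   year]
  ]

  # ERB::Util.url_encode turns a space into %20 and an ampersand into %26.
  query = params.map { |key, value| "#{key}=#{ERB::Util.url_encode(value.to_s)}" }.join('&')

  "http://www.crossref.org/openurl/?#{query}&pid=#{pid}&redirect=false&format=unixref"
end

puts openurl_for("Chem. Rev.", 1994, 94, 8, 2483)
```

The resulting string could be passed to open in place of the interpolated URL in doi_for.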

Testing the Library

Saving the Ruby code to a file named doi.rb, we can test it using the interactive Ruby shell:

$ irb
irb(main):001:0> require './doi'
=> true
irb(main):002:0> include DOI
=> Object
irb(main):003:0> biblio_for "10.1021/cr00032a009"
=> "Chem. Rev. 1994, 94(8) 2483-2547"
irb(main):004:0> doi_for "Chem. Rev.", 1994, 94, 8, 2483
=> "10.1021/cr00032a009"

Notice how the journal abbreviation Chem. Rev. was used; we'd get the same result if we used Chemical Reviews.

Of course, the implementation described here could be refined considerably. With a DOI, it's trivial to construct a URL to the example paper. But we could take it further than that. With some carefully crafted regular expressions, our doi_for method could accept a free-form bibliographic citation rather than separately identified fragments. From there we might start to think about creating living HTML and/or wikis from old PDFs and Word documents.
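As a rough illustration of both ideas, here is a sketch under stated assumptions. The helper names parse_citation and url_for are hypothetical. The regular expression handles only citations laid out like the output of biblio_for ("Journal Year, Volume(Issue) First-Last"), so truly free-form citations would need more patterns. The https://doi.org/ resolver prefix is an assumption not taken from this article.

```ruby
# Hypothetical helper: split a citation formatted like the output of
# biblio_for into the pieces that doi_for expects.
# Returns nil when the text does not match the expected layout.
def parse_citation(text)
  pattern = /\A(?<journal>.+?)\s+(?<year>\d{4}),\s*(?<volume>\d+)\((?<issue>\d+)\)\s+(?<first_page>\d+)(?:-(?<last_page>\d+))?\z/
  match = pattern.match(text.strip)
  return nil unless match

  {
    :journal => match[:journal],
    :year    => match[:year].to_i,
    :volume  => match[:volume].to_i,
    :issue   => match[:issue].to_i,
    :page    => match[:first_page].to_i
  }
end

# Hypothetical helper: build a resolver URL for a DOI.
# Assumes the DOI needs no escaping, which holds for the example DOI.
def url_for(doi)
  "https://doi.org/#{doi}"
end

parts = parse_citation("Chem. Rev. 1994, 94(8) 2483-2547")
p parts
puts url_for("10.1021/cr00032a009")
```

The parsed fields could then be handed on with something like doi_for parts[:journal], parts[:year], parts[:volume], parts[:issue], parts[:page].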

With a little creative thought, other possibilities are well within reach.

Caveat

Before experimenting extensively with CrossRef's OpenURL system, you might want to sign up for a free account. CrossRef is understandably interested in tracking usage, and this is their way to do it.

Conclusions

DOIs and traditional bibliographical citations now coexist in a variety of settings, from literature citation managers to journals themselves. Using CrossRef, OpenURL and a little bit of code, it's now possible to make a great deal more sense of it all.

Harvesting bibliographical citations must be one of the least sexy topics in cheminformatics. But as Google demonstrated (building on the approach taken by Science Citation Index), cataloging citation behavior leads to a unique and highly productive way to view many tough problems. Future articles will discuss how this might apply to cheminformatics.

Image Credit: ecstaticist

Cheminformatics Puzzler #2: Planar Chiral Paracyclophanes

Posted by Rich Apodaca Thu, 01 May 2008 13:25:00 GMT

Source: Duan, Ma, Xia, Liu, Ma, and Sun J. Org. Chem.

Without using 3D coordinates, represent the chirality of this class of planar chiral paracyclophanes (hint).

Just a Flesh Wound

Posted by Rich Apodaca Wed, 30 Apr 2008 22:24:00 GMT

SEMANTIC KNIGHT: None shall pass without formally defining the ontological meta-semantic thingies of their domain something-or-others!

HACKER: What?

SEMANTIC KNIGHT: None shall pass without using all sorts of semantic meta-meta-meta-stuff that we will invent Real Soon Now!

HACKER: I have no quarrel with you, good Sir Knight, but I must get my work done on the Web. Stand aside!

SEMANTIC KNIGHT: None shall find anything on the Internet without semantic metadata!

HACKER: So be it!

HACKER and SEMANTIC KNIGHT: Aaah!, hiyaah!, etc.

[HACKER chops the SEMANTIC KNIGHT's first argument off by building efficient statistical/heuristic search engines]

HACKER: Now stand aside, worthy adversary.

SEMANTIC KNIGHT: 'Tis but a scratch.

HACKER: A scratch? Your argument has been cut off!

SEMANTIC KNIGHT: No, it isn't.

HACKER: Well, what's that, then?

SEMANTIC KNIGHT: I've had worse. None shall have an effective syndication network without RDF Site Summaries!

[clang]

Hiyaah!

[clang]

Aaaaaaaah!

[HACKER chops the SEMANTIC KNIGHT's second argument off by building the blogs/RSS/Aggregators/Bloglines/etc. network]

HACKER: Victory is mine!

SEMANTIC KNIGHT: Have at you!

[kick]

HACKER: Eh. You are indeed brave, Sir Knight, but the fight is mine.

SEMANTIC KNIGHT: Oh, had enough, eh?

HACKER: Look, you stupid &^%$#. You've got no arguments left.

SEMANTIC KNIGHT: Yes, I have.

HACKER: Look!

SEMANTIC KNIGHT: Just a flesh wound.

[kick]

HACKER: Look, stop that.

SEMANTIC KNIGHT: You won't be able to get machine-machine services without an ontology to formally describe all the relationships!

[kick]

HACKER: Right!

[whop]

[HACKER chops the SEMANTIC KNIGHT's third argument off by building SOAPy and RESTful services with only implicit semantic descriptions]

SEMANTIC KNIGHT: Right. I'll do you for that!

HACKER: You'll what?

SEMANTIC KNIGHT: Come here!

HACKER: What are you going to do, bleed on me?

SEMANTIC KNIGHT: I'm invincible!

HACKER: You're a looney.

SEMANTIC KNIGHT: The SEMANTIC Knight always triumphs! Have at you! Come on, then. I have a battalion of KR theorists on my side!

[whop]

[HACKER chops the SEMANTIC KNIGHT's last argument off with an army of actual code writers]

SEMANTIC KNIGHT: Oh? All right, we'll call it a draw.

HACKER: Come on, folks, let's go.

SEMANTIC KNIGHT: Oh. Oh, I see. Running away, eh? You yellow ^&^%$s! Come back here and take what's coming to you. I'll bite your legs off!

-Michael Champion, xml-dev list

Solve Web Application Scaling Problems With Signed Applets

Posted by Rich Apodaca Fri, 25 Apr 2008 17:12:00 GMT

CampDepict: Building a Simple SMILES Depict Web Application With JRuby, Structure CDK, and Camping

Posted by Rich Apodaca Wed, 23 Apr 2008 15:16:00 GMT

Today's tribute to the power of simplicity comes by way of John Jaeger, who has built one of the simplest cheminformatics Web applications ever written. His creation, CampDepict, interactively produces a raster image of a 2D chemical structure given a SMILES string, not unlike Daylight's Depict application.

CampDepict uses the Ruby Web microframework Camping. From the README:

Camping is a web framework which consistently stays at less than 4kb of code. You can probably view the complete source code on a single page. But, you know, it's so small that, if you think about it, what can it really do?

The idea here is to store a complete fledgling web application in a single file like many small CGIs. But to organize it as a Model-View-Controller application like Rails does. You can then easily move it to Rails once you've got it going.

John's application is loosely based on the Rails Depict application first described in 2006 here on Depth-First. His code makes use of CDK and Structure CDK, and it runs on JRuby.

If you've ever been curious about what Ruby has to offer cheminformatics, CampDepict could be just the application to get your feet wet.
