<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag connotea</title>
    <link>http://depth-first.com/articles/tag/connotea</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Hacking CiteULike: Metascripting with Ruby and Session</title>
      <description>&lt;p&gt;&lt;a href="http://citeulike.org"&gt;&lt;img src="http://depth-first.com/demo/20070622/cul.gif" border="0" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.citeulike.org/"&gt;CiteULike&lt;/a&gt; lets users easily manage their bibliographies of scholarly works, and in the process discover other users' papers on related subjects. One of the most powerful features of CiteULike is its ability to convert arbitrary URLs into fully-formatted bibliographical citations. CiteULike manages to do this while largely avoiding the &lt;a href="http://depth-first.com/articles/2007/06/15/buggotea-the-problem-with-abundance"&gt;Buggotea Problem&lt;/a&gt; in which multiple URLs pointing to the same work are saved. Wouldn't it be useful if this aspect of CiteULike could be independently scripted, tested, and re-integrated? This article describes how to do this using the powerful scripting language Ruby.&lt;/p&gt;

&lt;h4&gt;A Simple Test&lt;/h4&gt;

&lt;p&gt;The core of CiteULike's bibliography lookup system is contained in its &lt;em&gt;Filters&lt;/em&gt;. Filters accept a URL they're interested in and return a bibliographical citation. Each filter generally works with a specific publisher's URLs and may be written in just about any scripting language.&lt;/p&gt;

&lt;p&gt;CiteULike has released nearly all of its filters and the driver as an Open Source package distributed under a BSD-style license. 
Complete documentation on using and writing filters is available &lt;a href="http://svn.citeulike.org/svn/plugins/HOWTO.txt"&gt;here&lt;/a&gt;, and the package can be obtained through subversion:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ svn co http://svn.citeulike.org/svn/ citeulike
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;After changing into the &lt;strong&gt;citeulike/drivers&lt;/strong&gt; directory, you'll see a file called &lt;strong&gt;driver.tcl&lt;/strong&gt;. This script coordinates the activities of the various filters contained under their respective language subdirectories. Let's say you want to parse the following URL:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;http://pubs.acs.org/cgi-bin/abstract.cgi/jcisd8/2006/46/i03/abs/ci050400b.html&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The command to do so would be:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ./driver.tcl parse http://pubs.acs.org/cgi-bin/abstract.cgi/jcisd8/2006/46/i03/abs/ci050400b.html
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If you get an error starting with:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
couldn't execute "./acs.py": no such file or directory
    while executing
"open "|./[file tail $exe]" "r+""
    (procedure "parse_url" line 31)
    invoked from within

&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;then the problem lies with the shebang line of the &lt;strong&gt;drivers/python/acs.py&lt;/strong&gt; script. For example, on my system I need to change the shebang to:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;#!/usr/bin/python2.5&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Making this change and re-running the driver script gives the output I was expecting:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
parsing http://pubs.acs.org/cgi-bin/abstract.cgi/jcisd8/2006/46/i03/abs/ci050400b.html

serial -&gt; 1549-9596
volume -&gt; 46
linkouts -&gt; {DOI {} 10.1021/ci050400b {} {}}
year -&gt; 2006
type -&gt; JOUR
start_page -&gt; 991
url -&gt; http://pubs3.acs.org/acs/journals/doilookup?in_doi=10.1021/ci050400b
end_page -&gt; 998
plugin_version -&gt; 1
doi -&gt; 10.1021/ci050400b
day -&gt; 22
issue -&gt; 3
title -&gt; The Blue Obelisk-Interoperability in Chemical Informatics
journal -&gt; J. Chem. Inf. Model.
abstract -&gt; Abstract: The Blue Obelisk Movement (http://www.blueobelisk.org/) is the name used by a diverse Internet group promoting reusable chemistry via open source software development, consistent and complimentary chemoinformatics research, open data, and open standards. We outline recent examples of cooperation in the Blue Obelisk group: a shared dictionary of algorithms and implementations in chemoinformatics algorithms drawing from our various software projects; a shared repository of chemoinformatics data including elemental properties, atomic radii, isotopes, atom typing rules, and so forth; and Web services for the platform-independent use of chemoinformatics programs.
status -&gt; ok
month -&gt; 5
authors -&gt; {Guha {} R {Guha, R.}} {Howard {} MT {Howard, M.T.}} {Hutchison {} GR {Hutchison, G.R.}} {Murray-Rust {} P {Murray-Rust, P.}} {Rzepa {} H {Rzepa, H.}} {Steinbeck {} C {Steinbeck, C.}} {Wegner {} J {Wegner, J.}} {Willighagen {} EL {Willighagen, E.L.}}
address -&gt; Pennsylvania State University, University Park, Pennsylvania 16804-3000, Jmol Project, U. S. A., Cornell University, Ithaca, New York 14853, Cambridge University, Cambridge CB2 1TN, Great Britain, Imperial College, London SW7 2AZ, Great Britain, Cologne University Bioinformatics Center (CUBIC), Z&#252;lpicher Str. 47, D-50674 K&#246;ln, Germany, University of T&#252;bingen, T&#252;bingen, Germany, and Jmol project, The Netherlands
plugin -&gt; acs
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Metascripting with Ruby and Session&lt;/h4&gt;

&lt;p&gt;The CiteULike driver is written in &lt;a href="http://tcl.sourceforge.net/"&gt;Tcl&lt;/a&gt;, a language I've heard about and been curious about, but one I just don't have the time to learn. Wouldn't it be great if we could direct the activities of the CiteULike driver from the comfort and power of Ruby?&lt;/p&gt;

&lt;p&gt;It turns out that a handy little Ruby library exists which is perfect for the metascripting we'll need to do: &lt;a href="http://raa.ruby-lang.org/project/session/"&gt;Session&lt;/a&gt;. The Session library can be installed with:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
# gem install session
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Once installed, we can fire up interactive Ruby (irb) from the same directory and tell driver.tcl what to do:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'rubygems'
=&gt; true
irb(main):002:0&gt; require 'session'
=&gt; true
irb(main):003:0&gt; url = 'http://pubs.acs.org/cgi-bin/abstract.cgi/jcisd8/2006/46/i03/abs/ci050400b.html'
=&gt; "http://pubs.acs.org/cgi-bin/abstract.cgi/jcisd8/2006/46/i03/abs/ci050400b.html"
irb(main):004:0&gt; session = Session.new
=&gt; #&lt;Session::Sh:0xb7c03174 @stdout=#&lt;IO:0xb7c02ee0&gt;, @threads=[], @history=nil, @stdin=#&lt;IO:0xb7c02f30&gt;, @use_open3=nil, @opts={}, @errproc=nil, @use_spawn=nil, @debug=nil, @stderr=#&lt;IO:0xb7c02e7c&gt;, @outproc=nil, @track_history=nil, @prog="sh"&gt;
irb(main):005:0&gt; result=session.execute "./driver.tcl parse #{url}"
&lt;/pre&gt;
&lt;/div&gt;
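
&lt;p&gt;The same call can be sketched as a small script. This version swaps Session for Ruby's standard &lt;strong&gt;open3&lt;/strong&gt; library, so it is a stand-in rather than the approach used above. The method name &lt;strong&gt;run_driver&lt;/strong&gt; is hypothetical, and the default path assumes the script runs next to driver.tcl.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;require 'open3'

# Hypothetical wrapper around the driver (not part of CiteULike or Session).
# Arguments are passed as a list, so no shell ever interprets the URL.
def run_driver(url, driver = './driver.tcl')
  stdout, stderr, status = Open3.capture3(driver, 'parse', url)
  raise "driver failed: #{stderr}" unless status.success?
  stdout
end&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Passing the URL as a separate argument should matter for URLs that contain characters a shell treats specially, such as the query strings found in many publisher links.&lt;/p&gt;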

&lt;h4&gt;Reprocessing the Bibliography&lt;/h4&gt;

&lt;p&gt;The last command of our interactive Ruby session returns an Array, which we've assigned to "result". Its first element contains the driver's output: our article's bibliographical information. We can extract the title with the following commands:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):011:0&gt; result[0].match /title -&gt; (.*)/
=&gt; #&lt;MatchData:0xb7b94828&gt;
irb(main):012:0&gt; $1
=&gt; "The Blue Obelisk-Interoperability in Chemical Informatics"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Using a series of similar regular expressions, we can re-construct the full bibliographical citation for the paper.&lt;/p&gt;
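
&lt;p&gt;As a rough sketch of that idea, the methods below collect every "key -&amp;gt; value" line into a Hash and assemble one possible citation from it. The method names and the citation layout are hypothetical, and the author handling assumes the Tcl list format shown in the driver output above.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;# Turn the driver's "key -&gt; value" lines into a Hash; other lines are skipped.
def parse_driver_output(text)
  fields = {}
  text.each_line do |line|
    match = line.chomp.match(/\A(\w+) -&gt; (.*)\z/)
    fields[match[1]] = match[2] if match
  end
  fields
end

# Assumption: each author is a Tcl list whose last element is the display name.
def author_names(authors_field)
  authors_field.to_s.scan(/\{([^{}]+)\}\}/).flatten
end

# One possible citation layout; the format itself is only illustrative.
def format_citation(f)
  pages = "#{f['start_page']}-#{f['end_page']}"
  "#{author_names(f['authors']).join('; ')} #{f['title']}. " \
    "#{f['journal']} #{f['year']}, #{f['volume']}, #{pages}."
end&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With the irb session above, calling format_citation(parse_driver_output(result[0])) would build the citation in one step.&lt;/p&gt;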

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;The availability of the CiteULike filters and driver opens up many possibilities for building collaborative bibliographical management applications. With some simple metascripting techniques, this can be done from any scripting language. Our little example here is but a glimpse of what might be possible.&lt;/p&gt;</description>
      <pubDate>Fri, 22 Jun 2007 10:08:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:48edf401-0c12-4ebe-b517-f5eafd0ae203</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/22/hacking-citeulike-metascripting-with-ruby-and-session</link>
      <category>Tools</category>
      <category>citeulike</category>
      <category>connotea</category>
      <category>buggotea</category>
      <category>ruby</category>
      <category>metascripting</category>
      <category>acs</category>
      <category>session</category>
    </item>
    <item>
      <title>Buggotea: The Problem with Abundance</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/gottcha78/545271203/"&gt;&lt;img src="http://depth-first.com/demo/20070615/bug.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Although &lt;a href="http://depth-first.com/articles/2007/03/22/why-i-still-dont-use-connotea"&gt;I still don't use&lt;/a&gt; it, &lt;a href="http://connotea.org"&gt;Connotea&lt;/a&gt; is a very useful service for many scientists. Combining aspects of social networking and bibliography management, Connotea offers a glimpse at some of the vast potential for Web 2.0 in the sciences. But the service is not without its thorny technical problems, one of which is discussed in this article.&lt;/p&gt;

&lt;p&gt;For those unfamiliar with the service, Connotea lets you organize and share hyperlinks. This, in itself, is nothing remarkable. Many services such as &lt;a href="http://digg.com"&gt;Digg&lt;/a&gt;, &lt;a href="http://del.icio.us/"&gt;del.icio.us&lt;/a&gt;, and &lt;a href="http://reddit.com/"&gt;Reddit&lt;/a&gt; offer similar capability.&lt;/p&gt;

&lt;p&gt;What's unique about Connotea is its emphasis on bookmarking scientific and scholarly content. By taking advantage of the &lt;a href="http://www.crossref.org/"&gt;CrossRef&lt;/a&gt; service built on top of the &lt;a href="http://doi.org/"&gt;DOI&lt;/a&gt; system, Connotea makes creating a bibliographical reference to a paper as easy as entering a short alphanumeric sequence found on the document itself.&lt;/p&gt;

&lt;p&gt;As long as all Connotea users work with DOIs, there is no problem. The DOI organization ensures that every document with a DOI can be accessed via a single, immutable URL. For example, if a paper has a DOI of "10.1021/ol015948s", then the document can be accessed through &lt;a href="http://dx.doi.org/10.1021/ol015948s"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But what happens if a Connotea user either doesn't know about DOI or for some reason prefers not to use it? Instead, they'd rather work with a &lt;a href="http://pubs.acs.org/cgi-bin/abstract.cgi/orlef7/2001/3/i11/abs/ol015948s.html"&gt;publisher's URL&lt;/a&gt; directly. This is not as unlikely as it may seem at first. For example, Connotea fails to recognize the title of many ACS papers when they are entered as DOIs, but does recognize them as direct abstract links.&lt;/p&gt;

&lt;p&gt;PubMed offers still more ways to refer to the same document. To name a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;amp;db=pubmed&amp;amp;dopt=Abstract&amp;amp;list_uids=11405701"&gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;amp;db=pubmed&amp;amp;dopt=AbstractPlus&amp;amp;list_uids=11405701"&gt;Abstract Plus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;amp;db=pubmed&amp;amp;dopt=Citation&amp;amp;list_uids=11405701"&gt;Citation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without really trying, we've found no fewer than five different URLs that all refer to the same scientific work. If you look under &lt;a href="http://www.connotea.org/user/rapodaca"&gt;my user profile&lt;/a&gt;, you'll see that Connotea is happy to add all of these references as separate entities. This means that each will receive its own set of tags and its own summary page. If my collection of links grows to a few hundred, I may not realize that I actually have two or three links to the same paper in my collection. And other Connotea users may fail to see my papers because they're using a URL that differs from mine.&lt;/p&gt;
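
&lt;p&gt;To make the duplication concrete, here is a rough Ruby sketch that reduces URLs like the ones above to a key for the underlying work. The method name &lt;strong&gt;work_key&lt;/strong&gt; is hypothetical, and the ACS rule assumes the abstract's file name is the DOI suffix under the 10.1021 prefix, which holds for the example paper but may not hold in general.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;require 'uri'

# Hypothetical helper: map a URL to a key identifying the work it points to.
def work_key(url)
  uri = URI.parse(url)
  case uri.host
  when 'dx.doi.org', 'doi.org'
    # The DOI is the path without its leading slash.
    "doi:#{uri.path.sub(%r{\A/}, '')}"
  when 'pubs.acs.org'
    # Assumption: the file name is the DOI suffix under prefix 10.1021.
    "doi:10.1021/#{File.basename(uri.path, '.html')}"
  when 'www.ncbi.nlm.nih.gov'
    # All three PubMed views carry the same list_uids parameter.
    pmid = URI.decode_www_form(uri.query.to_s).to_h['list_uids']
    pmid ? "pmid:#{pmid}" : url
  else
    url
  end
end&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Under these assumptions the five URLs collapse to two keys, one DOI and one PubMed ID. Joining those two would still need a lookup service, so this narrows the problem rather than solving it.&lt;/p&gt;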

&lt;p&gt;After researching this problem a bit, I found that although it doesn't seem to have an immediate solution, at least it has a name: &lt;a href="http://www.nodalpoint.org/2006/12/15/buggotea_redundant_links_in_connotea"&gt;Buggotea&lt;/a&gt;. It bears a remarkable similarity to the "unique" SMILES problem, which was a major motivation for the development of &lt;a href="http://www.iupac.org/inchi/"&gt;InChI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It wasn't long ago that the ability to access the scientific literature online seemed far-fetched. Today, the Internet has become the only scientific publication medium that matters. This has created a variety of new problems, and &lt;a href="http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics"&gt;opportunities&lt;/a&gt; to solve them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image Credit: &lt;a href="http://flickr.com/photos/gottcha78/"&gt;gottcha78&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;</description>
      <pubDate>Fri, 15 Jun 2007 09:09:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9947c46e-f47e-4bac-92f0-b6503cc6252a</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/15/buggotea-the-problem-with-abundance</link>
      <category>Meta</category>
      <category>connotea</category>
      <category>buggotea</category>
      <category>doi</category>
      <category>url</category>
    </item>
    <item>
      <title>Why I Still Don't Use Connotea</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070322/folders.jpg" align="right" border="0"&gt;&lt;/img&gt;Like most scientists, I have a collection of hardcopy journal articles. After they sit on my desk for a while, I sort them into folders. Each folder has a label such as "dihydroxylation", "olefin metathesis", or "InChI". This system is nothing more than a small &lt;a href="http://www.shirky.com/writings/ontology_overrated.html"&gt;ontology&lt;/a&gt;. It does the job of building a top-level index of my papers, but it's not nearly as effective as it could be.&lt;/p&gt;

&lt;p&gt;There are many problems with ontology. For example, the world changes; I decide to add just one aminohydroxylation paper to the "dihydroxylation" folder and before I know it there are five others in there. Most papers require multiple categories; should I file that metathesis paper under "ring closing", "ruthenium", or "nobel"?&lt;/p&gt;

&lt;p&gt;Some time ago, Nature Publishing Group launched &lt;a href="http://www.connotea.org/"&gt;Connotea&lt;/a&gt;, a service designed to do for scientific papers what &lt;a href="http://del.icio.us/"&gt;del.icio.us&lt;/a&gt; does for hyperlinks. &lt;a href="http://www.citeulike.org/"&gt;CiteULike&lt;/a&gt; is a similar service. Both services abandon hierarchical classification in favor of tags: short text descriptions that can be applied to one or more articles. The possibilities of harnessing the &lt;a href="http://depth-first.com/articles/2007/01/18/collective-intelligence-and-the-dumbness-of-crowds"&gt;collective intelligence&lt;/a&gt; of your fellow scientists through these services are tantalizing. And the ability to finally do away with hardcopy journal articles seems liberating.&lt;/p&gt;

&lt;p&gt;I think both Connotea and CiteULike are great services, but I still continue to use my horrible system of physical papers and physical folders. And I know I'm not alone. Maybe the thought of transcribing my massive collection of papers into a system like Connotea gives me just the excuse I need to avoid doing it. Maybe I just like being able to browse the articles in these folders while looking for new ideas. Increasingly, I've been turning directly to services like &lt;a href="http://www.cas.org/SCIFINDER/"&gt;SciFinder&lt;/a&gt; and Google to track down a paper, even if I know it's in my collection. So maybe my collection of hardcopy articles just isn't as useful as it once was.&lt;/p&gt;

&lt;p&gt;Successful information systems demonstrate a concrete payoff that is much higher than the price of admission. As anyone who uses Linux or Mac OS X can tell you, technical superiority alone is not enough to make people switch. Although Connotea could no doubt make the management of my personal collection of articles easier, the price is simply too high to justify the effort.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/jrparis/"&gt;Jean Ruaud&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Thu, 22 Mar 2007 12:17:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9fc58ab0-bda8-46a8-816f-c2255e4bd095</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/22/why-i-still-dont-use-connotea</link>
      <category>Meta</category>
      <category>connotea</category>
      <category>citeulike</category>
      <category>literature</category>
    </item>
  </channel>
</rss>
