Building a Unique Chemistry Journal: Responses to Questions from Nature Chemistry
Neil Withers of the soon-to-be-launched chemistry journal Nature Chemistry has asked for feedback on some questions about the best ways to display chemistry research papers on the Web. Here are some responses:
(1) HTML vs PDF: does anyone read the HTML articles? Do you read the PDF on-screen or print it out?
I've used PDFs for offline archiving and sharing of especially important articles, as well as for one-off printing of papers I'm interested in. I rarely read a paper on-screen if I can avoid it.
Typical workflow: (1) download PDF; (2) print it out; (3) let paper sit while I go do something in the lab that can't wait (or bring it with me); (4) put paper onto a rather large stack of papers just like it; (5) pull paper out of stack from time to time as needed; (6) (optional) file paper in an increasingly chaotic system of folders or recycle it.
This system is bad, and I cursed it weekly during my time as a research chemist. Most of my colleagues had similar experiences.
There are plenty of opportunities to address pain points with the Web. Some ideas:
Make it very easy to find papers on the Nature Chemistry site. If I know a paper is trivial to find, I'm less likely to print it out in the first place. Good search may not be enough (see question 3).
Make the online version as readable as it can be. Minimize fluff like menus, ads and general clutter. Maximize things that promote readability like reasonable column-widths, appropriate fonts, and attractive and readable images.
Add conveniences that make it easier to read the paper online such as hover-popups that display 2D chemical structures for trivial names and IUPAC nomenclature (see below).
Paper is portable but Web documents are alive. Both can be readable - for example, I never print out a blog posting to read it.
(2) Big vs little graphics: what does everyone else think about the tiny size of the graphics in ACS html articles?
Graphics should be sized appropriately. ACS HTML articles are a good example of failing to design the obvious. You'd never read a blog post that looked like those articles, so it's not surprising everyone prints out the PDF.
Another problem is over-wide columns. It's puzzling why journal publishers would ignore all of their hard-won design experience just because a document appears as a Web page. If the ACS used a narrower column width, the Web version would be more readable. For example, check out this article from Beilstein Journal of Organic Chemistry. The only thing I'd change is to make the font larger.
Both problems are correctable using the right software and techniques.
(3) Tagging/‘semantic web’: what do you think about the toys on the RSC’s Project Prospect? What kind of things would you like to see tagged/linked to other content in Nature Chemistry? For instance, Steve would love to do something with named reactions.
If by tagging you mean giving users the ability to tag articles the way Flickr lets users tag photos, and letting other users search on those tags, I think it's long overdue and could be a game-changer. It would clearly play to the strength of the Web as a medium.
I must confess that I'm not a fan of the implementation of Project Prospect, although the idea has a lot going for it. There's too much bling and a lot of it fails on my Linux/Firefox 2 system.
The one Prospect feature well worth adapting would be the one that lets you get a 2D structure by clicking on a trivial name or IUPAC name. But there's a much better way to implement it:
Turn it on by default and get rid of the floating right-hand menu.
Make the structure appear, without clicking, by simply hovering the mouse over the trivial name or IUPAC nomenclature. Be sure the delay is set right so that it's not popping up unintentionally.
That's all there is to it. It needn't be complex, just usable.
Another possibility: harvest all of the 2D molecular structures appearing in articles over a given period of time to be displayed in a dense, hyperlinked graphical abstract format ideal for quick browsing.
(4) 3D molecular structures: do these help your understanding of a paper?
Rarely, and in many cases they just add clutter. For almost all small molecules, a properly laid-out and well-drawn 2D chemical structure is more useful. If a central point of discussion in a paper is a 3D structure, then that would be a good use of the technology.
(5) How useful to you are InChIs and SMILES?
Not very. Research chemists rarely care about this kind of technology. They'd much rather have a good-looking 2D chemical structure. InChIs and SMILES, if available, should be hidden away and only brought out when requested. A more basic problem is that neither system will be able to encode all of the molecules your journal's authors are likely to discuss.
(6) Forward linking: the RSC and Elsevier/Science Direct offer this – do you use it? Would you use an RSS feed that alerted you to new citations of a particular paper?
It could be useful provided that clutter could be kept to a minimum. It's essentially a form of linkback (see below).
An RSS feed that published linkback activity might be useful, but many of the chemists I know still don't know what RSS is. On the other hand, a page (or email service) that could keep an interested reader updated on linkback activity on all of their papers of interest simultaneously could be very useful.
(7) Would you actually comment on papers if there was a comments box at the end?
Like Egon Willighagen, I'd probably use my blog to do it.
However, most chemists don't maintain blogs or other websites and for them I can see how the ability to post comments would be useful.
Both kinds of users could be accommodated through a combination of comments and linkbacks. Provided that a good spam filtration system were used, this two-pronged approach might be very useful to readers.
Blogs are just the tip of the iceberg, though. Web publication technologies are creating all kinds of opportunities for creating highly focused, constantly evolving, collaborative mini-reviews on special topics. Linkbacks would create value for both readers and authors of these mini-reviews as well as forward-thinking scientific publications that embrace them.
(8) We really like the Biochemical Society’s HTML article style (sample one here) – do you?
No. Frames make that site very difficult to navigate.
It will be very interesting to see how Nature Publishing Group takes advantage of its opportunity to create something unique among chemistry publications. Asking the kinds of questions they're asking now, and doing so in the way they're doing it, shows they're at least on the right track.
1908 and All That: The Long Tail and Chemistry
Quite a few American Chemical Society (ACS) divisions are celebrating their 100th anniversaries this year. While this fact may at first glance seem like just a piece of nerdy trivia, Rudy Baum, Editor-in-chief of C&E News, decided to dig deeper. And what he found was the Long Tail of chemistry, alive and well - in 1908.
In his editorial, Baum describes how he looked for the causes of the sudden appearance of so many ACS divisions in 1908. At its core, he found a growing realization on the part of influential chemists at the time that the interests and areas of specialization of ACS members were becoming too diverse:
Specialization in subdisciplines of chemistry was also much on ACS members' minds in these years. Some members felt strongly that subdivisions of some sort should be created in the society to provide a venue for chemists from these areas to meet separate from the society as a whole. It was noted that chemists were going off and forming their own specialized organizations in areas like electrochemistry, biological chemistry, and agricultural chemistry.
As early as 1903, ACS established a committee of five distinguished members to look into this issue, with Massachusetts Institute of Technology's Arthur A. Noyes as the chairman. (Throughout its history, ACS has responded to challenges by creating committees!) The committee reported to the ACS Council at its June 1, 1903, meeting, and strongly recommended that "Divisions of the Society be established representing different important branches of chemistry."
For those familiar with the work of Chris Anderson, what's being described is nothing other than the Long Tail:
The theory of the Long Tail is that our culture and economy is increasingly shifting away from a focus on a relatively small number of "hits" (mainstream products and markets) at the head of the demand curve and toward a huge number of niches in the tail. As the costs of production and distribution fall, especially online, there is now less need to lump products and consumers into one-size-fits-all containers. In an era without the constraints of physical shelf space and other bottlenecks of distribution, narrowly-targeted goods and services can be as economically attractive as mainstream fare.
How much money does it cost to set up a new ACS division? Probably not that much. How big is the field of chemistry? Vast. Put the two together, and you have a recipe for today's ACS. A recent Depth-First article described this phenomenon. And C&E News itself maintains a (static?) blog on the Long Tail as it applies to chemical employment.
What does any of this have to do with chemical informatics? Although it may be tempting to think of chemists as a homogeneous group sharing a great deal of experience and knowledge, the proliferation of ACS divisions suggests otherwise. It seems reasonable to think that successful chemical information systems would do well to take this into account in their design and implementation.
Hacking DOI: Interconvert Bibliographic References and DOIs with CrossRef and OpenURL
Science is in the middle of a transition from print to the internet as the primary medium of communication. This transition, although a boon for many scientists, creates a host of problems for those dealing with scientific information. For example, how would you interconvert a DOI and its corresponding bibliographic reference?
A previous Depth-First article discussed a screen-scraping method as one solution. Unfortunately, this system relies on an intimate understanding of how individual publishers' Websites work, requires a different implementation for each publisher, and can break at any time without warning.
This article discusses a far more robust solution to the problem of interconverting bibliographic references and DOIs.
Background: OpenURL and CrossRef
CrossRef is the official DOI link registration agency for scholarly and professional publications. One of the less well-known services offered by CrossRef is a free, Web-based bidirectional DOI/bibliographic reference converter based on OpenURL.
A Simple Ruby Library
The following Ruby library is all we need to begin using CrossRef and OpenURL:
require 'rubygems'
require 'hpricot'
require 'open-uri'

module DOI
  # Convert a doi into a bibliographic reference.
  def biblio_for doi
    doc = Hpricot(open("http://www.crossref.org/openurl/?id=doi:#{doi}&noredirect=true&pid=ourl_sample:sample&format=unixref"))
    journal = (doc/"abbrev_title").inner_html
    year = (doc/"journal_issue/publication_date/year").inner_html
    volume = (doc/"journal_issue/journal_volume/volume").inner_html
    number = (doc/"journal_issue/issue").inner_html
    first_page = (doc/"pages/first_page").inner_html
    last_page = (doc/"pages/last_page").inner_html

    "#{journal} #{year}, #{volume}(#{number}) #{first_page}-#{last_page}"
  end

  # Convert a bibliographic reference into a DOI.
  def doi_for journal, year, volume, issue, page
    doc = Hpricot(open("http://www.crossref.org/openurl/?title=#{journal.gsub(/ /, '%20')}&volume=#{volume}&issue=#{issue}&spage=#{page}&date=#{year}&pid=ourl_sample:sample&redirect=false&format=unixref"))

    (doc/"doi").inner_html
  end
end

This code makes use of the excellent Ruby HTML parser library Hpricot.
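The doi_for method escapes only the spaces in the journal title, so a title containing an ampersand or other reserved character would produce a malformed query. One way around this, sketched below, is to let Ruby's standard URI module encode every parameter. The method name openurl_query_for and the pid default argument are hypothetical; the endpoint and parameter names are copied from the library above, and the sketch assumes the CrossRef server decodes ordinary form encoding (spaces as "+").

```ruby
require 'uri'

# Endpoint copied from the library above.
CROSSREF_OPENURL = "http://www.crossref.org/openurl/"

# Hypothetical helper: build the reference-to-DOI query URL with every
# parameter form-encoded. The default pid is the demo value used above;
# a registered CrossRef account name would replace it.
def openurl_query_for(journal, year, volume, issue, page, pid = "ourl_sample:sample")
  params = {
    "title"    => journal,
    "volume"   => volume,
    "issue"    => issue,
    "spage"    => page,
    "date"     => year,
    "pid"      => pid,
    "redirect" => "false",
    "format"   => "unixref"
  }
  "#{CROSSREF_OPENURL}?#{URI.encode_www_form(params)}"
end

puts openurl_query_for("Chem. Rev.", 1994, 94, 8, 2483)
```

The resulting string could be passed to open in place of the hand-built URL in doi_for.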
Testing the Library
Saving the Ruby code to a file named doi.rb, we can test it using the interactive Ruby shell:
$ irb
irb(main):001:0> require 'doi'
=> true
irb(main):002:0> include DOI
=> Object
irb(main):003:0> biblio_for "10.1021/cr00032a009"
=> "Chem. Rev. 1994, 94(8) 2483-2547"
irb(main):004:0> doi_for "Chem. Rev.", 1994, 94, 8, 2483
=> "10.1021/cr00032a009"
Notice how the journal abbreviation Chem. Rev. was used; we'd get the same result if we used Chemical Reviews.
Of course, the implementation described here could be refined a lot. With a DOI, it's trivial to construct a URL to the example paper. But we could take it further than that. With some carefully crafted regular expressions, our doi_for method could accept a freeform bibliographical citation rather than separately identified fragments. From there we might start to think about creating living HTML and/or Wikis from old PDFs and Word documents.
With a little creative thought, other possibilities are well within reach.
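As a rough illustration of the first two refinements, the sketch below builds a resolver URL from a DOI and splits a citation string into the fields that doi_for expects. Both helper names (url_for_doi and parse_citation) are hypothetical, and the regular expression only handles citations shaped like the output of biblio_for; other citation styles, page numbers containing letters, or en-dash page ranges would need their own patterns.

```ruby
# Matches citations shaped like "Chem. Rev. 1994, 94(8) 2483-2547",
# i.e. the format produced by biblio_for. The last page is optional.
CITATION = /\A(?<journal>.+?)\s+(?<year>\d{4}),\s*(?<volume>\d+)\((?<issue>\d+)\)\s+(?<first_page>\d+)(?:-(?<last_page>\d+))?\z/

# Hypothetical helper: split a citation string into the fields doi_for
# takes. Returns nil when the string does not match the pattern.
def parse_citation(text)
  match = CITATION.match(text.strip)
  return nil unless match

  {
    journal: match[:journal],
    year:    match[:year].to_i,
    volume:  match[:volume].to_i,
    issue:   match[:issue].to_i,
    page:    match[:first_page].to_i
  }
end

# Hypothetical helper: link to a paper through the doi.org resolver.
def url_for_doi(doi)
  "https://doi.org/#{doi}"
end

ref = parse_citation("Chem. Rev. 1994, 94(8) 2483-2547")
p ref
puts url_for_doi("10.1021/cr00032a009")
```

With the DOI module included, the parsed fields could then be handed to the lookup as doi_for(*ref.values_at(:journal, :year, :volume, :issue, :page)).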
Caveat
Before extensively experimenting with CrossRef's OpenURL system, you might want to sign up for a free account. CrossRef is understandably interested in tracking usage and this is their way to do it.
Conclusions
DOIs and traditional bibliographical citations now coexist in a variety of settings, from literature citation managers to journals themselves. Using CrossRef, OpenURL and a little bit of code, it's now possible to make a great deal more sense of it all.
Harvesting bibliographical citations must be one of the least sexy topics in cheminformatics. But as Google demonstrated (building on the approach taken by Science Citation Index), cataloging citation behavior leads to a unique and highly productive way to view many tough problems. Future articles will discuss how this might apply to cheminformatics.
Image Credit: ecstaticist
Cheminformatics Puzzler #2: Planar Chiral Paracyclophanes

Source: Duan, Ma, Xia, Liu, Ma, and Sun J. Org. Chem.
Without using 3D coordinates, represent the chirality of this class of planar chiral paracyclophanes (hint).
Just a Flesh Wound
SEMANTIC KNIGHT: None shall pass without formally defining the ontological meta-semantic thingies of their domain something-or-others!
HACKER: What?
SEMANTIC KNIGHT: None shall pass without using all sorts of semantic meta-meta-meta-stuff that we will invent Real Soon Now!
HACKER: I have no quarrel with you, good Sir Knight, but I must get my work done on the Web. Stand aside!
SEMANTIC KNIGHT: None shall find anything on the Internet without semantic metadata!
HACKER: So be it!
HACKER and SEMANTIC KNIGHT: Aaah!, hiyaah!, etc.
[HACKER chops the SEMANTIC KNIGHT's first argument off by building efficient statistical/heuristic search engines]
HACKER: Now stand aside, worthy adversary.
SEMANTIC KNIGHT: 'Tis but a scratch.
HACKER: A scratch? Your argument has been cut off!
SEMANTIC KNIGHT: No, it isn't.
HACKER: Well, what's that, then?
SEMANTIC KNIGHT: I've had worse. None shall have an effective syndication network without RDF Site Summaries!
[clang]
Hiyaah!
[clang]
Aaaaaaaah!
[HACKER chops the SEMANTIC KNIGHT's second argument off by building the blogs/RSS/Aggregators/Bloglines/etc. network ]
HACKER: Victory is mine!
SEMANTIC KNIGHT: Have at you!
[kick]
HACKER: Eh. You are indeed brave, Sir Knight, but the fight is mine.
SEMANTIC KNIGHT: Oh, had enough, eh?
HACKER: Look, you stupid &^%$# You've got no arguments left.
SEMANTIC KNIGHT: Yes, I have.
HACKER: Look!
SEMANTIC KNIGHT: Just a flesh wound.
[kick]
HACKER: Look, stop that.
SEMANTIC KNIGHT: You won't be able to get machine-machine services without an ontology to formally describe all the relationships!
[kick]
HACKER: Right!
[whop]
[HACKER chops the SEMANTIC KNIGHT's third argument off by building SOAPy and RESTful services with only implicit semantic descriptions]
SEMANTIC KNIGHT: Right. I'll do you for that!
HACKER: You'll what?
SEMANTIC KNIGHT: Come here!
HACKER: What are you going to do, bleed on me?
SEMANTIC KNIGHT: I'm invincible!
HACKER: You're a looney.
SEMANTIC KNIGHT: The SEMANTIC Knight always triumphs! Have at you! Come on, then. I have a battalion of KR theorists on my side!
[whop]
[HACKER chops the SEMANTIC KNIGHT's last argument off with an army of actual code writers]
SEMANTIC KNIGHT: Oh? All right, we'll call it a draw.
HACKER: Come on, folks, let's go.
SEMANTIC KNIGHT: Oh. Oh, I see. Running away, eh? You yellow ^&^%$s! Come back here and take what's coming to you. I'll bite your legs off!
-Michael Champion, xml-dev list


