Building Chempedia: The Human Element
The study of chemistry is an inherently social activity. From the papers we use and cite, to the conferences we attend, to the informal discussions we have daily, being a chemist means interacting with your fellow chemists. Yet, strangely, most chemical information systems either ignore this central fact entirely or provide only meager tools for harnessing it. This article discusses how Chempedia currently integrates the social with the scientific, and what may be in store for the future.
Chempedia as a Tool for Scientific Collaboration
Like all chemical reference works, Chempedia is written by people with their own interests, skills, and ambitions. Unlike almost every other chemical reference work, Chempedia (through Wikipedia, on which it's based) offers intriguing possibilities to collaborate directly with its contributors and learn from them, or even to become one of them.
How can Chempedia better facilitate scientific collaboration?
A Simple But Possibly Useful Feature
Yesterday, a new feature was added to Chempedia that makes it easier to understand the recent history of a Compound Monograph. The new feature shows the date a Compound Monograph was last edited, along with the Wikipedia user who edited it.

Clicking on the link takes you to the user's Wikipedia page, in this case the one for Meodipt. (Wikipedia users frequently use handles rather than their given names.) From Meodipt's page, we can see that s/he received degrees in chemistry and pharmacology and is currently studying law. Meodipt's interests include pharmacology, chemistry, law, and science. We can also see that Meodipt is maintaining a good-sized list of CAS numbers for drugs, grouped by indication.
We might be curious about what Meodipt found worth changing, and how s/he changed it. To find out, we would first click the Chempedia edit link. In the Wikipedia box (framed by the red dotted lines), we would then click on the 'history' tab. Clicking on the 'last' link for the top entry shows exactly what Meodipt changed on Pravadoline's Compound Monograph (also visible through this link).
Looking Ahead
Linking a real person to changes in a Compound Monograph could be enormously useful, if done properly. After all, bringing people with highly focused interests together is the essence of scientific collaboration. The Chempedia/Wikipedia combination provides one way to do that.
As Chris Anderson puts it, "social networking should be a feature, not a destination." Scientists were social networking long before the Internet, the computer, and the telephone were invented; indeed, scientists who fail to connect with their fellow scientists have a difficult time prospering. Seen from this perspective, it's surprising that good 'social networking' features are not viewed as a top priority in chemical information systems.
The Chempedia author credit system in its current form is rather simplistic and may not actually promote scientific collaboration at all. But it's not hard to imagine ways to make it far more effective. Future articles will discuss some of the possibilities.
The Daily Molecule: The Wonders of Chemistry - One Molecule at a Time
Chemistry is a big field by any standard, as the proliferation of American Chemical Society (ACS) divisions attests. Each subdiscipline is in turn so big that once a chemist becomes 'differentiated' it's easy to lose touch even with neighboring subdisciplines. It doesn't have to be that way. This article introduces a new service, The Daily Molecule, designed to make it a little easier (and, hopefully, fun) to stay in the chemical loop.
What Is It?
The idea is simple: every weekday, a new molecule will be featured on The Daily Molecule with a short write-up and some leading references. Although molecules in the news will get first priority, any molecule is fair game.
The material for The Daily Molecule will be drawn from Chempedia, which in turn gets some of its content from Wikipedia. In other words, the entries on The Daily Molecule will be largely written by my fellow chemists.
The process of creating a Daily Molecule entry is not time-consuming, but much of what is being done manually now could be automated in the future. The technology platform lends itself well to many forms of chemistry-specific modification (see below).
I hesitate to use the term 'blog' to describe The Daily Molecule, but the description may be helpful to an extent.
The Daily Molecule is unlike a blog in that most content will be generated by others, selected by some criteria, reformatted for consistency, and published. In that sense, The Daily Molecule is something like a mini scientific journal, but it turns the process of acquiring content on its head.
If chemistry ever evolves beyond the current model of publication, which seems inevitable at this point, the journals of the future may resemble The Daily Molecule in one or more ways.
Technology
The software running The Daily Molecule is a modified version of SimpleLog, a Web application based on Ruby on Rails. Unlike most blogging engines, SimpleLog focuses on implementing only the most basic publication features, and doing them to perfection. If you know a little Ruby and can work with Rails, you can do a lot with SimpleLog.
One of the first items of business will be to implement reCAPTCHA support and activate comments on articles.
Some ideas for chemically-enabling The Daily Molecule include a graphical abstract sidebar and (sub)structure search. Currently, the 2D chemical structure images posted to The Daily Molecule have complete connection tables embedded as metadata, a feature with some interesting possibilities.
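The post doesn't say how the connection tables are embedded. One common approach, assumed here, is to store an MDL molfile in a PNG `tEXt` chunk under a keyword; the keyword `molfile` and all helper names are hypothetical. This standard-library Python sketch writes such a chunk into a tiny PNG and reads it back, the first step a search or graphical-abstract feature would need in order to recover a structure from the image alone.

```python
import struct
import zlib
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def tiny_png() -> bytes:
    """A valid 1x1 grayscale PNG standing in for a rendered structure image."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # width, height, 8-bit grayscale
    idat = zlib.compress(b"\x00\x00")  # one scanline: filter byte + one pixel
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")

def add_text_chunk(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of `png` with a tEXt chunk inserted just before IEND."""
    iend_start = png.rindex(b"IEND") - 4  # back up over IEND's 4-byte length field
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return png[:iend_start] + _chunk(b"tEXt", payload) + png[iend_start:]

def read_text_chunks(png: bytes) -> Dict[str, str]:
    """Collect every tEXt keyword/value pair in a PNG."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found: Dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        if kind == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found

# A minimal V2000 molfile (methane) used only as demo payload.
METHANE_MOLFILE = (
    "methane\n"
    "  sketch\n"
    "\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
)

if __name__ == "__main__":
    png = add_text_chunk(tiny_png(), "molfile", METHANE_MOLFILE)
    print(read_text_chunks(png)["molfile"])
```

Because the metadata travels inside the image file, anyone who saves the picture also saves the structure; `tEXt` is limited to Latin-1, which plain molfiles fit comfortably.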
The Molecule of the Day/Week/Month
The basic idea behind The Daily Molecule is not new. Many other services that operate similarly, at least on the surface, have sprung up over the last ten years. Some examples:
- Molecule of the Day
- ACS Molecule of the Week
- Drugs and Poisons
- Saturday Night Synthesis
- The Molecule of the Month (may be the oldest continuously-operated MOTM site in existence)
- 3dchem.com Molecule of the Month
- Protein Spotlight
- PDB Molecule of the Month
- Prous Molecule of the Month
Quite a few others don't appear on this list.
What sets The Daily Molecule apart is the idea that chemical content already exists on the Web in machine-readable formats, with licenses that permit its reuse; all that's needed is a way to aggregate, format, and package that information in a form suitable for once-daily scanning and cheminformatics manipulation.
Conclusions
Like no other medium, the Web blurs artificial distinctions: between work and play; between private and public; between on-topic and off-topic; between fame and obscurity; between mine and yours; between big and small; and between profit and non-profit. Chemistry may be late to the party, but it is not immune to its call.
Building Chempedia: Start Simple, Then Iterate
As a medium for building software, the Web offers unparalleled adaptability. With nothing to download or install, users of Web applications automatically see the newest version, always. This may sound like a small thing, and technically it is. But it dramatically increases the effectiveness with which software can be created. The previous article in this series introduced Chempedia, the free chemical encyclopedia and cheminformatics Web application. This article will discuss the process by which Chempedia will become a better service over time.
Iterative Web Application Development
Chempedia, like all actively developed software, is a work in progress. It will be built in stages, starting with the addition of new features, followed by a round of user feedback, bug fixing, and stabilization. This will then be followed by the next major iteration, and so on.
This iterative design style is ideally suited for Web applications. Because the barrier to pushing out new versions is essentially non-existent, a Web application can evolve at a much more rapid rate than other kinds of software. Indeed, the first version of a Web application need only work well enough to prove a point.
One of the keys to iterative Web development is a technology framework designed to facilitate it. Chempedia is being developed with Ruby on Rails, a tool that enables Web developers to take full advantage of the iterative development style the Web makes possible.
Another key element of iterative Web development is users willing to explore the system and offer criticism. Evolution succeeds only when the environment stresses an ecosystem; the same is true in Web application development.
Chempedia will take full advantage of the evolutionary nature of Web application development. As features are added and (hopefully) use of the service grows, Chempedia will evolve in ways that are impossible to predict today.
What's Wrong With Chempedia?
If you happened to look at Chempedia last week (that version is no longer visible), you probably noticed many things that needed improvement. Some concerns were in the following areas:
Navigation. Navigation works best when the right granularity of options is achieved. Chempedia's navigation system grouped both closely-related and dissimilar actions at the same level.
Metaphor. The initial idea behind Chempedia was to see what happened when PubChem's chemical structures were mashed up with Wikipedia articles, using CAS numbers as the common link. The site design reflected this, with no clear organizing principle other than the mashup itself. However, once the approach had been shown to work, it became clear that Chempedia was strikingly similar in both form and function to the Merck Index. Perhaps this could serve as a clue to a better organizing principle.
Wikipedia integration. The old Chempedia site didn't make it nearly as convenient as it should be to create or edit compound monographs. Because Chempedia serves as a chemically-aware front-end for Wikipedia, the easier it is to get to Wikipedia from Chempedia, the better.
What Changed?
During the process of trying to fix Chempedia's problems, it became clear that a major redesign was in order. This consisted of:
Creating a landing page oriented toward search. Using the Merck Index as a metaphor suggested that Chempedia's landing page should be designed around search rather than browsing, as it originally was.
Emphasizing compound monographs, not compounds. Chempedia's central organizing principle is now the Compound Monograph. One way this is seen is in the new URL structure, which makes it very easy to see where a Chempedia link is about to take you. For example, consider the URL for benzene. Another way this can be seen is in the inclusion of Compound Monographs lacking a chemical structure.
Designing a streamlined menu system. The main menu system has been broken down into just three main categories: Search; Browse; and Create. These headings refer to actions on Compound Monographs, again in line with their importance as an organizing principle.
Promoting better integration with Wikipedia. After experimenting with a few implementation possibilities, it is now possible to edit Wikipedia articles directly from the Chempedia site, thanks to the use of an inline frame (iframe). Once again, this capability is tied to the Compound Monograph, from which editing and updating links are accessible.
Striving for comprehensive Wikipedia coverage. Wikipedia had far more compound monographs than could be found on Chempedia: 6,411, to be precise. Chempedia now contains all of them, regardless of whether a chemical structure can be found in PubChem based on a CAS number. This includes inorganics, organometallics, polymers, mixtures, and polypeptides.
Miles to Go Yet
Chempedia is far from being finished. For example, you'll notice many instances in which a Compound Monograph is truncated. This arises from difficulties in parsing Wikipedia's Wikitext format (more on this later).
Ultimately, the full text of each Wikipedia article will be present on Chempedia rather than just the first introductory paragraph. But it will take a significant amount of work to ensure that each article's Wikitext entry can be parsed faithfully.
Chempedia allows search by CAS number, PubChem CID and exact title. Full-text searching is not yet implemented, nor is autocomplete search, both of which would greatly enhance the usability of the service.
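With three kinds of lookup, one search box can route a query by its shape alone. A minimal sketch in Python, assuming that a query formatted like a CAS number and passing the CAS check-digit test is a CAS lookup, that a bare integer is a PubChem CID, and that anything else is an exact title; `classify_query` and `is_valid_cas` are hypothetical helpers, not Chempedia's code. The CAS check digit is the sum of the other digits, each weighted by its position counted from the right (starting at 1), taken modulo 10.

```python
import re

# Two to seven digits, two digits, one check digit, e.g. 7732-18-5.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

def is_valid_cas(text: str) -> bool:
    """True if `text` is formatted like a CAS number and its check digit matches."""
    match = CAS_PATTERN.fullmatch(text)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(int(d) * i for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))

def classify_query(query: str) -> str:
    """Hypothetical router returning 'cas', 'cid', or 'title'."""
    q = query.strip()
    if is_valid_cas(q):
        return "cas"
    if q.isdigit():
        return "cid"
    return "title"

if __name__ == "__main__":
    for q in ["7732-18-5", "241", "Pravadoline"]:
        print(q, "->", classify_query(q))
```

A query such as 7732-18-5 (water) passes the check, while a mistyped 7732-18-4 falls through to an exact-title lookup and simply finds nothing, rather than silently retrieving the wrong compound.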
Exact structure searching is made possible by the ChemWriter editor in combination with SHA-1 hashed InChIs. Substructure search and query atom search will ultimately be added, but for an encyclopedia containing relatively few molecules, most of which have trivial names, this isn't yet seen as critical.
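Hashing gives every InChI, however long, a fixed-length 40-character key that is cheap to index, and exact matching works because identical structures should yield identical standard InChI strings. A minimal sketch of such an index, assuming the InChI for the drawn structure has already been computed (turning a ChemWriter drawing into an InChI requires a cheminformatics toolkit and isn't shown); `ExactStructureIndex` and `inchi_sha1` are hypothetical names. Note that a plain SHA-1 digest is not the same thing as the standard InChIKey, which is derived from SHA-256.

```python
import hashlib
from typing import Dict, List

def inchi_sha1(inchi: str) -> str:
    """Fixed-length (40 hex chars) key for an InChI string."""
    # Strip stray whitespace so copy-pasted InChIs hash identically.
    return hashlib.sha1(inchi.strip().encode("ascii")).hexdigest()

class ExactStructureIndex:
    """Hypothetical exact-match index: SHA-1(InChI) -> monograph titles."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, List[str]] = {}

    def add(self, inchi: str, title: str) -> None:
        # A list allows for more than one monograph per structure.
        self._by_hash.setdefault(inchi_sha1(inchi), []).append(title)

    def lookup(self, inchi: str) -> List[str]:
        # Return a copy so callers can't mutate the index.
        return list(self._by_hash.get(inchi_sha1(inchi), []))

BENZENE_INCHI = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"

if __name__ == "__main__":
    index = ExactStructureIndex()
    index.add(BENZENE_INCHI, "Benzene")
    print(index.lookup(BENZENE_INCHI))  # ['Benzene']
```

In a Rails application the hash would more likely live in an indexed database column than in a dictionary; the lookup logic is the same.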
You'll notice many Monographs on Chempedia that have no structure information. Behind the scenes, Chempedia uses the 350,000+ CAS numbers now contained in the PubChem database to associate a chemical structure with a Wikipedia article. In the future, these associations will be made by Chempedia and Wikipedia users, which will allow every Chempedia small-molecule Monograph to have a structure associated with it. (It will also create a rather large, publicly-curated, open database of CAS numbers linked to chemical structures, but that's a story for another time).
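The CAS-number association described here amounts to a pair of lookups. A sketch with toy in-memory mappings (the names `build_monographs`, `article_cas`, `pubchem_cas_to_cid`, and `curated` are hypothetical; real data would come from parsed Wikipedia articles and PubChem): every article becomes a monograph, a structure is attached only when its CAS number maps to a PubChem CID, and a user-curated association, as proposed for the future, takes precedence over the automatic match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Monograph:
    title: str
    cas: Optional[str]  # CAS number taken from the Wikipedia article, if any
    cid: Optional[int]  # PubChem CID; None means "no structure yet"

def build_monographs(
    article_cas: Dict[str, Optional[str]],
    pubchem_cas_to_cid: Dict[str, int],
    curated: Optional[Dict[str, int]] = None,
) -> List[Monograph]:
    """Join Wikipedia articles to PubChem structures via CAS numbers.

    Every article becomes a monograph, even when no structure is found
    (polymers, mixtures, or CAS numbers missing from the mapping).
    """
    curated = curated or {}
    monographs = []
    for title, cas in sorted(article_cas.items()):
        # A user-curated association wins over the automatic CAS match.
        cid = curated.get(title)
        if cid is None and cas is not None:
            cid = pubchem_cas_to_cid.get(cas)
        monographs.append(Monograph(title, cas, cid))
    return monographs

if __name__ == "__main__":
    # Toy data for illustration only.
    articles = {"Benzene": "71-43-2", "Water": "7732-18-5", "Polyethylene": "9002-88-4"}
    cas_to_cid = {"71-43-2": 241, "7732-18-5": 962}
    for m in build_monographs(articles, cas_to_cid):
        print(m.title, m.cas, m.cid if m.cid is not None else "(no structure)")
```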
Your Feedback is Essential
Finally, many of the changes made in this iteration were the result of conversations with chemists and developers. If you see something on Chempedia that just doesn't work for you, please don't be shy about saying so. Feedback is an essential ingredient in making Chempedia the best service it can be.
The Economics of Free: Chris Anderson on Charlie Rose
Anderson's comments on the Long Tail and social networks are especially on-target, and relevant to the sciences.
Building a Unique Chemistry Journal: Responses to Questions from Nature Chemistry
Neil Withers of the soon-to-be-launched chemistry journal Nature Chemistry has asked for feedback on some questions about the best ways to display chemistry research papers on the Web. Here are some responses:
(1) HTML vs PDF: does anyone read the HTML articles? Do you read the PDF on-screen or print it out?
I've used PDFs both for offline archiving and sharing of especially important articles as well as one-off printing of a paper I'm interested in. I rarely read a paper on-screen if I can avoid it.
Typical workflow: (1) download PDF; (2) print it out; (3) let paper sit while I go do something in the lab that can't wait (or bring it with me); (4) put paper onto a rather large stack of papers just like it; (5) pull paper out of stack from time to time as needed; (6) (optional) file paper in an increasingly chaotic system of folders or recycle it.
This system is bad, and I cursed it weekly during my time as a research chemist. Most of my colleagues had similar experiences.
There are plenty of opportunities to address pain points with the Web. Some ideas:
Make it very easy to find papers on the Nature Chemistry site. If I know a paper is trivial to find, I'm less likely to print it out in the first place. Good search may not be enough (see question 3).
Make the online version as readable as it can be. Minimize fluff like menus, ads and general clutter. Maximize things that promote readability like reasonable column-widths, appropriate fonts, and attractive and readable images.
Add conveniences that make it easier to read the paper online such as hover-popups that display 2D chemical structures for trivial names and IUPAC nomenclature (see below).
Paper is portable but Web documents are alive. Both can be readable - for example, I never print out a blog posting to read it.
(2) Big vs little graphics: what does everyone else think about the tiny size of the graphics in ACS html articles?
Graphics should be sized appropriately. ACS HTML articles are a good example of failing to design the obvious. You'd never read a blog post that looked like those articles, so it's not surprising everyone prints out the PDF.
Another problem is over-wide columns. It's puzzling why journal publishers would ignore all of their hard-won design experience just because a document appears as a Web page. If the ACS used a narrower column width, the Web version would be more readable. For example, check out this article from Beilstein Journal of Organic Chemistry. The only thing I'd change is to make the font larger.
Both problems are correctable using the right software and techniques.
(3) Tagging/’semantic web’: what do you think about the toys on the RSC’s Project Prospect? What kind of things would you like to see tagged/linked to other content in Nature Chemistry? For instance, Steve would love to do something with named reactions.
If by tagging, you mean giving users the ability to tag articles like Flickr allows photos to be tagged, and for other users to make use of those tags while searching, I think it's long overdue and could be a game-changer. It would clearly play to the strength of the Web as a medium.
I must confess that I'm not a fan of the implementation of Project Prospect, although the idea has a lot going for it. There's too much bling and a lot of it fails on my Linux/Firefox 2 system.
The one Prospect feature well worth adapting would be the one that lets you get a 2D structure by clicking on a trivial name or IUPAC name. But there's a much better way to implement it:
Turn it on by default and get rid of the floating right-hand menu.
Make the structure appear, without clicking, by simply hovering the mouse over the trivial name or IUPAC nomenclature. Be sure the delay is set right so that it's not popping up unintentionally.
That's all there is to it. It needn't be complex, just usable.
Another possibility: harvest all of the 2D molecular structures appearing in articles over a given period of time to be displayed in a dense, hyperlinked graphical abstract format ideal for quick browsing.
(4) 3D molecular structures: do these help your understanding of a paper?
Rarely, and in many cases they just add clutter. For almost all small molecules, a properly laid-out and well-drawn 2D chemical structure is more useful. If a central point of discussion in a paper is a 3D structure, then that would be a good use of the technology.
(5) How useful to you are InChIs and SMILES?
Not very. Research chemists rarely care about this kind of technology; they'd much rather have a good-looking 2D chemical structure. InChIs and SMILES, if available, should be hidden away and brought out only when requested. A more basic problem is that neither system can encode all of the molecules your journal's authors are likely to discuss.
(6) Forward linking: the RSC and Elsevier/Science Direct offer this – do you use it? Would you use an RSS feed that alerted you to new citations of a particular paper?
It could be useful provided that clutter could be kept to a minimum. It's essentially a form of linkback (see below).
An RSS feed that published linkback activity might be useful, but many of the chemists I know still don't know what RSS is. On the other hand, a page (or email service) that could keep an interested reader updated on linkback activity on all of their papers of interest simultaneously could be very useful.
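A per-reader digest of that kind reduces to filtering and grouping. A sketch, assuming linkbacks arrive as hypothetical `(citing_url, cited_doi, date_seen)` records and a reader's papers of interest are a set of DOIs; the record shape and `linkback_digest` are illustrative, not an existing service's API.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (citing_url, cited_doi, date_seen).
Linkback = Tuple[str, str, date]

def linkback_digest(
    linkbacks: Iterable[Linkback], watched_dois: Set[str], since: date
) -> Dict[str, List[str]]:
    """Group recent linkbacks by watched paper, newest first."""
    grouped: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for citing_url, cited_doi, seen_on in linkbacks:
        if cited_doi in watched_dois and seen_on >= since:
            grouped[cited_doi].append((seen_on, citing_url))
    # Sort each paper's linkbacks by date, most recent first.
    return {
        doi: [url for _, url in sorted(items, reverse=True)]
        for doi, items in grouped.items()
    }

if __name__ == "__main__":
    records = [
        ("https://example.org/review", "10.1000/paper-a", date(2007, 10, 5)),
        ("https://example.org/blog", "10.1000/paper-a", date(2007, 10, 1)),
        ("https://example.org/other", "10.1000/paper-z", date(2007, 10, 2)),
    ]
    print(linkback_digest(records, {"10.1000/paper-a"}, since=date(2007, 9, 1)))
```

Papers with no new activity drop out of the result, so the same dictionary could be rendered as a short web page or a weekly email without padding.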
(7) Would you actually comment on papers if there was a comments box at the end?
Like Egon Willighagen, I'd probably use my blog to do it.
However, most chemists don't maintain blogs or other websites and for them I can see how the ability to post comments would be useful.
Both kinds of users could be accommodated through a combination of comments and linkbacks. Provided that a good spam filtration system were used, this two-pronged approach might be very useful to readers.
Blogs are just the tip of the iceberg, though. Web publication technologies are creating all kinds of opportunities for creating highly focused, constantly evolving, collaborative mini-reviews on special topics. Linkbacks would create value for both readers and authors of these mini-reviews as well as forward-thinking scientific publications that embrace them.
(8) We really like the Biochemical Society’s HTML article style (sample one here) – do you?
No. Frames make that site very difficult to navigate.
It will be very interesting to see how Nature Publishing Group takes advantage of its opportunity to create something unique among chemistry publications. Asking the kinds of questions they're asking now, and doing so in the way they're doing it, shows they're at least on the right track.

