Conversations with James

Posted by Rich Apodaca Tue, 31 Jul 2007 07:07:00 GMT

James Gosling, widely credited as the creator of Java, will be at a special session of the San Diego Java Users' Group on August 8th. Love it or loathe it, Java has changed the direction of software development, and Gosling has been a major force behind it.

Based on the record turnout for last month's presentation by Google on the Google Web Toolkit (GWT), seating will be scarce. You'll also need to RSVP beforehand.

Drop me a line if you're interested in meeting there.

Editable and Searchable 2D Molecular Images

Posted by Rich Apodaca Mon, 30 Jul 2007 08:01:00 GMT

Word processing replaced the typewriter for the simple reason that documents could be prepared and edited so much more quickly. If Web authoring replaces conventional word processors, it will be for the simple reason that Web documents can be found, distributed, reprocessed, and combined with other content so much more effectively. The peculiar nature of chemical structure information complicates chemistry's transition to Web authoring. This article, the first in a series, discusses some of the challenges that lie ahead.

State of the Art: Word/ChemDraw

Microsoft Word allows 2D molecular graphics, typically created with ChemDraw, to be embedded in documents and later edited. Those images can then be copied into PowerPoint presentations and reused in a variety of other Windows-specific products. This practice has become so widespread throughout industry and academia that few chemists even think about the technology that many of them rely on several times a week.

Chemical Structures are Peculiar

A 2D molecular image, like the one depicting fluoxetine at the top of this article, is a peculiar beast. On one level, it's a picture that anybody can look at. But on another level, it's a type of object for which manipulation by humans and computers is extremely useful. The combination of Microsoft Word and ChemDraw lets chemists conveniently manage the dual nature of chemical structures.

Live Molecular Images

Why would anybody want to create editable and searchable 2D molecular graphics such as JPGs, PNGs, and SVGs? Alas, technology has a way of moving on just when we're getting comfortable with it (an especially difficult concept for typewriter manufacturers who went bust during the 1980s, and the dedicated word processor manufacturers who followed).

Consider the number of Word and PowerPoint documents you read last week compared to the number of Web pages. Chances are the ratio is at least 1:10. The trend shows no signs of reversing itself.

Although Web authoring tools have been very slow to reach the average user, the blogging explosion has led to rapid evolution in the field. As tools like WordPress, Movable Type, and even Wikipedia race to satisfy the needs of power authors, the average user will rather unexpectedly discover that they have access to perfectly capable tools that let them abandon their over-engineered (and expensive) word processors to experiment with Web publishing.

The Wikipedia Chemistry/Structure Drawing Workgroup hints at what lies ahead for chemistry. Two tools, GChemPaint and ACD ChemSketch, now enable molecular structure information to be embedded in images.

As chemistry turns to the Web as its primary publication medium, chemists will need the same ability to deal with chemical structures offered by their current tools of choice. In articles to follow, I'll discuss some ways this could happen.

The Journal Deadpool: Failing Business Models and Sick Markets in Scientific Publishing?

Posted by Rich Apodaca Fri, 27 Jul 2007 07:44:00 GMT

Several articles in the past few years have alluded to the ongoing cost squeeze faced by librarians maintaining scientific journal collections. Consistently we're told by those doing the buying that subscription costs have gotten way out of control. Sadly, there's only one correct response: kill subscriptions to those journals that price themselves out of the marketplace.

For the most part, canceling journal subscriptions has been a private activity - news doesn't consistently travel beyond the confines of the institution doing it. But what if it did?

Try this thought experiment: you're a journal publisher who has had a number of canceled subscriptions in the last two years. You continue to receive a healthy number of manuscript submissions, yet your revenues have been falling to the point that you may not be able to cover your costs. Do you (a) lower subscription rates to more competitively price your product; (b) keep rates the same, hoping things will turn around; or (c) raise rates to maintain profitability?

There's an interesting hypothesis, variously alluded to, that says that journals increase their subscription rates to remain profitable in the face of declining demand. Classical economics says that declining demand should result in declining prices, but that assumes a healthy, efficiently-functioning marketplace. Scientific publishing today may not be such a marketplace.

Could declining subscription numbers actually be a major cause of the "increasing costs" faced by scientific publishers and passed on to subscribers? It's a testable hypothesis if the right data have been collected.

To conduct such a study, one piece of information that may be helpful is a market-wide summary of journal subscription cancellations. Let's call it the "Journal Deadpool." Unfortunately, to my knowledge, no such data exist.

By way of the Chemical Information Sources Discussion List, Thurston Donart Miller pointed out that the Physics-Astronomy-Mathematics (PAM) Division of the Special Libraries Association has maintained a list of canceled subscriptions for the last ten years.

The PAM effort is a step in the right direction, but what if they took it further? By including data such as the date of cancellation, the last annual subscription fee, and whether an online subscription to the journal is available at the institution, a much clearer picture of the state of the scientific publishing market would emerge.
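As a rough sketch of what one entry in such a list might look like, here is a minimal Python record. The class name, field names, currency, and the sample values are all my own placeholders for illustration; they are not taken from the PAM list or any real cancellation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cancellation:
    """One canceled subscription (hypothetical schema, not the PAM format)."""
    journal: str
    institution: str
    cancelled_on: date
    last_annual_fee_usd: float   # currency is an assumption
    online_access_retained: bool


# Placeholder entries only; these are not real journals, libraries, or prices.
records = [
    Cancellation("Placeholder Journal A", "Example University",
                 date(2007, 1, 1), 1000.0, True),
    Cancellation("Placeholder Journal A", "Sample Institute",
                 date(2007, 6, 1), 1000.0, False),
    Cancellation("Placeholder Journal B", "Example University",
                 date(2007, 3, 1), 500.0, True),
]

# Count cancellations per journal, the simplest "deadpool" summary.
per_journal = Counter(r.journal for r in records)
print(per_journal.most_common())
```

Even a flat list like this would support the obvious first questions: which journals are canceled most often, and at what price point.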

A highly-publicized, multi-institution "Journal Deadpool" would certainly give food for thought to scientists considering where to send their next manuscript. And it would strengthen the case of librarians who are caught in the middle.

One of the prerequisites for a healthy marketplace is free flow of information among both buyers and sellers. A Journal Deadpool may be just what the doctor ordered.

Image Credit: geographie

Top Ten Best-Selling Drugs Worldwide (2006)

Posted by Rich Apodaca Wed, 25 Jul 2007 06:40:00 GMT

If you haven't had a chance to do so yet, IMS Health's recent Intelligence.360 on the global pharmaceutical industry is worth reading. One noteworthy set of data contained in the report is a list of the top ten best-selling drugs worldwide for 2006.

A list of chemical names and numbers by itself is not that useful. However, adding chemical structures has a way of prompting better questions and generating many more ideas. In that spirit, I've created an online table of the ten best-selling drugs worldwide for 2006. This table contains the 2D chemical structure, generic name, trade name, global sales in US$, company, and indication for each drug.

A new software package codenamed "Firefly" was used to generate the chemical structures in the table. Firefly is a lightweight 2D editor and rendering library written in Java. A series of articles on Firefly can be found on Depth-First.

This link takes you to the table.

Everything Old is New Again: Wiswesser Line Notation (WLN)

Posted by Rich Apodaca Fri, 20 Jul 2007 08:46:00 GMT

Sometimes, searching through the attic of scientific ideas turns up unexpected treasures. Like old clothing styles that suddenly become fashionable again, the passage of time has a way of making old ideas relevant by supplying new context. Those ideas that once enjoyed widespread popularity followed by complete obscurity are especially interesting. This article talks about one of them and why it may matter again.

Some History

Wiswesser Line-Formula Chemical Notation (WLN) was the most popular of perhaps a dozen actively-used line notation systems during the 1960s and 1970s. Developed by William J. Wiswesser over a period of many years starting in the 1940s, WLN contains a surprising number of modern ideas about chemistry and information. At one point a serious contender for the position now held by IUPAC nomenclature, WLN has become so obscure that few chemists have even heard of it and no modern software can manipulate it. Even finding information on the basic grammar of WLN is difficult: almost all of this documentation is contained in out-of-print books.

A Guide

To my surprise, WLN is both easy to understand and easy to use. As far as canonicalized line notations go, WLN is far easier to comprehend than either InChI or Canonical SMILES. Even more surprisingly, WLN actually meets more than a few of the requirements for the ideal line notation for the Web. I was always struck by claims that high school graduates with little chemistry background could be trained to encode WLN in a few weeks; this now seems very plausible.

My guide is Elbert Smith's short 1968 book The Wiswesser Line-Formula Chemical Notation. I was able to pick up a used copy in excellent condition for under $30.00 from Amazon.

Some Examples

Functional groups, carbon chains, and rings play central roles in WLN. Unlike modern line notations that emphasize atoms, WLN is designed to mirror the way that chemists actually think about chemistry.

Consider acetone:

1V1

The two "1"s stand for saturated one-carbon chains, i.e. methyl groups. The "V" stands for a carbon doubly-bonded to oxygen.

Given nothing more than the above example, the encoding of diethyl ether should be completely clear:

2O2

"O" simply stands for a divalent oxygen atom.

The benzene ring is one of the most ubiquitous functional groups in organic chemistry. Wiswesser knew this and wanted to make it easy to encode aromatic compounds. His solution is simplicity itself. Consider acetophenone:

1VR

The "R" stands for a benzene ring. WLN canonicalization gives it the lowest priority and this is why it appears last.
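The three examples so far are regular enough to decode mechanically. The following Python sketch covers only the symbols introduced above; the function name and the wording of the descriptions are my own, and treating a run of digits as a single chain length is an assumption rather than something taken from the WLN rules.

```python
import re

# Only the symbols introduced in this article; real WLN defines many more.
SYMBOLS = {
    "V": "carbonyl carbon (C=O)",
    "O": "divalent oxygen",
    "R": "benzene ring",
}


def describe(wln):
    """Return plain-English descriptions for a simple WLN string."""
    parts = []
    # Assumption: consecutive digits form one chain length.
    for token in re.findall(r"\d+|[A-Z]", wln):
        if token.isdigit():
            parts.append(f"saturated {token}-carbon chain")
        else:
            parts.append(SYMBOLS.get(token, f"unknown symbol {token!r}"))
    return parts


for wln in ("1V1", "2O2", "1VR"):
    print(wln, "->", describe(wln))
```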

What about disubstituted aromatics? Consider 4-chloroacetophenone:

GR DV1

The "G" symbol stands for chlorine. The " DV1" stands for the 4-acyl substituent. Here, the "D" denotes the 4-position. The 3-position would result in " CV1", and the 2-position would give " BV1". The space character means that the character following it should be interpreted as a ring locant.
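The locant rule can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that every space is followed by exactly one locant letter and that letters map to positions alphabetically (B=2, C=3, D=4 as in the examples above, with A=1 extrapolated).

```python
def split_locants(wln):
    """Split a WLN string into its leading part and (position, fragment) pairs.

    Assumes a single-letter locant after each space, mapped A=1, B=2, ...
    """
    head, *rest = wln.split(" ")
    substituents = [(ord(chunk[0]) - ord("A") + 1, chunk[1:]) for chunk in rest]
    return head, substituents


for wln in ("GR DV1", "GR CV1", "GR BV1"):
    print(wln, "->", split_locants(wln))
```

For "GR DV1" this gives the leading part "GR" plus a "V1" fragment at ring position 4.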

WLN uses a very simple system of canonicalization based on alphanumeric order. Priority increases in the direction: (1) symbols; (2) numbers in numerical order; and (3) letters in alphabetical order (with the exception of R which has lower priority than symbols). Coding generally begins at the substituent assigned the highest priority. This explains why 4-chloroacetophenone is not coded as "1VR DG".
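The ordering rule translates naturally into a sort key. This Python sketch is a simplification: it compares candidate encodings by their first character only, and the relative order among non-alphanumeric symbols is my assumption, since the rule above does not spell it out.

```python
def priority(ch):
    """Sort key for one WLN character; a larger key means higher priority."""
    if ch == "R":
        return (0, 0)          # R ranks below everything, per the rule above
    if ch.isdigit():
        return (2, int(ch))    # numbers, in numerical order
    if ch.isalpha():
        return (3, ord(ch))    # letters, in alphabetical order
    return (1, ord(ch))        # other symbols (their relative order is assumed)


# Two candidate encodings of 4-chloroacetophenone.
candidates = ["1VR DG", "GR DV1"]

# Simplification: decide on the first character alone.
best = max(candidates, key=lambda s: priority(s[0]))
print(best)  # GR DV1
```

Because "G" is a letter and "1" is a number, the encoding that starts at the chlorine wins.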

Advantages of WLN

WLN is remarkably compact, especially when compared to SMILES and InChI. For example, consider the InChI for 4-chloroacetophenone, which at 49 characters is about eight times as long as the six-character WLN:

InChI=1/C8H7ClO/c1-6(10)7-2-4-8(9)5-3-7/h2-5H,1H3
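The length comparison is easy to check directly. This short Python snippet uses only the two strings given in this article.

```python
wln = "GR DV1"
inchi = "InChI=1/C8H7ClO/c1-6(10)7-2-4-8(9)5-3-7/h2-5H,1H3"

# Ratio of string lengths, counting every character including the space.
ratio = len(inchi) / len(wln)
print(len(wln), len(inchi), round(ratio, 1))  # 6 49 8.2
```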

Additionally, it's readily apparent to a human observer when a WLN is not properly coded - after all, the language was designed to be both read and written by humans rather than machines. Anyone can look at "GR DV1" and deduce almost instantly that it contains a carbonyl group (V), a phenyl group (R), a chloro group (G), and a methyl group (1).

And if this functional group recognition is easy for humans, it's orders of magnitude easier for machines. It's not difficult at all to imagine very sophisticated and fast molecular query systems that do nothing more than simple processing of the ASCII text contained within WLN strings.
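As a minimal sketch of the idea, the Python below finds carbonyl-containing compounds with nothing more than string handling. The function names and the tiny library are my own, built from the four examples in this article, and the code assumes that the single character after each space is a locant rather than a structural symbol.

```python
def symbols(wln):
    """Yield the structural symbols of a WLN string, skipping ring locants.

    Assumes the single character following each space is a locant.
    """
    head, *rest = wln.split(" ")
    yield from head
    for chunk in rest:
        yield from chunk[1:]


def contains(wln, symbol):
    """Return True if the WLN string uses the given structural symbol."""
    return symbol in set(symbols(wln))


library = {
    "acetone": "1V1",
    "diethyl ether": "2O2",
    "acetophenone": "1VR",
    "4-chloroacetophenone": "GR DV1",
}

carbonyls = [name for name, wln in library.items() if contains(wln, "V")]
print(carbonyls)
```

Skipping the locant matters: the "D" in "GR DV1" marks a ring position, so a naive substring test for "D" would report a symbol that is not there.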

Conclusions

It's very unlikely that WLN will ever be resurrected for the purpose of replacing existing line notations. On the other hand, WLN offers many potentially useful concepts for those creating new line notations. As they say, history doesn't repeat itself, but it frequently rhymes.
