Gigabytes of Chemical Information - Now Free for Download

June 03, 2010

As announced on their Public Policy Blog, Google has made available for free download both images and full text of a large number of US Patents and Patent Applications.

This information has always been available - but at a price. The US Patent and Trademark Office (USPTO) has offered subscriptions to new patent files, as well as its collection of backfiles, in DVD format at prices running into the thousands of dollars. A few companies offer similar collections at similarly inflated prices.

Who cares about patents, you say? It turns out that patents are the only large corpus of chemical information that can be freely reused without restriction. It's hard to overemphasize how important this quality is, because most of the peer-reviewed chemical literature is locked away in the vast, balkanized, for-profit information silo that is the chemistry publication system we all know. This applies equally to the peer-reviewed text and to its supplementary experimental material. It's impossible for average researchers with average means to do anything productive with that corpus as a whole.

With patents, these limitations don't apply.

What's contained in chemistry patents? Just about anything with a basis in chemistry that's ever been sold (or even considered marketable) has been through a patent process: drugs, organic LEDs, photoresists, liquid crystals, catalysts, detergents, and so on. Experimental procedures, assay results, and physical properties can all be found in chemistry patents - if you know where to look.

Patents are useful for other reasons as well. For example, much of the best industrial research never gets published in the peer-reviewed literature. It can only be found in patents.

Chemistry patents also have a big downside: poor use of conventions. Chemical naming, for example, can be haphazard at best, and a lot of information must be inferred. A well-stocked chemical dictionary and name-to-structure software are required equipment for fishing in these waters.
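
To give a rough feel for the dictionary half of that toolkit, here is a minimal Python sketch that looks up known chemical names in a snippet of patent-style text. The four-entry dictionary, the function name find_known_names, and the sample sentence are all made up for illustration; a real chemical dictionary would hold many thousands of names, and plain lookup like this does nothing for misspelled or systematic names, which is where name-to-structure software comes in.

```python
import re

# Hypothetical, tiny stand-in for a real chemical dictionary (name -> SMILES).
NAME_TO_SMILES = {
    "ethanol": "CCO",
    "acetone": "CC(C)=O",
    "toluene": "Cc1ccccc1",
    "acetic acid": "CC(O)=O",
}


def find_known_names(text, dictionary=NAME_TO_SMILES):
    """Return (name, SMILES) pairs for dictionary names found in text,
    ordered by where each name first appears."""
    lowered = text.lower()
    hits = []
    for name, smiles in dictionary.items():
        # Word boundaries keep "ethanol" from matching inside "methanol".
        match = re.search(r"\b" + re.escape(name) + r"\b", lowered)
        if match:
            hits.append((match.start(), name, smiles))
    hits.sort()
    return [(name, smiles) for _, name, smiles in hits]


if __name__ == "__main__":
    # Made-up sentence in the style of a patent experimental section.
    sample = "The residue was dissolved in Toluene (50 mL) and washed with acetic acid."
    for name, smiles in find_known_names(sample):
        print(name, smiles)
```

Matching is case-insensitive and whole-word only, which is about the least one can get away with given how loosely names are written in patents.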

I'm just now starting to work my way through the patent material that's been made available from Google. It'll be interesting to see what new things might come from this vast collection of chemical information.