August 12, 2006

By far the most challenging problem solved by the IBM 704 computer system was the storage of two-dimensional structures of organic compounds in such a way that the file could be searched for any structural fragment or moiety that could be drawn and have the computer print the structure is [sic] such a way that the chemist could recognize it without translating or decoding.

W.H. Waldo, J. Chem. Doc. 1962, 2, 1-2

Computers have been used to solve chemical informatics problems for a long time. Hardware and software have changed radically, but surprisingly, many of the most important problems of 1956 are still significant in 2006.

Like many older information industries, chemical informatics has been dominated by a few big players for most of its existence. The recent development of free databases containing millions of chemical structures (for example, PubChem and ZINC), along with numerous other factors, is rapidly driving down the cost of obtaining chemical information. Cheaper chemical information, in turn, brings the shortcomings of the status quo into sharp focus.

This blog is about the transformation that is underway in chemical informatics. It will focus on open source software written by others and by me. The creation and distribution of freely available chemical datasets will also be discussed. The overriding theme will be interoperability: software that works well with other software, and data that is platform-agnostic.