Cheminformatics, Crowds, and Cha-Ching

March 20, 2009

Although this scaling might not seem very illuminating, it actually tells us that chemistry has the so-called scale-free structure similar to that of the World Wide Web, the Internet, metabolic networks and even societies. This scale-free architecture is akin to a fractal in the sense that the structural/connectivity motifs characterizing the entire network repeat themselves in all of its sub-networks. Another distinguishing feature of being scale-free is the presence of highly connected 'hub' molecules directly analogous to the hubs of the airline system (Atlanta, Chicago, London, Frankfurt and so on) facilitating transportation from one poorly connected airport to another. ... The more times a molecule has been used as a synthetic substrate (that is, the larger its k_out) in the past, the higher the chances it will be used again in the future; similarly, the higher its k_in, the more likely it is that chemists will try to make it by a new reaction. Colloquially speaking, molecular 'celebrities' are becoming ever more popular.

-Grzybowski, Bishop, Kowalczyk, and Wilmer in The 'wired' universe of organic chemistry from Nature Chemistry
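
The "rich get richer" mechanism described in the quote can be illustrated with a toy simulation. The sketch below is not the authors' model or data: it assumes a simple preferential-attachment rule in which each new molecule is made from one existing substrate, chosen with probability proportional to that substrate's past use plus one. The function name grow_network and the variables k_out and pool are hypothetical, introduced only for illustration.

```python
import random
from collections import Counter

def grow_network(n_molecules, seed=0):
    """Toy preferential-attachment growth (an assumption, not the paper's model)."""
    rng = random.Random(seed)
    k_out = Counter({0: 0})  # times each molecule has been used as a substrate
    # `pool` holds one "ticket" per molecule plus one per past use, so a
    # uniform draw picks a molecule with probability proportional to k_out + 1.
    pool = [0]
    for new in range(1, n_molecules):
        substrate = rng.choice(pool)
        k_out[substrate] += 1
        k_out[new] = 0
        pool.append(substrate)  # extra ticket for the molecule just used
        pool.append(new)        # baseline ticket for the new molecule
    return k_out

k_out = grow_network(2000)
hubs = k_out.most_common(5)  # the most heavily used substrates
```

Under this rule a handful of early molecules tend to accumulate most of the uses while the majority are used rarely or never, which is the qualitative hub pattern the authors describe.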

This second paper on power laws in synthetic organic chemistry (previous paper) is chock-full of interesting and potentially useful observations on the principles behind the art, science, and business of the field.

Whether you're a chemical supplier looking to optimize return on investment or an assistant professor struggling to do 'relevant' research, this article is a must-read.

Take this, for example:

Suppose a speciality-chemicals company produces P products. A relevant question one might ask is then what set of substrates, S, and reaction pathways should the company use to minimize its overall production cost? ... The link between this general formulation and the architecture of the network is the correlation between the cost of a substrate and its local network connectivity. Analysis of specific substances reveals that synthetically popular substances are less expensive than poorly connected ones, with the cost per mole being proportional to the inverse square root of the molecule's network connectivity ... . Using this cost relation, stochastic search algorithms (based on simulated annealing Monte Carlo optimization) can be back-propagated from the products to find optimal substrates, which minimize the total production costs, C_tot. ...
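
To make the idea in this passage concrete, here is a minimal sketch of simulated annealing over substrate choices. It is not the authors' algorithm. It assumes each product has a few alternative routes (each a tuple of substrates), that a substrate's cost is c0 / sqrt(k) for connectivity k, and that each distinct substrate is paid for once. The names substrate_cost, total_cost, anneal, the constant c0, and the toy data are all hypothetical.

```python
import math
import random

def substrate_cost(k, c0=1.0):
    """Cost per mole, assumed proportional to 1/sqrt(connectivity k)."""
    return c0 / math.sqrt(k)

def total_cost(choice, routes, connectivity):
    """Sum the cost of every distinct substrate used by the chosen routes."""
    used = set()
    for product, idx in choice.items():
        used.update(routes[product][idx])
    return sum(substrate_cost(connectivity[s]) for s in used)

def anneal(routes, connectivity, steps=5000, t0=1.0, seed=0):
    """Pick one route per product by simulated annealing (toy version)."""
    rng = random.Random(seed)
    choice = {p: 0 for p in routes}
    cost = total_cost(choice, routes, connectivity)
    best, best_cost = dict(choice), cost
    products = list(routes)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        p = rng.choice(products)
        old = choice[p]
        choice[p] = rng.randrange(len(routes[p]))  # propose a route for p
        new_cost = total_cost(choice, routes, connectivity)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(choice), cost
        else:
            choice[p] = old  # reject the proposal
    return best, best_cost

# Toy data: connectivity per substrate and alternative routes per product.
connectivity = {"A": 100, "B": 4, "C": 25, "D": 1}
routes = {"P1": [("B", "D"), ("A", "C")], "P2": [("D",), ("A",)]}
best, best_cost = anneal(routes, connectivity)
```

In this toy instance the cheapest plan uses only the well-connected substrates A and C, which mirrors the claim that synthetically popular substances are the less expensive ones. A real application would need actual price and connectivity data and a far richer model of reaction pathways.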

This kind of analysis is groundbreaking because it uses cheminformatics to connect mathematical models long used in other fields, such as network theory and stochastic optimization, to the business and science of organic chemistry. It's a perspective rarely taken and long overdue.