Chemical Substructure Search in SQL
Earlier this year, Golovin and Henrick published a novel technique for performing substructure search on large databases using SQL. What makes this approach unique is that it takes advantage of the built-in breadth-first search capabilities of most database engines to do structure matching. At least three public-facing sites now use the system, including this one at EMBL-EBI.
Very recently, Charlie Zhu released an implementation of the idea called sqlmol. The source consists of SQL files.
I'm not skilled enough with SQL to know how to use sqlmol. Questions in my mind at this point mainly revolve around performance and the ability to do SMARTS queries. The Golovin and Henrick paper does make some mention of performance, but my sense is there's more to look at here.
With an open source implementation apparently now being developed, it should be possible to answer these and many other questions.
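To make the general idea concrete, here is a minimal sketch of substructure matching expressed as a join over atom and bond tables, written in plain Ruby on in-memory rows. Everything in it is an assumption made for illustration: the `Atom` and `Bond` row layouts, the sample molecules, and the `molecules_with_bond` helper are hypothetical, and none of it is the schema or query generator from the Golovin and Henrick paper or from sqlmol. It handles only a two-atom query (one bond), which is the simplest case of the join pattern.

```ruby
# Hypothetical schema: one row per atom and one row per bond.
Atom = Struct.new(:mol, :id, :element)
Bond = Struct.new(:mol, :a1, :a2, :order)

ATOMS = [
  Atom.new('bromoethane', 1, 'C'), Atom.new('bromoethane', 2, 'C'),
  Atom.new('bromoethane', 3, 'Br'),
  Atom.new('ethanol', 1, 'C'), Atom.new('ethanol', 2, 'C'),
  Atom.new('ethanol', 3, 'O')
]
BONDS = [
  Bond.new('bromoethane', 1, 2, 1), Bond.new('bromoethane', 2, 3, 1),
  Bond.new('ethanol', 1, 2, 1), Bond.new('ethanol', 2, 3, 1)
]

# Roughly equivalent to this SQL on the hypothetical tables:
#   SELECT DISTINCT b.mol FROM bonds b
#   JOIN atoms x ON x.mol = b.mol AND x.id = b.a1
#   JOIN atoms y ON y.mol = b.mol AND y.id = b.a2
#   WHERE (x.element = :e1 AND y.element = :e2)
#      OR (x.element = :e2 AND y.element = :e1)
def molecules_with_bond(atoms, bonds, e1, e2)
  # Index atoms by (molecule, atom id) so each bond can look up its ends.
  element_of = atoms.to_h { |a| [[a.mol, a.id], a.element] }
  matching = bonds.select do |b|
    pair = [element_of[[b.mol, b.a1]], element_of[[b.mol, b.a2]]]
    pair == [e1, e2] || pair == [e2, e1]
  end
  matching.map(&:mol).uniq
end

p molecules_with_bond(ATOMS, BONDS, 'C', 'Br')
```

Larger query fragments would need one more join per additional bond, which is where the performance questions raised above become interesting.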
Reading SMILES with MX
The latest release of MX, the Java toolkit for cheminformatics, now supports reading a subset of SMILES strings. The reader is still incomplete, but full SMILES support is planned within a few releases.
To get an idea of how to use the new SMILES reader, we can use interactive JRuby. Assuming we've downloaded the mx-0.105.0 jarfile to our working directory, we can use:
$ jirb
irb(main):001:0> require 'mx-0.105.0.jar'
=> true
irb(main):002:0> import com.metamolecular.mx.io.daylight.SMILESReader
=> Java::ComMetamolecularMxIoDaylight::SMILESReader
irb(main):003:0> bromobenzene = SMILESReader.read 'C1=CC=CC=C1Br'
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0x8a2023 @java_object=com.metamolecular.mx.model.DefaultMolecule@182a70>
irb(main):004:0> bromobenzene.count_atoms
=> 7
irb(main):005:0> bromobenzene.get_atom(6).get_symbol
=> "Br"
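To see why the session reports seven atoms with bromine at index 6, it can help to walk the string by hand. The following is a minimal sketch in plain Ruby, assuming a very small SMILES subset (unbracketed organic-subset atoms, bond symbols, and single-digit ring closures). The `atom_symbols` helper is hypothetical and is not how MX's SMILESReader is implemented; it only lists element symbols in order and builds no bonds or rings.

```ruby
require 'strscan'

# Illustrative only: lists the element symbols found in a small SMILES subset.
# Branches, bracket atoms, aromatic lowercase atoms, and charges are rejected.
def atom_symbols(smiles)
  scanner = StringScanner.new(smiles)
  symbols = []
  until scanner.eos?
    # Two-letter symbols come first so "Br" is not read as "B" plus a stray "r".
    if (symbol = scanner.scan(/Br|Cl|[BCNOPSFI]/))
      symbols << symbol
    elsif scanner.scan(/[-=#]|\d/)
      # Bond symbols and ring-closure digits do not add an atom.
    else
      raise ArgumentError, "unsupported SMILES character at position #{scanner.pos}"
    end
  end
  symbols
end

symbols = atom_symbols('C1=CC=CC=C1Br')
puts symbols.length  # 7
puts symbols[6]      # Br
```

The six ring carbons take indices 0 through 5, so the bromine lands at index 6, matching the `get_atom(6)` call in the session.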
Extending InChI Stereochemistry
As covered by Reuters and many other wire services, ArtusLabs and Boston University's CMLD have teamed up to extend InChI's stereochemistry support:
DURHAM, N.C.--(Business Wire)-- ArtusLabs, Inc., a leading provider of life science software tools and data management solutions, has entered into a partnership with Boston University's Center for Chemical Methodology and Library Development (CMLD) to develop a way to standardize and expand the way in which stereochemistry, and ultimately three-dimensional structures, are represented in the International Chemical Identifier (InChI(TM)).
With the increasing use of molecules containing axial chirality, planar chirality and other forms of non-tetrahedral stereogenicity in chemistry, the move by ArtusLabs and CMLD could be significant.
Put simply, the ability of cheminformatics to represent certain kinds of compounds has fallen far behind the ability of chemistry to make them. While molecules considered mere oddities 30 years ago continue to pour into corporate compound collections, laboratory notebooks, and product catalogs, cheminformatics has been stuck with a form of molecular representation that hasn't changed significantly in several decades.
InChI isn't alone. The three most widely used molecular representation systems (Molfile, SMILES, and CML) all suffer from fundamental limitations in representing axial chirality, planar chirality, and multicenter bonding.
The kind of work being undertaken by ArtusLabs and CMLD is essential if cheminformatics is to keep pace with new developments in chemistry.
Science Blogging Anthology Now in Print
The science blogging anthology The Open Laboratory 2007 is now available for purchase. As mentioned earlier, The Open Laboratory was created to promote the 2008 North Carolina Science Blogging Conference to be held on January 19, 2008. Chapter 4.3 contains the article "SMILES and Aromaticity: Broken?", which originally appeared last year on Depth-First. Details are available in the original announcement.
The Open Laboratory's publisher is remarkable. Lulu is a service that lets people of average means publish and sell their own books. The key to the entire operation is that rather than being printed in large batches, books are printed on demand.
Got a great idea for a book that will likely have a devoted but small audience? You too can publish a high-quality product and sell it through an established, worldwide distribution network. No contracts, no agents, no years of trying to find a publisher. Just do it.
Consider these chemistry-related titles currently offered by Lulu, none of which has the mass market needed to get a major publisher to back them:
Having bought one Lulu title recently, Desktop Java Live, I can say that both the experience and finished product are nearly indistinguishable from buying books at Amazon.
Let's hear it for The Long Tail!
Depth-First Article to Appear in Science Blogging Anthology
A recent Depth-First article titled "SMILES and Aromaticity: Broken?" has been selected to appear in the science blogging anthology "The Open Laboratory 2007." This article, along with the 51 other winning entries, will be published as a book that can be purchased from Amazon.com. The book, the second in a series, is aimed at promoting the 2008 North Carolina Science Blogging Conference to be held on January 19, 2008.
Are science blogging anthologies like Open Laboratory just a passing fad, or the beginning of something much larger? Only time will tell. What's clear is that the means of production and distribution of scientific information are getting cheaper by the year, resulting in an increasingly large range of choices for readers. If other communication-related industries such as movies, music, software, and newspapers offer any indication of what lies ahead, small may well be the new big in scientific publication - and not a moment too soon.

