Visual Representation of Query Structures

August 31, 2010

As the size of a chemical structure database increases, so does the need for advanced query tools. A common feature of large structure databases is hitset pruning through the use of query atoms and query bonds. But how should the necessary 'query structures' be displayed to chemists?

It turns out that different drawing packages use different methods of depiction. For example, the PubChem Sketcher and Symyx/Draw, shown above, render the same sulfonamide-based molecular query structure differently.

Query structures will be an important new feature of ChemWriter 2, so we need a display method that's both clear and as expressive as possible. But how should we do this?

A recent paper from Rarey's group describing a new method for the display of query structures caught my eye. Like IUPAC's Graphical Representation Standards for Structure Diagrams, from which it draws inspiration, this new study seeks to offer specific guidelines for visual structure representation, but from the perspective of molecular queries.

The authors report an intriguing system for the graphical representation of query structure elements, including those for atoms:

and bonds:

A SMARTS query viewer based on these principles is now available, as are a number of examples, both in the paper itself and online. The paper's supplementary material contains graphical legends and examples that help explain the new system.
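To make concrete what any such viewer has to depict, here is a minimal Python sketch of query atoms pruning a hitset. It is not taken from the paper or from any drawing package: the `Atom`, `QueryAtom`, and `prune` names are hypothetical, and the model covers only an element list, a NOT operator, and a ring-membership constraint. Query bonds could be handled along the same lines.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Atom:
    """A toy atom: just an element symbol and a ring flag (illustrative only)."""
    element: str
    in_ring: bool = False


@dataclass(frozen=True)
class QueryAtom:
    """A hypothetical query atom with three of the features a viewer must show."""
    elements: FrozenSet[str]        # element list, e.g. {"N", "O"}
    negate: bool = False            # True means "NOT any of these elements"
    in_ring: Optional[bool] = None  # None means ring membership is unconstrained

    def matches(self, atom: Atom) -> bool:
        # The element test is inverted when the NOT operator is set.
        element_ok = (atom.element in self.elements) != self.negate
        ring_ok = self.in_ring is None or atom.in_ring == self.in_ring
        return element_ok and ring_ok


def prune(hits: List[Atom], query: QueryAtom) -> List[Atom]:
    """Keep only the atoms that satisfy the query atom."""
    return [atom for atom in hits if query.matches(atom)]


atoms = [Atom("C", in_ring=True), Atom("N"), Atom("S"), Atom("O", in_ring=True)]

# Roughly analogous to the SMARTS atom [!#6]: anything that is not carbon.
not_carbon = QueryAtom(frozenset({"C"}), negate=True)

# Roughly analogous to the SMARTS atom [#7,#8,#16;R]: N, O, or S in a ring.
ring_hetero = QueryAtom(frozenset({"N", "O", "S"}), in_ring=True)

print([a.element for a in prune(atoms, not_carbon)])
print([a.element for a in prune(atoms, ring_hetero)])
```

Even this toy model has three independent properties per atom, which hints at why a compact, unambiguous depiction is hard to design.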

An open question at this point is to what extent chemists will feel comfortable with the Rarey query display system. Although some motifs such as ring membership will look familiar, others such as red color coding for the "NOT" boolean operator will be less so.

On the other hand, it could be argued that any system that attempts to display query structures in all their detail to chemists will encounter the same problem; the work by Rarey's group simply attempts to provide a standardized way of doing so that harnesses the power of modern computer graphics.