A Molecular Language for Modern Chemistry: FlexMol and Axial Chirality
A recent article introduced FlexMol as a molecular language with the unique capability of encoding axial chirality. A previous article showed how E/Z geometrical isomerism is encoded with FlexMol. Using the popular chiral reagent and ligand 1,1'-bi-2-naphthol (BINOL) as an example, this tutorial will illustrate in detail how axial chirality is encoded in FlexMol.
Configuration or Conformation?
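In contrast to configurational stereoisomers, conformational stereoisomers can be interconverted through bond rotations. The two enantiomers of BINOL are related by rotation about the single bond joining the two naphthalene units. That rotation is strongly hindered, but it is still a bond rotation, so we'll need to use a conformationWheel to represent BINOL's stereochemistry, just as we did with 2-butene. For more rigorous definitions of these concepts, see the original specification by Dietz.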
(R)-BINOL
A FlexMol representation of (R)-BINOL and its associated atom-numbering scheme are shown below:
<?xml version="1.0" standalone="yes"?>
<!-- (R)-BINOL -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C6" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C7" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C8" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C9" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C10" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C11" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C12" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C13" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C14" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C15" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C16" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C17" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C18" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C19" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="O20" symbol="O" hydrogens="1" ionization="2"></atom>
<atom id="O21" symbol="O" hydrogens="1" ionization="2"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="2"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
<bond source="C3" target="C4" bondingElectrons="2"></bond>
<bond source="C4" target="C5" bondingElectrons="2"></bond>
<bond source="C0" target="C5" bondingElectrons="2"></bond>
<bond source="C0" target="C6" bondingElectrons="2"></bond>
<bond source="C6" target="C7" bondingElectrons="2"></bond>
<bond source="C7" target="C8" bondingElectrons="2"></bond>
<bond source="C8" target="C9" bondingElectrons="2"></bond>
<bond source="C9" target="C1" bondingElectrons="2"></bond>
<bondingSystem bondingElectrons="10">
<connections>
<atomPair source="C0" target="C1"></atomPair>
<atomPair source="C1" target="C2"></atomPair>
<atomPair source="C2" target="C3"></atomPair>
<atomPair source="C3" target="C4"></atomPair>
<atomPair source="C4" target="C5"></atomPair>
<atomPair source="C0" target="C5"></atomPair>
<atomPair source="C0" target="C6"></atomPair>
<atomPair source="C6" target="C7"></atomPair>
<atomPair source="C7" target="C8"></atomPair>
<atomPair source="C8" target="C9"></atomPair>
<atomPair source="C9" target="C1"></atomPair>
</connections>
</bondingSystem>
<bond source="C10" target="C11" bondingElectrons="2"></bond>
<bond source="C11" target="C12" bondingElectrons="2"></bond>
<bond source="C12" target="C13" bondingElectrons="2"></bond>
<bond source="C13" target="C14" bondingElectrons="2"></bond>
<bond source="C14" target="C15" bondingElectrons="2"></bond>
<bond source="C10" target="C15" bondingElectrons="2"></bond>
<bond source="C10" target="C16" bondingElectrons="2"></bond>
<bond source="C16" target="C17" bondingElectrons="2"></bond>
<bond source="C17" target="C18" bondingElectrons="2"></bond>
<bond source="C18" target="C19" bondingElectrons="2"></bond>
<bond source="C19" target="C11" bondingElectrons="2"></bond>
<bondingSystem bondingElectrons="10">
<connections>
<atomPair source="C10" target="C11"></atomPair>
<atomPair source="C11" target="C12"></atomPair>
<atomPair source="C12" target="C13"></atomPair>
<atomPair source="C13" target="C14"></atomPair>
<atomPair source="C14" target="C15"></atomPair>
<atomPair source="C10" target="C15"></atomPair>
<atomPair source="C10" target="C16"></atomPair>
<atomPair source="C16" target="C17"></atomPair>
<atomPair source="C17" target="C18"></atomPair>
<atomPair source="C18" target="C19"></atomPair>
<atomPair source="C19" target="C11"></atomPair>
</connections>
</bondingSystem>
<bond source="C9" target="C19" bondingElectrons="2"></bond>
<bond source="C8" target="O20" bondingElectrons="2"></bond>
<bond source="C18" target="O21" bondingElectrons="2"></bond>
</bonding>
</constitution>
<conformation>
<conformationWheel>
<gammaSequence source="C19" target="C9">
<connections>
<atomPair source="C9" target="C19"></atomPair>
</connections>
</gammaSequence>
<halfPlane>
<lower atom="C11"></lower>
</halfPlane>
<halfPlane>
<upper atom="C1"></upper>
</halfPlane>
<halfPlane>
<lower atom="C18"></lower>
</halfPlane>
<halfPlane>
<upper atom="C8"></upper>
</halfPlane>
</conformationWheel>
</conformation>
</molecule>
We've elected to represent BINOL's two pi-systems as ten-atom, ten-electron bondingSystems. We could have just as easily represented each naphthalene ring using alternating single/double bonds containing two and four electrons, respectively. For an explanation of multi-atom pi-system bonding in FlexMol, see this article.
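For comparison, here is a sketch of the alternating single/double bond (Kekulé) form of the first naphthalene unit (atoms C0 through C9). It assumes, as the paragraph above describes, that a double bond is simply a `bond` element with `bondingElectrons="4"`; these eleven elements would replace both the eleven two-electron `bond` elements and the ten-electron `bondingSystem` for that unit. The particular Kekulé structure chosen is one of several valid ones, and the fragment has not been validated against the FlexMol schema.

```xml
<!-- Kekulé form of the C0-C9 naphthalene unit (sketch):
     five double bonds (4 electrons each) carry the same 10 pi electrons
     that the bondingSystem distributes over all eleven atom pairs -->
<bond source="C0" target="C1" bondingElectrons="4"></bond>
<bond source="C1" target="C2" bondingElectrons="2"></bond>
<bond source="C2" target="C3" bondingElectrons="4"></bond>
<bond source="C3" target="C4" bondingElectrons="2"></bond>
<bond source="C4" target="C5" bondingElectrons="4"></bond>
<bond source="C0" target="C5" bondingElectrons="2"></bond>
<bond source="C0" target="C6" bondingElectrons="2"></bond>
<bond source="C6" target="C7" bondingElectrons="4"></bond>
<bond source="C7" target="C8" bondingElectrons="2"></bond>
<bond source="C8" target="C9" bondingElectrons="4"></bond>
<bond source="C9" target="C1" bondingElectrons="2"></bond>
```

Each ring carbon ends up with exactly one double bond, and the second unit (C10 through C19) would follow the same pattern. The delocalized bondingSystem avoids having to single out one Kekulé structure when several are equally valid.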
The stereochemically-relevant part of this document is contained within the conformation element. A gammaSequence, or conformational axis, is defined along with four non-empty halfPlanes. Notice how the basic structure of this conformation element closely resembles the one for 2-butene.
To better visualize the conformation element of (R)-BINOL, consider the following diagram:

The conformationWheel defines a conformational axis vector from atom C19 to atom C9. Four non-empty halfPlanes are arranged clockwise about this axis. Starting from an arbitrary halfPlane, the first contains atom C11 in its lower half. The next halfPlane contains atom C1 in its upper half, the third contains atom C18 in its lower half, and the last contains atom C8 in its upper half.
This procedure completely specifies the axial chirality of (R)-BINOL. Notice that no stereodescriptors or chiral templates were needed. Given suitable software, the Cahn-Ingold-Prelog descriptor (R) could still be derived from this encoding.
Many representations of the same chiral axis are possible, just as a single connection table can be written in many different ways. For example, we could have started the conformation element with the halfPlane containing atom C1. In that case, the ordering of atoms would be C1, C18, C8, C11. Similarly, the chiral axis could have been oriented from atom C9 to atom C19. In that case, the order of the halfPlanes would be reversed and the upper/lower designations inverted.
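The two alternatives just described can be written out explicitly. The fragments below apply those rules by hand to the (R)-BINOL conformation element: the first keeps the C19-to-C9 axis but starts at the halfPlane containing C1; the second orients the axis from C9 to C19, which reverses the order of the halfPlanes and swaps upper and lower. They are a sketch derived from the stated rules, not output generated or checked by Octet.

```xml
<!-- (R)-BINOL, same axis (C19 to C9), starting at the halfPlane containing C1 -->
<conformation>
<conformationWheel>
<gammaSequence source="C19" target="C9">
<connections>
<atomPair source="C9" target="C19"></atomPair>
</connections>
</gammaSequence>
<halfPlane><upper atom="C1"></upper></halfPlane>
<halfPlane><lower atom="C18"></lower></halfPlane>
<halfPlane><upper atom="C8"></upper></halfPlane>
<halfPlane><lower atom="C11"></lower></halfPlane>
</conformationWheel>
</conformation>

<!-- (R)-BINOL, reversed axis (C9 to C19): halfPlane order reversed,
     upper and lower designations swapped -->
<conformation>
<conformationWheel>
<gammaSequence source="C9" target="C19">
<connections>
<atomPair source="C9" target="C19"></atomPair>
</connections>
</gammaSequence>
<halfPlane><lower atom="C8"></lower></halfPlane>
<halfPlane><upper atom="C18"></upper></halfPlane>
<halfPlane><lower atom="C1"></lower></halfPlane>
<halfPlane><upper atom="C11"></upper></halfPlane>
</conformationWheel>
</conformation>
```

Under these two rules (cyclic restart and axis reversal), no variant of the (R) ordering reproduces the (S)-BINOL ordering of C11 (lower), C8 (upper), C18 (lower), C1 (upper), which is what keeps the two enantiomers distinct.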
(S)-BINOL
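How is (S)-BINOL encoded in FlexMol? As you might expect, completely analogously to the (R) enantiomer. The constitution is identical; only the conformation element changes, with the halfPlanes containing atoms C1 and C8 trading places: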
<!-- snip -->
<conformation>
<conformationWheel>
<gammaSequence source="C19" target="C9">
<connections>
<atomPair source="C9" target="C19"></atomPair>
</connections>
</gammaSequence>
<halfPlane>
<lower atom="C11"></lower>
</halfPlane>
<halfPlane>
<upper atom="C8"></upper>
</halfPlane>
<halfPlane>
<lower atom="C18"></lower>
</halfPlane>
<halfPlane>
<upper atom="C1"></upper>
</halfPlane>
</conformationWheel>
</conformation>
<!-- snip -->
As with (R)-BINOL, we can create a diagram representing the conformationWheel of (S)-BINOL:

Conclusions
As you can see, FlexMol completely encodes axial chirality using just a few basic XML elements, rather than chiral templates or stereodescriptors. These were, in fact, the same elements used to encode alkene geometrical isomerism. This modular approach to stereoisomerism results in an extensible system. Future articles will discuss other forms of stereoisomerism that can be represented in FlexMol, including the all-important tetrahedral stereogenic center.
Ferrocene and Beyond: A Solution to the Molecular Representation Problem
The representation of molecular structure decisively determines the scope of a chemical computer program. Our goal is to provide a versatile computer-oriented molecular structure representation for chemical information storage and retrieval as well as for computer-assisted synthesis design. Structural formulas describe molecular structure on the proper level of abstraction for these applications. ... It is therefore desirable that the computer-oriented representation of molecular structure be as expressive as the structural formulas.
-Andreas Dietz, J. Chem. Inf. Comput. Sci. 1995, 35, 787-802
A recent Depth-First article highlighted the difficulty that existing molecular languages have in communicating the generalized, multi-atom bonding present in metallocenes such as ferrocene. For software and Web services that do not interact with the outside world, the Ferrocene Problem may not be a big deal. But for the growing number that do, the Ferrocene Problem is but the tip of a very large iceberg.
Today's Weird-Looking Molecule is Tomorrow's Molecule of the Month
Consider the problem of axial chirality, such as that present in certain biaryls. None of the molecular languages currently in widespread use (InChI, SMILES, Molfile, or CML) provides a mechanism to faithfully represent and communicate this structural motif. In the 1980s, axial chirality was a novelty. Today it is ubiquitous. Consider this graphical abstract from the current issue of Organic Letters:

If you were asked to create an application capable of distinguishing the (R) and (S) enantiomers of substituted BINOLs, could you do it? If your system needed to reliably interact with the outside world, could it do so? If you're working with any of the cheminformatics tools currently in widespread use, chances are good that the answer to both questions would be "no".
Do you still think of metallocenes as curiosities studied by a handful of organometallic chemists? Consider this J. Org. Chem. ASAP contents article describing one of the most fundamental transformations in organic chemistry:

The problem only gets worse as concepts like axial and planar chirality are increasingly commingled with multi-atom bonding. For example, consider the following graphical abstract, taken from J. Org. Chem. ASAP contents:

These molecules, and many others like them, were used in the context of organic chemistry. Moreover, the papers describing their use were published in widely respected journals specializing in organic chemistry. Yet dozens of popular cheminformatics tools specifically designed for use with organic chemistry are incapable of faithfully representing the most interesting features of these molecules. In other words, the problem is both real and immediate.
Chemistry relentlessly marches forward, revealing even greater molecular information problems on the horizon. For software to remain relevant, it must be based on tools that are up to the challenge.
A Solution
The system proposed by Dietz offers a solution to nearly all of the bonding and stereochemistry problems of existing molecular languages. As a tradeoff, Dietz's system is significantly more complicated to implement. This places an increased burden on software to make the system as simple and understandable as possible.
Java and XML Implementations
Any specification, if it is to become more than just an academic exercise, requires a software implementation. Fortunately, both a software implementation and an XML Schema have been developed for Dietz's system, and both are freely available.
The software implementation can be found in the Java framework Octet. In addition to fully implementing Dietz's specification, Octet enables ring perception, substructure and query structure matching, breadth-first traversal, and of course, depth-first traversal. Add-on libraries are available for 2-D structure depiction and for Molfile and SMILES input and output. A CDK News article discusses CDKTools, a bridge to the Chemistry Development Kit. Octet remains, to my knowledge, the first and only implementation of the Dietz system.
The first, and to my knowledge only, XML implementation of the Dietz molecular representation system is FlexMol (Flexible Molecular Object Language). A commented W3C schema is distributed with Octet. Browser-ready HTML documentation can be found here, or from the sidebar links under "APIs and Schema Documentation." Octet is able to read and write FlexMol documents, providing an open, end-to-end solution to the problem of representing and transmitting molecules containing "nonstandard" bonding and stereochemistry.
Conclusions
Both FlexMol and Octet are convenient tools for working with the Dietz molecular representation system. Future articles in this series will show how they can be used to solve current, real-world molecular representation problems.


