A Molecular Language for Modern Chemistry: FlexMol and Axial Chirality
A recent article introduced FlexMol as a molecular language with the unique capability of encoding axial chirality. A previous article showed how E/Z geometrical isomerism is encoded with FlexMol. Using the popular chiral reagent and ligand 1,1'-bi-2-naphthol (BINOL) as an example, this tutorial will illustrate in detail how axial chirality is encoded in FlexMol.
Configuration or Conformation?
In contrast to configurational stereoisomers, conformational stereoisomers can be interconverted through bond rotations. So we'll need to use a conformationWheel to represent stereochemistry in BINOL - just as we did with 2-butene. For more rigorous definitions of these concepts, see the original specification by Dietz.
(R)-BINOL
A FlexMol representation and associated atom numbering scheme for (R)-BINOL are shown below:
<?xml version="1.0" standalone="yes"?>
<!-- (R)-BINOL -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C6" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C7" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C8" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C9" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C10" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C11" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C12" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C13" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C14" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C15" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C16" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C17" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C18" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="C19" symbol="C" hydrogens="0" ionization="4"></atom>
<atom id="O20" symbol="O" hydrogens="1" ionization="2"></atom>
<atom id="O21" symbol="O" hydrogens="1" ionization="2"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="2"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
<bond source="C3" target="C4" bondingElectrons="2"></bond>
<bond source="C4" target="C5" bondingElectrons="2"></bond>
<bond source="C0" target="C5" bondingElectrons="2"></bond>
<bond source="C0" target="C6" bondingElectrons="2"></bond>
<bond source="C6" target="C7" bondingElectrons="2"></bond>
<bond source="C7" target="C8" bondingElectrons="2"></bond>
<bond source="C8" target="C9" bondingElectrons="2"></bond>
<bond source="C9" target="C1" bondingElectrons="2"></bond>
<bondingSystem bondingElectrons="10">
<connections>
<atomPair source="C0" target="C1"></atomPair>
<atomPair source="C1" target="C2"></atomPair>
<atomPair source="C2" target="C3"></atomPair>
<atomPair source="C3" target="C4"></atomPair>
<atomPair source="C4" target="C5"></atomPair>
<atomPair source="C0" target="C5"></atomPair>
<atomPair source="C0" target="C6"></atomPair>
<atomPair source="C6" target="C7"></atomPair>
<atomPair source="C7" target="C8"></atomPair>
<atomPair source="C8" target="C9"></atomPair>
<atomPair source="C9" target="C1"></atomPair>
</connections>
</bondingSystem>
<bond source="C10" target="C11" bondingElectrons="2"></bond>
<bond source="C11" target="C12" bondingElectrons="2"></bond>
<bond source="C12" target="C13" bondingElectrons="2"></bond>
<bond source="C13" target="C14" bondingElectrons="2"></bond>
<bond source="C14" target="C15" bondingElectrons="2"></bond>
<bond source="C10" target="C15" bondingElectrons="2"></bond>
<bond source="C10" target="C16" bondingElectrons="2"></bond>
<bond source="C16" target="C17" bondingElectrons="2"></bond>
<bond source="C17" target="C18" bondingElectrons="2"></bond>
<bond source="C18" target="C19" bondingElectrons="2"></bond>
<bond source="C19" target="C11" bondingElectrons="2"></bond>
<bondingSystem bondingElectrons="10">
<connections>
<atomPair source="C10" target="C11"></atomPair>
<atomPair source="C11" target="C12"></atomPair>
<atomPair source="C12" target="C13"></atomPair>
<atomPair source="C13" target="C14"></atomPair>
<atomPair source="C14" target="C15"></atomPair>
<atomPair source="C10" target="C15"></atomPair>
<atomPair source="C10" target="C16"></atomPair>
<atomPair source="C16" target="C17"></atomPair>
<atomPair source="C17" target="C18"></atomPair>
<atomPair source="C18" target="C19"></atomPair>
<atomPair source="C19" target="C11"></atomPair>
</connections>
</bondingSystem>
<bond source="C9" target="C19" bondingElectrons="2"></bond>
<bond source="C8" target="O20" bondingElectrons="2"></bond>
<bond source="C18" target="O21" bondingElectrons="2"></bond>
</bonding>
</constitution>
<conformation>
<conformationWheel>
<gammaSequence source="C19" target="C9">
<connections>
<atomPair source="C9" target="C19"></atomPair>
</connections>
</gammaSequence>
<halfPlane>
<lower atom="C11"></lower>
</halfPlane>
<halfPlane>
<upper atom="C1"></upper>
</halfPlane>
<halfPlane>
<lower atom="C18"></lower>
</halfPlane>
<halfPlane>
<upper atom="C8"></upper>
</halfPlane>
</conformationWheel>
</conformation>
</molecule>
We've elected to represent BINOL's two pi-systems as ten-atom, ten-electron bondingSystems. We could have just as easily represented each naphthalene ring using alternating single/double bonds containing two and four electrons, respectively. For an explanation of multi-atom pi-system bonding in FlexMol, see this article.
The stereochemically-relevant part of this document is contained within the conformation element. A gammaSequence, or conformational axis, is defined along with four non-empty halfPlanes. Notice how the basic structure of this conformation element closely resembles the one for 2-butene.
To better visualize the conformation element of (R)-BINOL, consider the following diagram:

The conformationWheel defines a conformational axis vector from atom C19 to atom C9. Arranged about this axis in a clockwise fashion are four non-empty halfPlanes. Picking an arbitrary halfPlane to start with, atom C11 is positioned first in the lower half. This is then followed by the next halfPlane, which contains atom C1 in its upper half. The next halfPlane contains atom C18 in the lower half. Finally, atom C8 is located in the last halfPlane's upper half.
This procedure completely specifies the axial chirality of (R)-BINOL. Notice how no arbitrary stereodescriptors or chiral templates were used. Of course, we could derive the Cahn-Ingold-Prelog stereodescriptor of (R), given the right software.
Many representations of the same chiral axis are possible, just as each connection table can be represented in many different ways. For example, we could have started the conformation element with the halfPlane containing atom C1. In this case, the ordering of atoms would be C1, C18, C8, C11. Similarly, the orientation of our chiral axis could have been defined from atom C9 to atom C19. In this case the ordering of halfPlanes would be reversed, and the upper/lower designations would be inverted.
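These equivalences can be sketched in a few lines of Ruby. This is an illustration only, not code from FlexMol or Octet: the Hash layout (an :axis pair plus a clockwise list of :half_planes) and the method names rotations, reversed, and equivalent? are assumptions, and each half plane is assumed to hold at most one atom per region.

```ruby
# A wheel is modelled as a plain Hash (an assumed layout, not a FlexMol API):
#   axis:        [source_id, target_id]
#   half_planes: Hashes with optional :lower / :upper atom ids, listed clockwise

# The same wheel, with the clockwise listing started at each half plane in turn.
def rotations(wheel)
  planes = wheel[:half_planes]
  (0...planes.size).map do |shift|
    { axis: wheel[:axis], half_planes: planes.rotate(shift) }
  end
end

# The opposite axis direction: reverse the half-plane order and swap upper/lower.
def reversed(wheel)
  flipped = wheel[:half_planes].reverse.map do |plane|
    swapped = {}
    swapped[:upper] = plane[:lower] if plane[:lower]
    swapped[:lower] = plane[:upper] if plane[:upper]
    swapped
  end
  { axis: wheel[:axis].reverse, half_planes: flipped }
end

# Two wheels are equivalent if one is a rotation of the other,
# or a rotation of the other's reversed form.
def equivalent?(a, b)
  (rotations(a) + rotations(reversed(a))).include?(b)
end

r_binol = {
  axis: ["C19", "C9"],
  half_planes: [{ lower: "C11" }, { upper: "C1" }, { lower: "C18" }, { upper: "C8" }]
}
starting_at_c1 = {
  axis: ["C19", "C9"],
  half_planes: [{ upper: "C1" }, { lower: "C18" }, { upper: "C8" }, { lower: "C11" }]
}

puts equivalent?(r_binol, starting_at_c1)    # prints "true"
puts equivalent?(r_binol, reversed(r_binol)) # prints "true"
```

Comparing against every rotation is wasteful for large wheels, but with four half planes it keeps the idea easy to follow.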
(S)-BINOL
How is (S)-BINOL encoded in FlexMol? As you might expect, completely analogously to the (R) enantiomer:
<!-- snip -->
<conformation>
<conformationWheel>
<gammaSequence source="C19" target="C9">
<connections>
<atomPair source="C9" target="C19"></atomPair>
</connections>
</gammaSequence>
<halfPlane>
<lower atom="C11"></lower>
</halfPlane>
<halfPlane>
<upper atom="C8"></upper>
</halfPlane>
<halfPlane>
<lower atom="C18"></lower>
</halfPlane>
<halfPlane>
<upper atom="C1"></upper>
</halfPlane>
</conformationWheel>
</conformation>
<!-- snip -->
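Comparing the two listings side by side, the (S) wheel is the (R) wheel read in the opposite rotational sense about the same axis, with the upper/lower designations unchanged. A small Ruby sketch can confirm this for the two documents shown here. The array-of-hashes layout and the method name cyclic_match? are assumptions made for illustration; they are not part of FlexMol.

```ruby
# True when list b is list a started at a different position.
def cyclic_match?(a, b)
  a.size == b.size && (0...a.size).any? { |shift| a.rotate(shift) == b }
end

# Half planes in document order, both about the axis C19 -> C9.
r_planes = [{ lower: "C11" }, { upper: "C1" }, { lower: "C18" }, { upper: "C8" }]
s_planes = [{ lower: "C11" }, { upper: "C8" }, { lower: "C18" }, { upper: "C1" }]

puts cyclic_match?(r_planes, s_planes)         # prints "false"
puts cyclic_match?(r_planes.reverse, s_planes) # prints "true"
```

The first comparison fails because no choice of starting halfPlane turns one listing into the other; the second succeeds once the clockwise order is reversed.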
As with (R)-BINOL, we can create a diagram representing the conformationWheel of (S)-BINOL:

Conclusions
As you can see, FlexMol completely encodes axial chirality using just a few basic XML elements, rather than chiral templates or stereodescriptors. These were, in fact, the same elements used to encode alkene geometrical isomerism. This modular approach to stereoisomerism results in an extensible system. Future articles will discuss other forms of stereoisomerism that can be represented in FlexMol, including the all-important tetrahedral stereogenic center.
The Axial Chirality Problem
... To discover high-performance asymmetric catalysts, developing an excellent chiral ligand is crucial. Attracted by its molecular beauty[Chemica Scripta 1985, 25, 83], we initiated the synthesis of BINAP (2,2'-bis(diphenylphosphino)-1,1'-binaphthyl)[J. Am. Chem. Soc. 1980, 102, 1932] in 1974 at Nagoya with the help of H. Takaya, my respected long-term collaborator. BINAP was a new fully aromatic, axially dissymmetric C2 chiral diphosphine that would exert strong steric and electronic influences on transition metal complexes. Its properties could be fine-tuned by substitutions on the aromatic rings. ...
-Ryoji Noyori, Nobel Lecture, December 8, 2001
Axial chirality results, not from a tetrahedral chiral center, but from a chiral axis. This form of chirality most frequently occurs in biaryls and allenes. The importance of axial chirality to organic chemistry was recognized in 2001, when Ryoji Noyori was co-awarded the Nobel Prize in Chemistry, in part for his work with highly selective catalysts derived from the axially-chiral BINAP ligand.

Since the early 1980s, axial chirality has played an increasingly significant role in organic chemistry. Much of this research has focused on catalysis; consider two recent reviews, one on modified BINOLs, and one on modified BINAPs. But axial chirality isn't just restricted to catalysts; it's also a feature of numerous natural products.
Once merely a curiosity, axial chirality now plays a role in virtually every subdiscipline of organic chemistry. At the same time, this important concept is alien to most molecular languages and toolkits. Consider, for example, that the specifications of all four of the most popular molecular languages (SMILES, InChI, Molfile, and CML) are silent on the representation of axial chirality. In other words, axial chirality is undefined in these languages. Although support for axial chirality could be "hacked" into these languages, this would require nonstandard conventions that would be unintelligible to any third party.
This situation poses a significant problem for those needing to discriminate axially chiral stereoisomers in molecular databases or other applications. For example, PubChem's entry on the axially-chiral drug gossypol is devoid of stereochemical information. If PubChem used an internal representation of molecular structure capable of encoding axial chirality, coupled with a suitable molecular language to be used by depositors, separate entries for each gossypol enantiomer would be feasible. After all, PubChem users have come to expect the same of other chiral drugs containing stereogenic atoms.
To address this problem, a new XML-based molecular language called FlexMol has been developed. Recent articles have highlighted FlexMol's use with the multi-atom bonding found in metallocenes, and E/Z alkene geometrical isomerism. Based on a specification by Andreas Dietz, FlexMol can represent all forms of axial chirality using a single flexible formalism.
Chemical informatics is beginning to embrace the concepts of Open Source and Open Data already in widespread use elsewhere. This shift will bring into sharp focus the need for robust and open methods for accurately encoding molecular structure. Existing technologies have not kept up with the chemists themselves, as the axial chirality problem demonstrates. Future articles in this series will show how FlexMol can offer a solution to this and other important molecular representation problems.
A Molecular Language for Modern Chemistry: FlexMol and Alkene Geometrical Isomerism
The fundamental idea behind our representation of stereochemistry is to really describe the relative spatial arrangements of the atoms of a chemical structure. For a given constitution, we obtain a unique and unambiguous stereochemical representation. No limitation to predefined types or stereogenic units exists; any conceivable relative spatial arrangement of atoms may be uniformly represented by one universally applicable formalism. ...
-Andreas Dietz, J. Chem. Inf. Comput. Sci. 1995, 35, 787-802
A recent article introduced FlexMol, a molecular language designed to encode the multi-atom bonding arrangements present in molecules being increasingly made and used by today's chemists. But FlexMol was designed with much more than bonding in mind. Of all of the difficult areas in molecular representation, perhaps none are more daunting than stereochemistry. This article will introduce the basic ideas behind FlexMol's stereochemistry capabilities using the geometrical isomers of 2-butene as an example.
The Difference Between Configuration and Conformation
FlexMol distinguishes between two complementary stereochemical concepts - conformation and configuration. The difference between the two lies in whether isomers can be interconverted through bond rotations. To paraphrase Dietz:
Conformation. Two molecules with identical atom connectivities and bonding differ with respect to their conformation if they possess different relative spatial arrangements of atoms that can be interconverted by rotations about bonds.
Configuration. Two molecules with identical atom connectivities and bonding differ with respect to their configuration if they possess different relative spatial arrangements of atoms that cannot be interconverted by rotations about bonds.
Notice that these definitions say nothing about whether a bond rotation is likely to occur; they simply refer to the possibility of isomer interconversion through bond rotation. Clearly, double bond geometrical isomerism arises from restricted bond rotation. So we'll be relying on FlexMol's support for conformational stereochemistry.
Encoding Cis/Trans Isomerism: 2-Butene
Consider the two isomers of 2-butene. The cis isomer can be encoded in FlexMol as follows:
<?xml version="1.0" standalone="yes"?>
<!-- cis-2-butene -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="3" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="3" ionization="4"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="4"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
</bonding>
</constitution>
<conformation>
<conformationWheel>
<gammaSequence source="C1" target="C2">
<connections>
<atomPair source="C1" target="C2"></atomPair>
</connections>
</gammaSequence>
<halfPlane></halfPlane>
<halfPlane>
<lower atom="C0"></lower>
<upper atom="C3"></upper>
</halfPlane>
</conformationWheel>
</conformation>
</molecule>
In contrast to previous FlexMol examples, this representation contains a conformation element, which in turn contains a conformationWheel subelement. The conformationWheel is composed of a gammaSequence and two halfPlanes. The relationship among these elements can be seen in the diagram below.

Stereochemical representation in FlexMol boils down to placing atoms into a set of half-planes intersecting a given axis (Dietz refers to this arrangement as a "pencil of planes"). In the case of cis-2-butene, this axis, or gamma sequence, is the atom pair between atoms C1 and C2. A gamma sequence can consist of two or more atoms, a very useful feature for representing allene stereochemistry, for example. Half planes are specified in clockwise order about this axis. Because half planes always occur in pairs separated by 180 degrees about their common axis, the number of half planes will always be even. Each conformational half plane is further subdivided into two regions labeled appropriately enough "upper" and "lower".
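The structural rules in this paragraph lend themselves to a quick sanity check. The Ruby sketch below is only an illustration: the method name wheel_problems and the plain Array/Hash layout are assumptions, not part of FlexMol or Octet. It checks three things stated above: the axis has at least two atoms, the number of half planes is even, and each half plane uses only the "upper" and "lower" regions.

```ruby
# Returns a list of human-readable problems; an empty list means the wheel
# passes the three checks described in the text.
#   axis        - Array of atom ids along the gamma sequence
#   half_planes - Array of Hashes with optional :lower / :upper atom ids
def wheel_problems(axis, half_planes)
  problems = []
  problems << "axis needs at least two atoms" if axis.size < 2
  problems << "number of half planes must be even" if half_planes.size.odd?
  half_planes.each_with_index do |plane, i|
    extra = plane.keys - [:lower, :upper]
    problems << "half plane #{i} has unknown regions" unless extra.empty?
  end
  problems
end

# cis-2-butene: one empty half plane, one holding both methyl carbons.
cis_planes = [{}, { lower: "C0", upper: "C3" }]
p wheel_problems(["C1", "C2"], cis_planes)   # prints []
p wheel_problems(["C1", "C2"], [{}, {}, {}]) # prints ["number of half planes must be even"]
```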
A conformation wheel will always have an equivalent, but opposite representation. For example, cis-2-butene could also be represented with an axis of opposite orientation (C2->C1), opposite ordering of half planes (in this case the same ordering because there are only two half planes), and inverted upper/lower designations. FlexMol only requires that one of these two equivalent arrangements be specified.
In a similar fashion, we can generate a FlexMol representation for trans-2-butene:
<?xml version="1.0" standalone="yes"?>
<!-- trans-2-butene -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="3" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="3" ionization="4"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="4"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
</bonding>
</constitution>
<conformation>
<conformationWheel>
<gammaSequence source="C1" target="C2">
<connections>
<atomPair source="C1" target="C2"></atomPair>
</connections>
</gammaSequence>
<halfPlane>
<upper atom="C3"></upper>
</halfPlane>
<halfPlane>
<lower atom="C0"></lower>
</halfPlane>
</conformationWheel>
</conformation>
</molecule>
This representation contains a conformationWheel with two filled half planes containing the atoms C3 and C0, respectively. The arrangement among the conformational elements can be better seen in the following diagram:

So What?
There are many ways to represent alkene geometrical isomerism, most of which are far simpler than the one outlined here. So what does this additional complexity buy us? In FlexMol, we can use exactly the same formalisms we used for 2-butene isomers to represent the stereochemistries of molecules that simply cannot be represented in other systems. Two specific examples include the axial chirality of allenes and biaryls. If you'd like some hints on how to accomplish this, see the allene and binaphthyl FlexMol examples contained in the flexmol directory of the Octet source distribution.
Notice how FlexMol does away with the need to define conformation in terms of stereochemical descriptors, which are quite limited. Instead, FlexMol provides a small set of modular concepts that, when used together, actually describe the underlying conformational features of a molecule. Of course, (E) and (Z) descriptors (and a host of others as well) can be derived from a FlexMol representation given the right software.
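As a rough idea of what such software might do, here is a minimal Ruby sketch for the two wheels shown in this article. It is deliberately narrow: it only handles wheels with exactly two half planes, and the method names cis_or_trans and half_plane_index, along with the Array/Hash layout, are assumptions for illustration rather than anything defined by FlexMol or Octet. Two atoms in the same half plane are reported as cis; atoms in the two opposite half planes are reported as trans. For 2-butene these correspond to (Z) and (E), respectively.

```ruby
# Index of the half plane holding the atom, in either region.
def half_plane_index(half_planes, atom)
  index = half_planes.index { |plane| plane.values.include?(atom) }
  raise ArgumentError, "#{atom} is not placed in any half plane" if index.nil?
  index
end

# Classifies two substituents on a wheel that has exactly two half planes.
def cis_or_trans(half_planes, atom_a, atom_b)
  unless half_planes.size == 2
    raise ArgumentError, "this sketch only handles wheels with two half planes"
  end
  same = half_plane_index(half_planes, atom_a) == half_plane_index(half_planes, atom_b)
  same ? :cis : :trans
end

cis_wheel   = [{}, { lower: "C0", upper: "C3" }]
trans_wheel = [{ upper: "C3" }, { lower: "C0" }]

p cis_or_trans(cis_wheel, "C0", "C3")   # prints :cis
p cis_or_trans(trans_wheel, "C0", "C3") # prints :trans
```

A full (E)/(Z) assignment would also need Cahn-Ingold-Prelog priorities, which this sketch does not attempt.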
Conclusions
We've covered the essentials for conformational representation in FlexMol, and we've seen how to differentiate double bond geometrical isomers. The same principles described here are also used in encoding stereochemical configuration, which will be the subject of a future tutorial.
A Molecular Language for Modern Chemistry: Getting Started with FlexMol
Existing molecular languages are limited in their ability to represent such commonplace features as multi-center bonding and axial chirality. The practical outcome of these limitations can be seen in PubChem's four separate entries for ferrocene and the inability to fully represent many molecules now in common use by organic chemists.
A recent article touched on a molecular representation system that was capable of far greater expressive power than those currently in use. In this article, I'll introduce FlexMol, an XML implementation of this advanced molecular representation system.
What is FlexMol?
FlexMol is an XML-based molecular language that's designed to allow the faithful representation of any molecule, regardless of its peculiarities. The following is a list of features that FlexMol can encode:
Multi-atom, multi-electron bonds
All known forms of stereochemistry, including axial chirality (e.g., allenes and biaryls), planar chirality (e.g., metallocenes), and non-tetrahedral stereocenters (e.g., square planar and octahedral metal complexes)
Non-natural isotopic distributions and pure isotopes
Virtual hydrogens (similar to "implicit hydrogens") through mandatory, explicit enumeration
Electronic spin, enabling the differentiation of spin states
What Does FlexMol Look Like?
Let's start with the simple example of benzene:
<?xml version="1.0" standalone="yes"?>
<!-- Benzene, represented as "1,3,5-cyclohexatriene" -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="4"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
<bond source="C3" target="C4" bondingElectrons="4"></bond>
<bond source="C4" target="C5" bondingElectrons="2"></bond>
<bond source="C0" target="C5" bondingElectrons="4"></bond>
</bonding>
</constitution>
</molecule>
The above representation divides the structure of benzene into two main elements - atoms and bonding. Both of these elements are in turn subelements of the constitution element, which specifies atom connectivity. Had we been representing a molecule with stereochemical features, the above document could have also contained a configuration element, a conformation element, or both.
Within the atoms element are definitions for each of the six degenerate carbon atoms of benzene. Each atom is assigned a unique ID for use elsewhere in the document, an atomic symbol, the number of hydrogens bonded to each atom, and the effective ionization state of each atom. The mandatory hydrogens attribute specifies "virtual" hydrogens, or those associated with an atom without being full-fledged nodes in the graph representation.
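Because every atom must state its virtual hydrogens, a molecular formula can be read straight off the atoms element. The Ruby sketch below shows the idea. It assumes the atoms have already been read into plain Hashes with :symbol and :hydrogens keys; that layout and the method name molecular_formula are illustrative assumptions, not part of FlexMol or Octet. Symbols are written in Hill order (carbon, then hydrogen, then the rest alphabetically).

```ruby
# Builds a Hill-order molecular formula from atom entries.
# Each entry carries an element symbol and its count of virtual hydrogens.
def molecular_formula(atoms)
  counts = Hash.new(0)
  atoms.each do |atom|
    counts[atom[:symbol]] += 1
    counts["H"] += atom[:hydrogens]
  end
  counts.delete_if { |_symbol, n| n.zero? }

  others = (counts.keys - ["C", "H"]).sort
  order = counts.key?("C") ? (["C", "H"] & counts.keys) + others : counts.keys.sort
  order.map { |s| counts[s] == 1 ? s : "#{s}#{counts[s]}" }.join
end

# Benzene: six carbons, each with one virtual hydrogen.
benzene = Array.new(6) { { symbol: "C", hydrogens: 1 } }
puts molecular_formula(benzene) # prints "C6H6"
```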
The bonding element defines all of the bonding arrangements within benzene. In this case, benzene is being represented as "cyclohexatriene" with alternating single and double bonds; below we'll see how to use FlexMol to represent delocalized (aromatic) bonding. Each bond specifies a source atom, a target atom, and the number of bonding electrons.
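Since bonds refer to atoms purely by ID, a simple consistency check is to confirm that every source and target names an atom that was actually declared. Here is a minimal Ruby sketch of that check. It assumes the IDs and bonds have already been extracted into plain Arrays and Hashes; the method name undefined_atoms and the data layout are hypothetical and not part of FlexMol or Octet.

```ruby
# Returns the atom ids that bonds refer to but that were never declared.
#   atom_ids - Array of declared atom ids
#   bonds    - Array of Hashes with :source and :target ids
def undefined_atoms(atom_ids, bonds)
  referenced = bonds.flat_map { |bond| [bond[:source], bond[:target]] }
  (referenced - atom_ids).uniq
end

atom_ids = %w[C0 C1 C2 C3]
bonds = [
  { source: "C0", target: "C1" },
  { source: "C1", target: "C2" },
  { source: "C2", target: "C3" },
  { source: "C3", target: "C4" } # C4 was never declared
]

p undefined_atoms(atom_ids, bonds) # prints ["C4"]
```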
In many situations, the above representation of benzene will not suffice. What if we want to describe the one-electron ionization of benzene to form the benzene radical cation? Using the "cyclohexatriene" form of benzene makes it impossible to select the correct bond from which to take electrons.
Instead, we could use a more physically meaningful representation of benzene, such as that shown below:
<?xml version="1.0" standalone="yes"?>
<!-- Benzene, represented with a delocalized pi-system -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="2"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
<bond source="C3" target="C4" bondingElectrons="2"></bond>
<bond source="C4" target="C5" bondingElectrons="2"></bond>
<bond source="C0" target="C5" bondingElectrons="2"></bond>
<bondingSystem bondingElectrons="6">
<connections>
<atomPair source="C0" target="C1"></atomPair>
<atomPair source="C1" target="C2"></atomPair>
<atomPair source="C2" target="C3"></atomPair>
<atomPair source="C3" target="C4"></atomPair>
<atomPair source="C4" target="C5"></atomPair>
<atomPair source="C0" target="C5"></atomPair>
</connections>
</bondingSystem>
</bonding>
</constitution>
</molecule>
This is certainly more verbose, but what does it buy us? Notice the bondingSystem subelement at the end of the bonding element. Here we define an extended six-atom, six-electron bonding system that much more closely reflects the true nature of benzene's pi-system. Now it's obvious that this is the bonding motif from which to take an electron to make the benzene radical cation.
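The electron bookkeeping behind the two benzene representations can be sketched in Ruby. This is an illustration under assumptions: the Hash layout (a list of per-bond electron counts plus a list of per-system electron counts) and the method names bonding_electrons and ionize_system are hypothetical and not part of FlexMol or Octet. Both representations account for the same eighteen bonding electrons, but only the delocalized one offers an unambiguous place to remove an electron from.

```ruby
# Total bonding electrons across two-atom bonds and multi-atom systems.
def bonding_electrons(molecule)
  molecule[:bonds].sum + molecule[:systems].sum
end

# Returns a copy of the molecule with one electron removed from a bonding system.
def ionize_system(molecule, index)
  systems = molecule[:systems].dup
  raise IndexError, "no bonding system at index #{index}" if systems[index].nil?
  systems[index] -= 1
  { bonds: molecule[:bonds], systems: systems }
end

kekule      = { bonds: [2, 4, 2, 4, 2, 4], systems: [] }
delocalized = { bonds: [2, 2, 2, 2, 2, 2], systems: [6] }

puts bonding_electrons(kekule)      # prints 18
puts bonding_electrons(delocalized) # prints 18

radical_cation = ionize_system(delocalized, 0)
p radical_cation[:systems]            # prints [5]
puts bonding_electrons(radical_cation) # prints 17
```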
Next, consider the cyclopentadienyl anion, which possesses a five-atom, six-electron Hueckel aromatic bonding system. We can apply the same principles in representing benzene's pi-system to the representation of the cyclopentadienyl anion's pi-bonding:
<?xml version="1.0" standalone="yes"?>
<!-- Cyclopentadienyl Anion -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="2"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
<bond source="C3" target="C4" bondingElectrons="2"></bond>
<bond source="C0" target="C4" bondingElectrons="2"></bond>
<bondingSystem bondingElectrons="6">
<connections>
<atomPair source="C0" target="C1"></atomPair>
<atomPair source="C1" target="C2"></atomPair>
<atomPair source="C2" target="C3"></atomPair>
<atomPair source="C3" target="C4"></atomPair>
<atomPair source="C0" target="C4"></atomPair>
</connections>
</bondingSystem>
</bonding>
</constitution>
</molecule>
In the above representation, all carbon atoms are equivalent - something difficult, if not impossible, to achieve with most other molecular languages. Furthermore, the representation of delocalized bonding closely matches what most chemists would describe. We could get even more sophisticated and place individual electrons into three separate bonding systems in analogy with molecular orbitals - it really depends on what we'd like to emphasize.
This is well and good for aromaticity, but how can all of this help solve the Ferrocene Problem? Just as with cyclopentadienyl anion and benzene, in the representation of ferrocene below, we're taking advantage of FlexMol's support for multi-atom bonding. In this case, we define three bondingSystems, each of which contains six electrons. We could have just as easily created a single eighteen-electron, eleven-atom bonding system. Our choice of representation again depends on what we're trying to emphasize.
<?xml version="1.0" standalone="yes"?>
<!-- Ferrocene -->
<molecule>
<constitution>
<atoms>
<atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C6" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C7" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C8" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="C9" symbol="C" hydrogens="1" ionization="4"></atom>
<atom id="Fe10" symbol="Fe" hydrogens="0" ionization="8"></atom>
</atoms>
<bonding>
<bond source="C0" target="C1" bondingElectrons="2"></bond>
<bond source="C1" target="C2" bondingElectrons="2"></bond>
<bond source="C2" target="C3" bondingElectrons="2"></bond>
<bond source="C3" target="C4" bondingElectrons="2"></bond>
<bond source="C0" target="C4" bondingElectrons="2"></bond>
<bond source="C5" target="C6" bondingElectrons="2"></bond>
<bond source="C6" target="C7" bondingElectrons="2"></bond>
<bond source="C7" target="C8" bondingElectrons="2"></bond>
<bond source="C8" target="C9" bondingElectrons="2"></bond>
<bond source="C5" target="C9" bondingElectrons="2"></bond>
<bondingSystem bondingElectrons="6">
<connections>
<atomPair source="C0" target="C1"></atomPair>
<atomPair source="C1" target="C2"></atomPair>
<atomPair source="C2" target="C3"></atomPair>
<atomPair source="C3" target="C4"></atomPair>
<atomPair source="C0" target="C4"></atomPair>
<atomPair source="C0" target="Fe10"></atomPair>
<atomPair source="C1" target="Fe10"></atomPair>
<atomPair source="C2" target="Fe10"></atomPair>
<atomPair source="C3" target="Fe10"></atomPair>
<atomPair source="C4" target="Fe10"></atomPair>
</connections>
</bondingSystem>
<bondingSystem bondingElectrons="6">
<connections>
<atomPair source="C5" target="C6"></atomPair>
<atomPair source="C6" target="C7"></atomPair>
<atomPair source="C7" target="C8"></atomPair>
<atomPair source="C8" target="C9"></atomPair>
<atomPair source="C5" target="C9"></atomPair>
<atomPair source="C5" target="Fe10"></atomPair>
<atomPair source="C6" target="Fe10"></atomPair>
<atomPair source="C7" target="Fe10"></atomPair>
<atomPair source="C8" target="Fe10"></atomPair>
<atomPair source="C9" target="Fe10"></atomPair>
</connections>
</bondingSystem>
<bondingSystem bondingElectrons="6">
<connections>
<atomPair source="C0" target="C1"></atomPair>
<atomPair source="C1" target="C2"></atomPair>
<atomPair source="C2" target="C3"></atomPair>
<atomPair source="C3" target="C4"></atomPair>
<atomPair source="C0" target="C4"></atomPair>
<atomPair source="C0" target="Fe10"></atomPair>
<atomPair source="C1" target="Fe10"></atomPair>
<atomPair source="C2" target="Fe10"></atomPair>
<atomPair source="C3" target="Fe10"></atomPair>
<atomPair source="C4" target="Fe10"></atomPair>
<atomPair source="C5" target="C6"></atomPair>
<atomPair source="C6" target="C7"></atomPair>
<atomPair source="C7" target="C8"></atomPair>
<atomPair source="C8" target="C9"></atomPair>
<atomPair source="C5" target="C9"></atomPair>
<atomPair source="C5" target="Fe10"></atomPair>
<atomPair source="C6" target="Fe10"></atomPair>
<atomPair source="C7" target="Fe10"></atomPair>
<atomPair source="C8" target="Fe10"></atomPair>
<atomPair source="C9" target="Fe10"></atomPair>
</connections>
</bondingSystem>
</bonding>
</constitution>
</molecule>
The same principles outlined for ferrocene apply equally to other metallocenes. FlexMol can also represent a host of otherwise tough cases such as nonclassical carbocations, allylmetal complexes, resonance-stabilized radicals and ions, and transition states.
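The atomPair lists in the ferrocene document follow a regular pattern: the five ring bonds of a cyclopentadienyl ligand, plus one pair from each ring carbon to the metal. The Ruby sketch below generates that pattern and counts the atoms each bonding system spans. The method names cp_metal_pairs and spanned_atoms, and the use of two-element Arrays for atom pairs, are assumptions made for illustration; they are not part of FlexMol or Octet.

```ruby
# Atom pairs for one ring-plus-metal bonding system:
# the ring bonds (including ring closure) followed by each carbon-metal pair.
def cp_metal_pairs(ring, metal)
  ring_bonds = ring.each_cons(2).to_a << [ring.first, ring.last]
  ring_bonds + ring.map { |carbon| [carbon, metal] }
end

# The distinct atoms that a list of atom pairs touches.
def spanned_atoms(pairs)
  pairs.flatten.uniq
end

ring_a = %w[C0 C1 C2 C3 C4]
ring_b = %w[C5 C6 C7 C8 C9]

system_a = cp_metal_pairs(ring_a, "Fe10")
system_b = cp_metal_pairs(ring_b, "Fe10")
combined = system_a + system_b

puts system_a.size                 # prints 10
puts spanned_atoms(system_a).size  # prints 6
puts spanned_atoms(combined).size  # prints 11
```

The combined list spans the eleven atoms mentioned above, and the same helper could be pointed at a different ring size or metal when sketching other metallocenes.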
Why XML?

XML provides several often-cited advantages:
Availability of standardized parser and output libraries
Human readability
Adequate mapping to Object-Oriented models for most purposes
Nothing about FlexMol prevents it from being built on top of another data-interchange format. Two of the most interesting alternatives to XML are JavaScript Object Notation (JSON) and YAML. JSON in particular seems to have learned from XML's experiences and so represents a platform worthy of serious consideration.
What About Chemical Markup Language?
Chemical Markup Language (CML) is a widely-used XML-based molecular language. So why invent yet another XML language for chemistry? Currently, CML does not solve the molecular representation problems discussed in this article and those preceding it. So although FlexMol and CML are both built on XML, they are nevertheless each aimed at addressing different problems. In this respect, FlexMol and CML are complementary.
Where's the Software?
Any language needs software to make it useful. To simplify the use of FlexMol, it is fully supported by Octet, an Open Source framework written in Java. Supporting FlexMol in other cheminformatics toolkits will likely be challenging due to impedance mismatch; FlexMol can precisely encode a variety of structural concepts that simply don't exist elsewhere.
Conclusions
Existing molecular languages lack the expressive power to represent many structural motifs in widespread use by today's chemists. FlexMol was designed to solve this problem. Future articles in this series will demonstrate how FlexMol documents can be read and written, as well as showing some techniques for manipulating the resulting molecular representations.
Source Code Documentation in Ruby: RDoc for Ruby CDK
Good source code documentation tools can greatly enhance developer productivity at almost no cost. By eliminating the need to dig through source code files for documentation, tools like Java's Javadoc are invaluable for quickly seeing the big picture of a new piece of software. They are also handy as a quick reminder for more familiar APIs.
Ruby comes complete with a source code documentation tool called "RDoc." Running RDoc over your Ruby source produces a group of cross-linked HTML files in a manner analogous to Javadoc.
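To give a feel for what RDoc works from, here is a small documented Ruby class. The class AtomCounter is hypothetical and is not part of Ruby CDK; it only illustrates the convention that a comment placed directly above a class or method becomes its documentation, and that +word+ marks text to be shown in a code font.

```ruby
# Counts atoms by element symbol.
#
# This class is a hypothetical example and is not part of Ruby CDK.
class AtomCounter
  # Creates a counter for +symbols+, an Array of element symbols
  # such as <tt>["C", "C", "O"]</tt>.
  def initialize(symbols)
    @symbols = symbols
  end

  # Returns the number of atoms whose element symbol equals +symbol+.
  def count(symbol)
    @symbols.count(symbol)
  end
end

counter = AtomCounter.new(%w[C C O])
puts counter.count("C") # prints 2
```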
The Ruby CDK distribution contains HTML API documentation generated by RDoc and located in the doc subdirectory. This documentation was automatically generated with a task written for Ruby's build system, Rake. This task, contained in the file Rakefile, is shown below:
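# Assumption: in the Rake releases this Rakefile targets, rake/rdoctask provides Rake::RDocTask.
require 'rake/rdoctask'

Rake::RDocTask.new do |rdoc|
  rdoc.rdoc_dir = 'doc'                  # write the generated HTML to doc/
  rdoc.title = "Ruby CDK"
  rdoc.rdoc_files.include('README')
  rdoc.options << '--line-numbers'       # number the lines of inlined source
  rdoc.options << '--inline-source'      # show method source inline
  rdoc.options << '--main' << 'README'   # use README as the start page
  rdoc.rdoc_files.include('lib/**/*.rb')
end
Unlike Java's build tool, Ant, Rake isn't based on XML. Instead, the entire build system is pure Ruby. This introduces a level of clarity and brevity into the build process that simply doesn't exist with Ant.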
Ruby CDK's RDoc task extends the built-in Rake task named appropriately enough RDocTask. The customization of this task lies completely within the constructor block. Among other modifications, this block activates inlined source code with line numbers, a very handy feature given Ruby's highly dynamic nature.
As a convenience, Ruby CDK's RDoc documentation is now available online. The API and schema documentation for my other software projects will appear soon on Depth-First in the sidebar to the right.


