A Molecular Language for Modern Chemistry: FlexMol and Axial Chirality

Posted by Rich Apodaca Tue, 09 Jan 2007 20:50:00 GMT

A recent article introduced FlexMol as a molecular language with the unique capability of encoding axial chirality. A previous article showed how E/Z geometrical isomerism is encoded with FlexMol. Using the popular chiral reagent and ligand 1,1'-bi-2-naphthol (BINOL) as an example, this tutorial will illustrate in detail how axial chirality is encoded in FlexMol.

Configuration or Conformation?

In contrast to configurational stereoisomers, conformational stereoisomers can be interconverted through bond rotations. The two enantiomers of BINOL differ only by rotation about the bond joining the two naphthyl rings, so we'll need to use a conformationWheel to represent stereochemistry in BINOL - just as we did with 2-butene. For more rigorous definitions of these concepts, see the original specification by Dietz.

(R)-BINOL

A FlexMol representation of (R)-BINOL and the associated atom numbering scheme are shown below:

<?xml version="1.0" standalone="yes"?>
<!-- (R)-BINOL -->

<molecule>
  <constitution>
    <atoms>
      <atom id="C0" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="C1" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C6" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C7" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C8" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="C9" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="C10" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="C11" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="C12" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C13" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C14" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C15" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C16" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C17" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C18" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="C19" symbol="C" hydrogens="0" ionization="4"></atom>
      <atom id="O20" symbol="O" hydrogens="1" ionization="2"></atom>
      <atom id="O21" symbol="O" hydrogens="1" ionization="2"></atom>
    </atoms>
    <bonding>
      <bond source="C0" target="C1" bondingElectrons="2"></bond>
      <bond source="C1" target="C2" bondingElectrons="2"></bond>
      <bond source="C2" target="C3" bondingElectrons="2"></bond>
      <bond source="C3" target="C4" bondingElectrons="2"></bond>
      <bond source="C4" target="C5" bondingElectrons="2"></bond>
      <bond source="C0" target="C5" bondingElectrons="2"></bond>
      <bond source="C0" target="C6" bondingElectrons="2"></bond>
      <bond source="C6" target="C7" bondingElectrons="2"></bond>
      <bond source="C7" target="C8" bondingElectrons="2"></bond>
      <bond source="C8" target="C9" bondingElectrons="2"></bond>
      <bond source="C9" target="C1" bondingElectrons="2"></bond>
      <bondingSystem bondingElectrons="10">
        <connections>
          <atomPair source="C0" target="C1"></atomPair>
          <atomPair source="C1" target="C2"></atomPair>
          <atomPair source="C2" target="C3"></atomPair>
          <atomPair source="C3" target="C4"></atomPair>
          <atomPair source="C4" target="C5"></atomPair>
          <atomPair source="C0" target="C5"></atomPair>
          <atomPair source="C0" target="C6"></atomPair>
          <atomPair source="C6" target="C7"></atomPair>
          <atomPair source="C7" target="C8"></atomPair>
          <atomPair source="C8" target="C9"></atomPair>
          <atomPair source="C9" target="C1"></atomPair>
        </connections>
      </bondingSystem>
      <bond source="C10" target="C11" bondingElectrons="2"></bond>
      <bond source="C11" target="C12" bondingElectrons="2"></bond>
      <bond source="C12" target="C13" bondingElectrons="2"></bond>
      <bond source="C13" target="C14" bondingElectrons="2"></bond>
      <bond source="C14" target="C15" bondingElectrons="2"></bond>
      <bond source="C10" target="C15" bondingElectrons="2"></bond>
      <bond source="C10" target="C16" bondingElectrons="2"></bond>
      <bond source="C16" target="C17" bondingElectrons="2"></bond>
      <bond source="C17" target="C18" bondingElectrons="2"></bond>
      <bond source="C18" target="C19" bondingElectrons="2"></bond>
      <bond source="C19" target="C11" bondingElectrons="2"></bond>
      <bondingSystem bondingElectrons="10">
        <connections>
          <atomPair source="C10" target="C11"></atomPair>
          <atomPair source="C11" target="C12"></atomPair>
          <atomPair source="C12" target="C13"></atomPair>
          <atomPair source="C13" target="C14"></atomPair>
          <atomPair source="C14" target="C15"></atomPair>
          <atomPair source="C10" target="C15"></atomPair>
          <atomPair source="C10" target="C16"></atomPair>
          <atomPair source="C16" target="C17"></atomPair>
          <atomPair source="C17" target="C18"></atomPair>
          <atomPair source="C18" target="C19"></atomPair>
          <atomPair source="C19" target="C11"></atomPair>
        </connections>
      </bondingSystem>
      <bond source="C9" target="C19" bondingElectrons="2"></bond>
      <bond source="C8" target="O20" bondingElectrons="2"></bond>
      <bond source="C18" target="O21" bondingElectrons="2"></bond>
    </bonding>
  </constitution>
  <conformation>
    <conformationWheel>
      <gammaSequence source="C19" target="C9">
        <connections>
          <atomPair source="C9" target="C19"></atomPair>
        </connections>
      </gammaSequence>
      <halfPlane>
        <lower atom="C11"></lower>
      </halfPlane>
      <halfPlane>
        <upper atom="C1"></upper>
      </halfPlane>
      <halfPlane>
        <lower atom="C18"></lower>
      </halfPlane>
      <halfPlane>
        <upper atom="C8"></upper>
      </halfPlane>
    </conformationWheel>
  </conformation>
</molecule>

We've elected to represent BINOL's two pi-systems as ten-atom, ten-electron bondingSystems. We could have just as easily represented each naphthalene ring using alternating single/double bonds containing two and four electrons, respectively. For an explanation of multi-atom pi-system bonding in FlexMol, see this article.
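The "ten-atom, ten-electron" description can be read straight off a bondingSystem element. Here is a minimal Python sketch using only the standard library; the helper name system_size is ours for illustration and is not part of FlexMol or Octet.

```python
import xml.etree.ElementTree as ET

# One of the two naphthalene pi-systems, copied from the (R)-BINOL listing.
RING = """<bondingSystem bondingElectrons="10">
  <connections>
    <atomPair source="C0" target="C1"></atomPair>
    <atomPair source="C1" target="C2"></atomPair>
    <atomPair source="C2" target="C3"></atomPair>
    <atomPair source="C3" target="C4"></atomPair>
    <atomPair source="C4" target="C5"></atomPair>
    <atomPair source="C0" target="C5"></atomPair>
    <atomPair source="C0" target="C6"></atomPair>
    <atomPair source="C6" target="C7"></atomPair>
    <atomPair source="C7" target="C8"></atomPair>
    <atomPair source="C8" target="C9"></atomPair>
    <atomPair source="C9" target="C1"></atomPair>
  </connections>
</bondingSystem>"""


def system_size(system):
    """Return (distinct atoms, electrons) for one bondingSystem element."""
    atoms = set()
    for pair in system.iter("atomPair"):
        atoms.add(pair.get("source"))
        atoms.add(pair.get("target"))
    return len(atoms), int(system.get("bondingElectrons"))


atoms, electrons = system_size(ET.fromstring(RING))
print(f"{atoms}-atom, {electrons}-electron bondingSystem")
# prints "10-atom, 10-electron bondingSystem"
```

The eleven atomPairs cover ten distinct atoms because the ring-fusion atoms C0 and C1 each appear in three pairs.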

The stereochemically-relevant part of this document is contained within the conformation element. A gammaSequence, or conformational axis, is defined along with four non-empty halfPlanes. Notice how the basic structure of this conformation element closely resembles the one for 2-butene.

To better visualize the conformation element of (R)-BINOL, consider the following diagram:

The conformationWheel defines a conformational axis vector from atom C19 to atom C9. Arranged about this axis in a clockwise fashion are four non-empty halfPlanes. Picking an arbitrary halfPlane to start with, atom C11 is positioned first in the lower half. This is then followed by the next halfPlane, which contains atom C1 in its upper half. The next halfPlane contains atom C18 in the lower half. Finally, atom C8 is located in the last halfPlane's upper half.
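The same walk around the wheel can be done in code. The Python sketch below reads the axis and the ordered halfPlanes out of the conformationWheel element; read_wheel is a hypothetical helper, and it assumes each halfPlane holds a single lower or upper child, as in this example.

```python
import xml.etree.ElementTree as ET

# The conformationWheel from the (R)-BINOL listing, reformatted for brevity.
WHEEL = """<conformationWheel>
  <gammaSequence source="C19" target="C9">
    <connections>
      <atomPair source="C9" target="C19"></atomPair>
    </connections>
  </gammaSequence>
  <halfPlane><lower atom="C11"></lower></halfPlane>
  <halfPlane><upper atom="C1"></upper></halfPlane>
  <halfPlane><lower atom="C18"></lower></halfPlane>
  <halfPlane><upper atom="C8"></upper></halfPlane>
</conformationWheel>"""


def read_wheel(wheel):
    """Return the axis as (source, target) and the halfPlanes in document order."""
    gamma = wheel.find("gammaSequence")
    axis = (gamma.get("source"), gamma.get("target"))
    planes = []
    for plane in wheel.findall("halfPlane"):
        # Assumption: one <lower> or <upper> child per non-empty halfPlane.
        for child in plane:
            planes.append((child.tag, child.get("atom")))
    return axis, planes


axis, planes = read_wheel(ET.fromstring(WHEEL))
print(axis)    # ('C19', 'C9')
print(planes)  # [('lower', 'C11'), ('upper', 'C1'), ('lower', 'C18'), ('upper', 'C8')]
```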

This procedure completely specifies the axial chirality of (R)-BINOL. Notice how no arbitrary stereodescriptors or chiral templates were used. Of course, we could derive the Cahn-Ingold-Prelog stereodescriptor of (R), given the right software.

Many representations of the same chiral axis are possible, just as each connection table can be represented in many different ways. For example, we could have started the conformation element with the halfPlane containing atom C1. In this case, the ordering of atoms would be C1, C18, C8, C11. Similarly, the orientation of our chiral axis could have been defined from atom C9 to atom C19. In this case the ordering of halfPlanes would be reversed, and the upper/lower designations would be inverted.
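Both equivalences described above are easy to express in code. The following Python sketch treats a wheel as an axis plus a cyclic list of (position, atom) pairs; the function names are ours, and the comparison covers only the two rules stated in this article (cyclic rotation and axis reversal), not anything else a full FlexMol implementation might need.

```python
# (R)-BINOL wheel from the article: axis C19 -> C9, halfPlanes in clockwise order.
R_AXIS = ("C19", "C9")
R_PLANES = [("lower", "C11"), ("upper", "C1"), ("lower", "C18"), ("upper", "C8")]


def rotations(planes):
    """All cyclic rotations of a halfPlane sequence."""
    return [planes[i:] + planes[:i] for i in range(len(planes))]


def reverse_axis(axis, planes):
    """Flip the axis direction: reverse the order and swap upper/lower."""
    swap = {"upper": "lower", "lower": "upper"}
    flipped = [(swap[pos], atom) for pos, atom in reversed(planes)]
    return (axis[1], axis[0]), flipped


def same_wheel(axis_a, planes_a, axis_b, planes_b):
    """True if both descriptions encode the same arrangement about the same bond."""
    if axis_a != axis_b:
        axis_b, planes_b = reverse_axis(axis_b, planes_b)
    return axis_a == axis_b and planes_b in rotations(planes_a)


# Same wheel, started at the halfPlane containing C1.
FROM_C1 = [("upper", "C1"), ("lower", "C18"), ("upper", "C8"), ("lower", "C11")]
print(same_wheel(R_AXIS, R_PLANES, R_AXIS, FROM_C1))  # True

# Same wheel, with the axis running from C9 to C19 instead.
flipped_axis, flipped_planes = reverse_axis(R_AXIS, R_PLANES)
print(same_wheel(R_AXIS, R_PLANES, flipped_axis, flipped_planes))  # True
```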

(S)-BINOL

How is (S)-BINOL encoded in FlexMol? As you might expect, completely analogously to the (R) enantiomer:

<!-- snip -->
<conformation>
  <conformationWheel>
    <gammaSequence source="C19" target="C9">
      <connections>
        <atomPair source="C9" target="C19"></atomPair>
      </connections>
    </gammaSequence>
    <halfPlane>
      <lower atom="C11"></lower>
    </halfPlane>
    <halfPlane>
      <upper atom="C8"></upper>
    </halfPlane>
    <halfPlane>
      <lower atom="C18"></lower>
    </halfPlane>
    <halfPlane>
      <upper atom="C1"></upper>
    </halfPlane>
  </conformationWheel>
</conformation>
<!-- snip -->

As with (R)-BINOL, we can create a diagram representing the conformationWheel of (S)-BINOL:
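The two wheels differ only in their sense of rotation: reading the (R) halfPlanes in the opposite direction gives the (S) sequence. A short Python sketch makes the relationship explicit; the helper names are ours, and the lists are transcribed from the two listings in this article.

```python
# halfPlane sequences about the same C19 -> C9 axis, in document order.
R_PLANES = [("lower", "C11"), ("upper", "C1"), ("lower", "C18"), ("upper", "C8")]
S_PLANES = [("lower", "C11"), ("upper", "C8"), ("lower", "C18"), ("upper", "C1")]


def is_rotation(a, b):
    """True if sequence b is a cyclic rotation of sequence a."""
    return len(a) == len(b) and any(a[i:] + a[:i] == b for i in range(len(a)))


def mirror(planes):
    """Read the wheel in the opposite rotational sense, keeping upper/lower labels."""
    return list(reversed(planes))


print(is_rotation(R_PLANES, S_PLANES))          # False: different stereoisomers
print(is_rotation(mirror(R_PLANES), S_PLANES))  # True: mirror images
```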

Conclusions

As you can see, FlexMol completely encodes axial chirality using just a few basic XML elements, rather than chiral templates or stereodescriptors. These were, in fact, the same elements used to encode alkene geometrical isomerism. This modular approach to stereoisomerism results in an extensible system. Future articles will discuss other forms of stereoisomerism that can be represented in FlexMol, including the all-important tetrahedral stereogenic center.

A Molecular Language for Modern Chemistry: Getting Started with FlexMol

Posted by Rich Apodaca Wed, 20 Dec 2006 20:16:00 GMT

Existing molecular languages are limited in their ability to represent such commonplace features as multi-center bonding and axial chirality. The practical outcome of these limitations can be seen in PubChem's four separate entries for ferrocene and the inability to fully represent many molecules now in common use by organic chemists.

A recent article touched on a molecular representation system that was capable of far greater expressive power than those currently in use. In this article, I'll introduce FlexMol, an XML implementation of this advanced molecular representation system.

What is FlexMol?

FlexMol is an XML-based molecular language that's designed to allow the faithful representation of any molecule, regardless of its peculiarities. The following is a list of features that FlexMol can encode:

  • Multi-atom, multi-electron bonds

  • All known forms of stereochemistry, including axial chirality (e.g., allenes and biaryls), planar chirality (e.g., metallocenes), and non-tetrahedral stereocenters (e.g., square planar and octahedral metal complexes)

  • Non-natural isotopic distributions and pure isotopes

  • Virtual hydrogens (similar to "implicit hydrogens") through mandatory, explicit enumeration

  • Electronic spin, enabling the differentiation of spin states

What Does FlexMol Look Like?

Let's start with the simple example of benzene:

<?xml version="1.0" standalone="yes"?>
<!-- Benzene, represented as "1,3,5-cyclohexatriene" -->

<molecule>
  <constitution>
    <atoms>
      <atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
    </atoms>
    <bonding>
      <bond source="C0" target="C1" bondingElectrons="2"></bond>
      <bond source="C1" target="C2" bondingElectrons="4"></bond>
      <bond source="C2" target="C3" bondingElectrons="2"></bond>
      <bond source="C3" target="C4" bondingElectrons="4"></bond>
      <bond source="C4" target="C5" bondingElectrons="2"></bond>
      <bond source="C0" target="C5" bondingElectrons="4"></bond>
    </bonding>
  </constitution>
</molecule>

The above representation divides the structure of benzene into two main elements - atoms and bonding. Both of these elements are in turn subelements of the constitution element, which specifies atom connectivity. Had we been representing a molecule with stereochemical features, the above document could have also contained a configuration element, a conformation element, or both.

Within the atoms element are definitions for each of the six equivalent carbon atoms of benzene. Each atom is assigned a unique ID for use elsewhere in the document, an atomic symbol, a count of attached hydrogens, and an effective ionization state. The mandatory hydrogens attribute specifies "virtual" hydrogens, or those associated with an atom without being full-fledged nodes in the graph representation.

The bonding element defines all of the bonding arrangements within benzene. In this case, benzene is being represented as "cyclohexatriene" with alternating single and double bonds; below we'll see how to use FlexMol to represent delocalized (aromatic) bonding. Each bond specifies a source atom, a target atom, and the number of bonding electrons.
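To see how little machinery is needed to consume such a document, here is a minimal Python sketch that reads the "cyclohexatriene" listing with the standard library. The helpers formula and bond_orders are ours, and treating two bonding electrons as one unit of bond order is our reading of the attribute, not something FlexMol defines.

```python
import xml.etree.ElementTree as ET
from collections import Counter

KEKULE = """<molecule>
  <constitution>
    <atoms>
      <atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
    </atoms>
    <bonding>
      <bond source="C0" target="C1" bondingElectrons="2"></bond>
      <bond source="C1" target="C2" bondingElectrons="4"></bond>
      <bond source="C2" target="C3" bondingElectrons="2"></bond>
      <bond source="C3" target="C4" bondingElectrons="4"></bond>
      <bond source="C4" target="C5" bondingElectrons="2"></bond>
      <bond source="C0" target="C5" bondingElectrons="4"></bond>
    </bonding>
  </constitution>
</molecule>"""


def formula(molecule):
    """Count atoms by symbol, adding each atom's virtual hydrogens to H."""
    counts = Counter()
    for atom in molecule.iter("atom"):
        counts[atom.get("symbol")] += 1
        counts["H"] += int(atom.get("hydrogens"))
    return dict(counts)


def bond_orders(molecule):
    """Map (source, target) to bondingElectrons // 2 for every bond element."""
    return {
        (b.get("source"), b.get("target")): int(b.get("bondingElectrons")) // 2
        for b in molecule.iter("bond")
    }


benzene = ET.fromstring(KEKULE)
print(formula(benzene))      # {'C': 6, 'H': 6}
print(bond_orders(benzene))  # alternating 1 and 2 around the ring
```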

In many situations, the above representation of benzene will not suffice. What if we want to describe the one-electron ionization of benzene to form the benzene radical cation? With the "cyclohexatriene" form of benzene, there is no principled way to choose which bond should give up the electron.

Instead, we could use a more physically meaningful representation of benzene, such as that shown below:

<?xml version="1.0" standalone="yes"?>
<!-- Benzene, represented with a delocalized pi-system -->

<molecule>
  <constitution>
    <atoms>
      <atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
    </atoms>
    <bonding>
      <bond source="C0" target="C1" bondingElectrons="2"></bond>
      <bond source="C1" target="C2" bondingElectrons="2"></bond>
      <bond source="C2" target="C3" bondingElectrons="2"></bond>
      <bond source="C3" target="C4" bondingElectrons="2"></bond>
      <bond source="C4" target="C5" bondingElectrons="2"></bond>
      <bond source="C0" target="C5" bondingElectrons="2"></bond>
      <bondingSystem bondingElectrons="6">
        <connections>
          <atomPair source="C0" target="C1"></atomPair>
          <atomPair source="C1" target="C2"></atomPair>
          <atomPair source="C2" target="C3"></atomPair>
          <atomPair source="C3" target="C4"></atomPair>
          <atomPair source="C4" target="C5"></atomPair>
          <atomPair source="C0" target="C5"></atomPair>
        </connections>
      </bondingSystem>
    </bonding>
  </constitution>
</molecule>

This is certainly more verbose, but what does it buy us? Notice the bondingSystem subelement at the end of the bonding element. Here we define an extended six-atom, six-electron bonding system that much more closely reflects the true nature of benzene's pi-system. Now it's obvious that this is the bonding motif from which to take an electron to make the benzene radical cation.
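As a rough illustration, taking that electron amounts to lowering the bondingSystem's electron count from six to five. The Python sketch below does only that; remove_electron is a hypothetical helper, and this article does not say whether a complete radical cation document would also need changes to other attributes (such as ionization or spin), so treat the result as a partial edit rather than a finished FlexMol document.

```python
import xml.etree.ElementTree as ET

# The pi-system from the delocalized benzene listing; sigma bonds omitted.
BONDING = """<bonding>
  <bondingSystem bondingElectrons="6">
    <connections>
      <atomPair source="C0" target="C1"></atomPair>
      <atomPair source="C1" target="C2"></atomPair>
      <atomPair source="C2" target="C3"></atomPair>
      <atomPair source="C3" target="C4"></atomPair>
      <atomPair source="C4" target="C5"></atomPair>
      <atomPair source="C0" target="C5"></atomPair>
    </connections>
  </bondingSystem>
</bonding>"""


def remove_electron(bonding, system_index=0):
    """Take one electron from the chosen bondingSystem, editing it in place."""
    system = bonding.findall("bondingSystem")[system_index]
    electrons = int(system.get("bondingElectrons"))
    if electrons < 1:
        raise ValueError("bondingSystem has no electrons to remove")
    system.set("bondingElectrons", str(electrons - 1))
    return system


bonding = ET.fromstring(BONDING)
system = remove_electron(bonding)
print(system.get("bondingElectrons"))  # 5
```

No choice among the six ring bonds is needed, which is exactly what the localized representation could not offer.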

Next, consider the cyclopentadienyl anion, which possesses a five-atom, six-electron Hueckel aromatic bonding system. We can apply the same principles used in representing benzene's pi-system to the representation of the cyclopentadienyl anion's pi-bonding:

<?xml version="1.0" standalone="yes"?>
<!-- Cyclopentadienyl Anion -->

<molecule>
  <constitution>
    <atoms>
      <atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
    </atoms>
    <bonding>
      <bond source="C0" target="C1" bondingElectrons="2"></bond>
      <bond source="C1" target="C2" bondingElectrons="2"></bond>
      <bond source="C2" target="C3" bondingElectrons="2"></bond>
      <bond source="C3" target="C4" bondingElectrons="2"></bond>
      <bond source="C0" target="C4" bondingElectrons="2"></bond>
      <bondingSystem bondingElectrons="6">
        <connections>
          <atomPair source="C0" target="C1"></atomPair>
          <atomPair source="C1" target="C2"></atomPair>
          <atomPair source="C2" target="C3"></atomPair>
          <atomPair source="C3" target="C4"></atomPair>
          <atomPair source="C0" target="C4"></atomPair>
        </connections>
      </bondingSystem>
    </bonding>
  </constitution>
</molecule>

In the above representation, all carbon atoms are equivalent - something difficult, if not impossible, to achieve with most other molecular languages. Furthermore, the representation of delocalized bonding closely matches what most chemists would describe. We could get even more sophisticated and place individual electrons into three separate bonding systems in analogy with molecular orbitals - it really depends on what we'd like to emphasize.
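One quick way to check the equivalence claim is to compare what each atom participates in. The Python sketch below builds a simple per-atom signature from the bonding element; atom_signatures is our own helper, and matching signatures are only a local check, not a full graph-symmetry test.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The bonding element from the cyclopentadienyl anion listing.
CP = """<bonding>
  <bond source="C0" target="C1" bondingElectrons="2"></bond>
  <bond source="C1" target="C2" bondingElectrons="2"></bond>
  <bond source="C2" target="C3" bondingElectrons="2"></bond>
  <bond source="C3" target="C4" bondingElectrons="2"></bond>
  <bond source="C0" target="C4" bondingElectrons="2"></bond>
  <bondingSystem bondingElectrons="6">
    <connections>
      <atomPair source="C0" target="C1"></atomPair>
      <atomPair source="C1" target="C2"></atomPair>
      <atomPair source="C2" target="C3"></atomPair>
      <atomPair source="C3" target="C4"></atomPair>
      <atomPair source="C0" target="C4"></atomPair>
    </connections>
  </bondingSystem>
</bonding>"""


def atom_signatures(bonding):
    """Map each atom id to a sorted list of the bonds and systems it takes part in."""
    sig = defaultdict(list)
    for bond in bonding.findall("bond"):
        for end in ("source", "target"):
            sig[bond.get(end)].append(("bond", int(bond.get("bondingElectrons"))))
    for system in bonding.findall("bondingSystem"):
        members = set()
        for pair in system.iter("atomPair"):
            members.update((pair.get("source"), pair.get("target")))
        for atom in members:
            entry = ("system", len(members), int(system.get("bondingElectrons")))
            sig[atom].append(entry)
    return {atom: sorted(entries) for atom, entries in sig.items()}


signatures = atom_signatures(ET.fromstring(CP))
print(signatures["C0"])  # [('bond', 2), ('bond', 2), ('system', 5, 6)]
print(len({tuple(s) for s in signatures.values()}))  # 1: every carbon looks the same
```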

This is well and good for aromaticity, but how can all of this help solve the Ferrocene Problem? Just as with cyclopentadienyl anion and benzene, in the representation of ferrocene below, we're taking advantage of FlexMol's support for multi-atom bonding. In this case, we define three bondingSystems, each of which contains six electrons. We could have just as easily created a single eighteen-electron, eleven-atom bonding system. Our choice of representation again depends on what we're trying to emphasize.

<?xml version="1.0" standalone="yes"?>
<!-- Ferrocene -->

<molecule>
  <constitution>
    <atoms>
      <atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C2" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C3" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C4" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C5" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C6" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C7" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C8" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C9" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="Fe10" symbol="Fe" hydrogens="0" ionization="8"></atom>
    </atoms>
    <bonding>
      <bond source="C0" target="C1" bondingElectrons="2"></bond>
      <bond source="C1" target="C2" bondingElectrons="2"></bond>
      <bond source="C2" target="C3" bondingElectrons="2"></bond>
      <bond source="C3" target="C4" bondingElectrons="2"></bond>
      <bond source="C0" target="C4" bondingElectrons="2"></bond>
      <bond source="C5" target="C6" bondingElectrons="2"></bond>
      <bond source="C6" target="C7" bondingElectrons="2"></bond>
      <bond source="C7" target="C8" bondingElectrons="2"></bond>
      <bond source="C8" target="C9" bondingElectrons="2"></bond>
      <bond source="C5" target="C9" bondingElectrons="2"></bond>
      <bondingSystem bondingElectrons="6">
        <connections>
          <atomPair source="C0" target="C1"></atomPair>
          <atomPair source="C1" target="C2"></atomPair>
          <atomPair source="C2" target="C3"></atomPair>
          <atomPair source="C3" target="C4"></atomPair>
          <atomPair source="C0" target="C4"></atomPair>
          <atomPair source="C0" target="Fe10"></atomPair>
          <atomPair source="C1" target="Fe10"></atomPair>
          <atomPair source="C2" target="Fe10"></atomPair>
          <atomPair source="C3" target="Fe10"></atomPair>
          <atomPair source="C4" target="Fe10"></atomPair>
        </connections>
      </bondingSystem>
      <bondingSystem bondingElectrons="6">
        <connections>
          <atomPair source="C5" target="C6"></atomPair>
          <atomPair source="C6" target="C7"></atomPair>
          <atomPair source="C7" target="C8"></atomPair>
          <atomPair source="C8" target="C9"></atomPair>
          <atomPair source="C5" target="C9"></atomPair>
          <atomPair source="C5" target="Fe10"></atomPair>
          <atomPair source="C6" target="Fe10"></atomPair>
          <atomPair source="C7" target="Fe10"></atomPair>
          <atomPair source="C8" target="Fe10"></atomPair>
          <atomPair source="C9" target="Fe10"></atomPair>
        </connections>
      </bondingSystem>
      <bondingSystem bondingElectrons="6">
        <connections>
          <atomPair source="C0" target="C1"></atomPair>
          <atomPair source="C1" target="C2"></atomPair>
          <atomPair source="C2" target="C3"></atomPair>
          <atomPair source="C3" target="C4"></atomPair>
          <atomPair source="C0" target="C4"></atomPair>
          <atomPair source="C0" target="Fe10"></atomPair>
          <atomPair source="C1" target="Fe10"></atomPair>
          <atomPair source="C2" target="Fe10"></atomPair>
          <atomPair source="C3" target="Fe10"></atomPair>
          <atomPair source="C4" target="Fe10"></atomPair>
          <atomPair source="C5" target="C6"></atomPair>
          <atomPair source="C6" target="C7"></atomPair>
          <atomPair source="C7" target="C8"></atomPair>
          <atomPair source="C8" target="C9"></atomPair>
          <atomPair source="C5" target="C9"></atomPair>
          <atomPair source="C5" target="Fe10"></atomPair>
          <atomPair source="C6" target="Fe10"></atomPair>
          <atomPair source="C7" target="Fe10"></atomPair>
          <atomPair source="C8" target="Fe10"></atomPair>
          <atomPair source="C9" target="Fe10"></atomPair>
        </connections>
      </bondingSystem>
    </bonding>
  </constitution>
</molecule>

The same principles outlined for ferrocene apply equally to other metallocenes. FlexMol can also represent a host of otherwise tough cases such as nonclassical carbocations, allylmetal complexes, resonance-stabilized radicals and ions, and transition states.
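The alternative mentioned earlier for ferrocene, a single eighteen-electron, eleven-atom bonding system, can be derived mechanically from the three six-electron systems. In the Python sketch below, ring_system and merge_systems are hypothetical helpers of ours: the first rebuilds the systems from the listing programmatically to keep the example short, and the second takes the union of atom pairs and the sum of electrons. The article does not say whether software would treat the merged and split forms as equivalent.

```python
import xml.etree.ElementTree as ET


def ring_system(ring, metal, electrons=6):
    """Build a bondingSystem over a ring plus its ring-to-metal contacts."""
    system = ET.Element("bondingSystem", bondingElectrons=str(electrons))
    connections = ET.SubElement(system, "connections")
    pairs = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]
    pairs += [(atom, metal) for atom in ring]
    for source, target in pairs:
        ET.SubElement(connections, "atomPair", source=source, target=target)
    return system


def merge_systems(systems):
    """Combine bondingSystems: union of atom pairs, sum of electrons."""
    merged = ET.Element("bondingSystem")
    connections = ET.SubElement(merged, "connections")
    seen, total = set(), 0
    for system in systems:
        total += int(system.get("bondingElectrons"))
        for pair in system.iter("atomPair"):
            source, target = pair.get("source"), pair.get("target")
            key = frozenset((source, target))  # ignore pair direction
            if key not in seen:
                seen.add(key)
                ET.SubElement(connections, "atomPair", source=source, target=target)
    merged.set("bondingElectrons", str(total))
    return merged


ring_a = ring_system(["C0", "C1", "C2", "C3", "C4"], "Fe10")
ring_b = ring_system(["C5", "C6", "C7", "C8", "C9"], "Fe10")
both = merge_systems([ring_a, ring_b])
both.set("bondingElectrons", "6")  # the third system in the listing holds six electrons

single = merge_systems([ring_a, ring_b, both])
atoms = {p.get(end) for p in single.iter("atomPair") for end in ("source", "target")}
print(single.get("bondingElectrons"), len(atoms))  # 18 11
```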

Why XML?

XML provides several often-cited advantages:

  • Availability of standardized parser and output libraries

  • Human readability

  • Adequate mapping to Object-Oriented models for most purposes

Nothing about FlexMol prevents it from being built on top of another data-interchange format. Two of the most interesting alternatives to XML are JavaScript Object Notation (JSON) and YAML. JSON in particular seems to have learned from XML's experiences and so represents a platform worthy of serious consideration.
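To give a feel for what a JSON rendering might look like, here is a minimal Python sketch that maps the constitution element onto nested lists and dictionaries. No FlexMol JSON format has been defined, so the key names and layout are purely illustrative, and the input is a two-atom excerpt trimmed from the benzene listing rather than a complete molecule.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed excerpt for brevity; not a chemically complete document.
FLEXMOL = """<molecule>
  <constitution>
    <atoms>
      <atom id="C0" symbol="C" hydrogens="1" ionization="4"></atom>
      <atom id="C1" symbol="C" hydrogens="1" ionization="4"></atom>
    </atoms>
    <bonding>
      <bond source="C0" target="C1" bondingElectrons="2"></bond>
      <bondingSystem bondingElectrons="6">
        <connections>
          <atomPair source="C0" target="C1"></atomPair>
        </connections>
      </bondingSystem>
    </bonding>
  </constitution>
</molecule>"""


def to_json(molecule):
    """Serialize the constitution element as JSON (hypothetical layout)."""
    constitution = molecule.find("constitution")
    data = {
        "atoms": [dict(a.attrib) for a in constitution.iter("atom")],
        "bonds": [dict(b.attrib) for b in constitution.iter("bond")],
        "bondingSystems": [
            {
                "bondingElectrons": s.get("bondingElectrons"),
                "connections": [dict(p.attrib) for p in s.iter("atomPair")],
            }
            for s in constitution.iter("bondingSystem")
        ],
    }
    return json.dumps(data, indent=2)


print(to_json(ET.fromstring(FLEXMOL)))
```

Attribute values are left as strings, exactly as they appear in the XML; a real format would need to decide which of them become numbers.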

What About Chemical Markup Language?

Chemical Markup Language (CML) is a widely-used XML-based molecular language. So why invent yet another XML language for chemistry? Currently, CML does not solve the molecular representation problems discussed in this article and those preceding it. So although FlexMol and CML are both built on XML, they are nevertheless each aimed at addressing different problems. In this respect, FlexMol and CML are complementary.

Where's the Software?

Any language needs software to make it useful. To simplify its use, FlexMol is fully supported by Octet, an Open Source framework written in Java. Supporting FlexMol in other cheminformatics toolkits will likely be challenging due to impedance mismatch; FlexMol can precisely encode a variety of structural concepts that simply don't exist elsewhere.

Conclusions

Existing molecular languages lack the expressive power to represent many structural motifs in widespread use by today's chemists. FlexMol was designed to solve this problem. Future articles in this series will demonstrate how FlexMol documents can be read and written, and will show some techniques for manipulating the resulting molecular representations.
