<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag canonicalization</title>
    <link>http://depth-first.com/articles/tag/canonicalization</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Update: InChI Canonicalization Algorithm</title>
      <description>&lt;p&gt;An &lt;a href="http://depth-first.com/articles/2006/08/12/inchi-canonicalization-algorithm"&gt;older article&lt;/a&gt; on the InChI canonicalization algorithm has been restored and updated. The revised article contains a direct link to the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=142870&amp;amp;package_id=217748"&gt;InChI Technical Manual&lt;/a&gt; pdf file which I uploaded to SourceForge for convenience.&lt;/p&gt;</description>
      <pubDate>Sat, 05 May 2007 12:34:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:68c50ada-2ea4-414a-817d-db93862a71c5</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/05/05/update-inchi-canonicalization-algorithm</link>
      <category>Tools</category>
      <category>inchi</category>
      <category>canonicalization</category>
    </item>
    <item>
      <title>Creating Canonical SMILES with Ruby Open Babel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Unlike many data types, molecular structure representations are not normally unique. Each numbering system you choose for the atoms and bonds of a molecule gives rise to a completely accurate, but degenerate, molecular representation. This is one of the fundamental &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;peculiarities of chemical information&lt;/a&gt; and the focus of much research activity &lt;a href="http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web"&gt;over the last sixty or so years&lt;/a&gt;. One of the most widely used approaches to this problem is canonicalization.&lt;/p&gt;

&lt;p&gt;This article discusses the &lt;a href="http://sourceforge.net/forum/forum.php?forum_id=629764"&gt;SMILES canonicalization capability&lt;/a&gt; in the upcoming Open Babel 2.1 release. Among several other enhancements, this release will also feature a brand new Ruby interface. By way of preview, this article will demonstrate just how convenient it has now become to generate canonical SMILES strings with Ruby.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070403/aminopterin.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Consider the putative rodenticide aminopterin, the structure of which is shown above. Regardless of whether it turns out to be the culprit in the &lt;a href="http://www.cbsnews.com/stories/2007/03/23/national/main2600615.shtml"&gt;recent pet food poisoning case&lt;/a&gt;, it's a relatively complex molecule. And with this complexity come many possible representations. Here's one of just hundreds, if not thousands, of possible SMILES strings for this molecule:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
Nc3nc(N)c2nc(CNc1ccc(C(=O)N[C@@H](CCC(=O)O)C(=O)O)cc1)cnc2n3
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If you were developing a database of molecules and needed to support exact structure searching, how would you do it? One way would be to convert a query molecule to a canonical SMILES string, and then simply look for that string in an index of your database's canonical SMILES. This is useful because it allows us to convert a chemistry-specific problem (exact structure search) into a generic computer science problem (text matching).&lt;/p&gt;

&lt;p&gt;We can create a simple Ruby library to convert any SMILES string into an Open Babel canonical SMILES string:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;openbabel&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Can&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@conversion&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBConversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@conversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_in_and_out_formats&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;can&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;convert&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBMol&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@conversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_string&lt;/span&gt; &lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="attribute"&gt;@conversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_string&lt;/span&gt; &lt;span class="ident"&gt;mol&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Save this code as a file called &lt;strong&gt;can.rb&lt;/strong&gt; in your working directory. The library can then be used, for example, via Interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'can'
=&gt; true
irb(main):002:0&gt; c=Can.new
=&gt; #&amp;lt;Can:0x2ac6cc653228 @conversion=#&amp;lt;OpenBabel::Conversion:0x2ac6cc6531d8&amp;gt;&amp;gt;
irb(main):003:0&gt; puts c.convert('Nc3nc(N)c2nc(CNc1ccc(C(=O)N[C@@H](CCC(=O)O)C(=O)O)cc1)cnc2n3')
OC(=O)CC[C@@H](NC(=O)c1ccc(NCc2cnc3nc(N)nc(N)c3n2)cc1)C(=O)O
=&gt; nil
irb(main):004:0&gt; puts c.convert('C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=NC(=N3)N)N')
OC(=O)CC[C@@H](NC(=O)c1ccc(NCc2cnc3nc(N)nc(N)c3n2)cc1)C(=O)O
=&gt; nil
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see, both SMILES strings for aminopterin were converted into the same canonical SMILES string.&lt;/p&gt;
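
&lt;p&gt;The exact-search idea described earlier can be sketched in a few lines of Ruby. This is only a minimal illustration, not part of Open Babel: the &lt;code&gt;ExactStructureIndex&lt;/code&gt; class and the &lt;code&gt;STAND_IN_CANONICALIZER&lt;/code&gt; lambda are hypothetical names, and the lambda is a hand-written lookup table built from the two aminopterin strings above so that the sketch runs without Open Babel installed.&lt;/p&gt;

```ruby
# Canonical string reported in the irb session for aminopterin.
AMINOPTERIN = 'OC(=O)CC[C@@H](NC(=O)c1ccc(NCc2cnc3nc(N)nc(N)c3n2)cc1)C(=O)O'

# Stand-in for a real canonicalizer (an assumption of this sketch): it only
# knows the two input strings used in the irb session and returns anything
# else unchanged.
KNOWN_FORMS = {
  'Nc3nc(N)c2nc(CNc1ccc(C(=O)N[C@@H](CCC(=O)O)C(=O)O)cc1)cnc2n3' => AMINOPTERIN,
  'C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=NC(=N3)N)N' => AMINOPTERIN
}
STAND_IN_CANONICALIZER = ->(smiles) { KNOWN_FORMS.fetch(smiles, smiles) }

# Hypothetical in-memory index keyed by canonical SMILES.
class ExactStructureIndex
  # canonicalizer: any object that responds to call(smiles)
  def initialize(canonicalizer)
    @canonicalizer = canonicalizer
    @ids_by_smiles = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store a record id under the canonical form of its structure.
  def add(id, smiles)
    @ids_by_smiles[@canonicalizer.call(smiles)].push(id)
  end

  # Exact structure search becomes a plain string lookup.
  def find(smiles)
    @ids_by_smiles.fetch(@canonicalizer.call(smiles), [])
  end
end

index = ExactStructureIndex.new(STAND_IN_CANONICALIZER)
index.add(:aminopterin, 'Nc3nc(N)c2nc(CNc1ccc(C(=O)N[C@@H](CCC(=O)O)C(=O)O)cc1)cnc2n3')
p index.find('C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=NC(=N3)N)N')
# prints [:aminopterin]
```

&lt;p&gt;With Open Babel available, something like &lt;code&gt;Can.new.method(:convert)&lt;/code&gt; could presumably be passed in place of the stand-in, possibly after stripping trailing whitespace from the returned string; that detail is an assumption worth checking.&lt;/p&gt;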

&lt;p&gt;Unlike InChI, which uses a "standard" &lt;a href="http://depth-first.com/articles/2006/08/12/inchi-canonicalization-algorithm"&gt;canonicalization algorithm&lt;/a&gt;, SMILES canonicalization varies by software package. As a result, the SMILES canonicalization described here will be most useful &lt;em&gt;within&lt;/em&gt; a software package, but probably not &lt;em&gt;externally&lt;/em&gt; to it, at least initially.&lt;/p&gt;

&lt;p&gt;Ruby is still an upstart language in cheminformatics. But tools like &lt;a href="http://depth-first.com/articles/tag/rubycdk"&gt;Ruby CDK&lt;/a&gt; and Ruby Open Babel offer ample opportunities for learning what this remarkable language can do for the development of chemistry applications.&lt;/p&gt;</description>
      <pubDate>Tue, 03 Apr 2007 11:59:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:53ca2aed-221a-4d52-bbb9-324fedce78d8</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/04/03/creating-canonical-smiles-with-ruby-open-babel</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>ruby</category>
      <category>rubyopenbabel</category>
      <category>smiles</category>
      <category>canonicalization</category>
    </item>
    <item>
      <title>InChI Canonicalization Algorithm</title>
      <description>&lt;p&gt;The InChI canonicalization algorithm uniquely numbers the atoms of a molecule. To date, the only implementation is found in the C source code of the &lt;a href="http://www.iupac.org/inchi/"&gt;InChI software&lt;/a&gt;. To enable new InChI implementations, for example in other programming languages, the complete canonicalization procedure is needed. Although it has not been published formally, the information exists in &lt;a href="http://sourceforge.net/mailarchive/message.php?msg_id=5.1.1.5.2.20050708111329.02502190%40email.nist.gov"&gt;two messages&lt;/a&gt; posted to the inchi-discuss mailing list by Dmitrii Tchekhovskoi. To make this information more accessible, these messages have been compiled and re-formatted. The resulting document applies to v1.0 of the IUPAC InChI software. The following article refers to the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=142870&amp;amp;package_id=217748"&gt;&lt;em&gt;InChI Technical Manual&lt;/em&gt;&lt;/a&gt;, which can be downloaded from SourceForge.&lt;/p&gt;

&lt;h4&gt;Background&lt;/h4&gt;

&lt;p&gt;Below is a brief general description of the InChI canonicalization algorithm. I did not
dare to include the canonicalization steps involved in the treatment of mobile hydrogens;
only a brief note on how it is implemented is included.&lt;/p&gt;

&lt;p&gt;The order of minimization may be found in section IV.e, Canonicalization, of the InChI Tech. Man. Of
course, all details are in the code.&lt;/p&gt;

&lt;p&gt;The minimization itself is a highly technical and boring issue; an essential part of it is
built on the well-known B. D. McKay algorithm (ref. 5 in the InChI Tech. Man.), which is itself
not an easy text to read. Given our limited resources, we decided to postpone the
detailed documentation of the canonicalization. However, the InChI code is freely available,
and references to the corresponding variables in the code are given in Figure 30 of the
InChI Tech. Man.&lt;/p&gt;

&lt;p&gt;Here is a very brief description (leaving aside mobile H treatment and almost all technical
details; almost all numerical examples below refer to 2-chlorobutane).&lt;/p&gt;

&lt;h4&gt;Major Step A. Hydrogenless constitution&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The atoms (ignoring terminal hydrogens) that are vertices in the "molecular graph" are
given numerical "colors" in this order of precedence:
(a) Ordering number in the sequence: C, other atoms in alphabetic order, bridging H. In
case of C4H9Cl all C will be given 1, Cl will be given 2
(b) Number of connections (number of bonds). In 2-chlorobutane CH3CH2CH(Cl)CH3 these are
(in brackets): C[1]C[2]C[3](Cl[1])C[1]
The resultant "ordered lists of colors" presented in order of the atoms in the
semistructural formula CH3CH2CH(Cl)CH3 are:
C:  1, 1
C:  1, 2
C:  1, 3
Cl: 2, 1
C:  1, 1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Atoms are assigned new colors according to lexicographical comparison of the "color
lists", in ascending order
[for example, (1,1) &amp;lt; (1,2) &amp;lt; (2,1); (1, 2) &amp;lt; (1, 2, 1)]&lt;/p&gt;

&lt;p&gt;C:  1, 1 =&gt; 2
C:  1, 2 =&gt; 3
C:  1, 3 =&gt; 4
Cl: 2, 1 =&gt; 5
C:  1, 1 =&gt; 2&lt;/p&gt;

&lt;p&gt;You may notice here an unimportant detail: each color is equal to the number of atoms that
have this or a smaller color.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Atoms are assigned new "ordered lists of colors": the first in the list is the color of
the atom, the rest are the colors of the atoms connected to this atom, sorted in ascending
order:&lt;/p&gt;

&lt;p&gt;C:  2, 3
C:  3, 2, 4
C:  4, 2, 3, 5
Cl: 5, 4
C:  2, 4&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Atoms are assigned new colors according to lexicographical comparison of the "color
lists", in ascending order&lt;/p&gt;

&lt;p&gt;C:  2, 3 =&gt; 1
C:  3, 2, 4 =&gt; 3
C:  4, 2, 3, 5 =&gt; 4
Cl: 5, 4 =&gt; 5
C:  2, 4 =&gt; 2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Steps 3-4 are repeated until all new colors are different or no more changes occur (for
2-chlorobutane the colors, which are the canonical numbers, have already been found).&lt;/p&gt;

&lt;p&gt;The resultant colors produce a so-called equitable partition, in a way that is
conceptually almost the same as the intermediate result of the SMILES-2 algorithm &lt;a href="http://dx.doi.org/10.1021/ci00062a008"&gt;"SMILES. 2.
Algorithm for Generation of Unique SMILES Notation" by  Weininger, D.; Weininger, A.;
Weininger, J.L.; JCICS Vol. 29, pp. 97-101, 1989&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If some of the colors are still identical, then the smallest duplicated color is picked and,
for one of the atoms that have it, reduced to the previous color + 1. For example, if the colors are
(this example does not refer to 2-chlorobutane):
1,2,5,5,5,7,7
then the smallest duplicated color is 5 and the previous color is 2. The color of one of the
color-5 atoms will be reduced from 5 to 2+1=3.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Repeat steps 3-6 until all colors become different (this is almost the same as obtaining the
final result of the SMILES-2 algorithm) and save the "connection table". To make the reading
easier, the process of obtaining this table (actually, a list of numbers) is split into 3
steps.
(a) The connection table is made out of segments, ordered in ascending order of the color of
the first atom in a segment. The number of the segments is the number of atoms. Each segment
starts with the color of an atom and is followed by a colon and a sorted list of the colors
of atoms, connected to it:
1:3; 2:4; 3:1,4; 4:2,3,5; 5:4;
(b) Since this connection table contains each connection 2 times (for example, the bond
between atoms of color 1 and 3 is in the segments "1:3" and "3:1"), it is rewritten by
excluding colors that are greater than the first color in the segment:
1; 2; 3:1; 4:2,3; 5:4;
(c) The delimiters now are redundant because the members of each segment are always smaller
than the first member of the segment. This is the final connection table to be saved and
used later:
1, 2, 3, 1, 4, 2, 3, 5, 4&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There could be a great deal of arbitrariness in choosing the atom whose color was to be
reduced at step 6 (in the example, 3 atoms have color 5; each of them could be chosen).
Therefore, repeat step 7 for all possible sequences of choosing the atoms whose color is
reduced. Lexicographically compare each obtained connection table to the previously saved
and keep the smallest one together with the assignment of the colors to the atoms. These
colors are the canonical numbers for the hydrogenless structure.
If two connection tables are identical, then atoms that have the same colors in the two connection
tables belong to the same equivalence class; this information is saved and used. The
equivalence class is the smallest color in the equivalence group.&lt;/p&gt;

&lt;p&gt;(You may find this approach in, for example, &lt;a href="http://dx.doi.org/10.1021/ci00019a001"&gt;"A Computer-Oriented Linear Canonical
Notational System for the Representation of Organic Structures with Stereochemistry" by
Agarwal, K.K.; Gelernter, H.L.; JCICS v.34, pp.463-479, 1994&lt;/a&gt;. However, the algorithm from
Ref. 5, as implemented in InChI, makes it possible to avoid a combinatorial explosion in typical chemical
structures, obtain equivalence classes, and even find the order of the permutation group and
its generators.)&lt;/p&gt;

&lt;p&gt;So far we have a canonical numbering (colors) for a hydrogenless structure and the canonical
equivalence classes (= the smallest color in each set of equivalent atoms).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make new colors out of the canonical equivalence classes and repeat steps 3-8 if these
colors are different from the colors previously used at Step 3. Obtain the new minimal
connection table. Use these classes as initial colors in the next steps
(If equivalence classes are, for example,
1, 1, 1, 4, 4, 5, 5, 5
then the corresponding colors are
3, 3, 3, 5, 5, 8, 8, 8)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
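
&lt;p&gt;Steps 1-5 and the connection table of step 7 can be traced with a short Ruby sketch. It is an illustration under simplifying assumptions, not the InChI implementation: the method names are made up for this example, bridging H is ignored, and the tie-breaking of steps 6 and 8 is left out, so it only yields a complete numbering when refinement alone separates all atoms, as it does for 2-chlorobutane.&lt;/p&gt;

```ruby
# 2-chlorobutane, CH3CH2CH(Cl)CH3, atoms in formula order (terminal H omitted).
ELEMENTS  = %w[C C C Cl C]
NEIGHBORS = [[1], [0, 2], [1, 3, 4], [2], [2]]

# Steps 2 and 4: the new color of an atom is the number of atoms whose color
# list is lexicographically smaller than or equal to its own list.
def rank(color_lists)
  sorted = color_lists.sort
  color_lists.map { |list| sorted.rindex(list) + 1 }
end

# Steps 1-5: initial colors from (element order, number of connections),
# then refinement with sorted neighbor colors until nothing changes.
def refine_colors(elements, neighbors)
  order = ['C'] + (elements.uniq - ['C']).sort   # C first, others alphabetical
  initial = elements.each_index.map do |i|
    [order.index(elements[i]) + 1, neighbors[i].size]
  end
  colors = rank(initial)
  loop do
    lists = colors.each_index.map do |i|
      [colors[i]] + neighbors[i].map { |j| colors[j] }.sort
    end
    refined = rank(lists)
    return refined if refined == colors
    colors = refined
  end
end

# Step 7 (a)-(c): one segment per atom in ascending color order, keeping only
# neighbor colors smaller than the atom's own color. Assumes distinct colors.
def connection_table(colors, neighbors)
  atoms_by_color = colors.each_with_index.sort.map { |_color, atom| atom }
  atoms_by_color.flat_map do |atom|
    smaller = neighbors[atom].map { |j| colors[j] }.select { |c| colors[atom] > c }
    [colors[atom]] + smaller.sort
  end
end

colors = refine_colors(ELEMENTS, NEIGHBORS)
p colors                               # prints [1, 3, 4, 5, 2]
p connection_table(colors, NEIGHBORS)  # prints [1, 2, 3, 1, 4, 2, 3, 5, 4]
```

&lt;p&gt;The printed colors follow the atom order of the formula, and the second line reproduces the final connection table given in step 7(c).&lt;/p&gt;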

&lt;h4&gt;Major Step B. Add hydrogen atoms to the structure&lt;/h4&gt;

&lt;p&gt;[A Forward Note: on a first reading you may want to skip it.
 In the case of mobile H the steps are somewhat different, namely:
 (m-a) Add a list of only those H that are not mobile (similar to B.1 below) and minimize
both the connection table (it will be the same) and the list.
 (m-b) Add mobile groups as pseudoatoms connected by directed edges (it means that these
pseudoatoms are not included in the connection table segments of the real atoms) to the
atoms where the mobile H and possibly negative charges may reside and canonicalize this
structure. Number of H and (-) in the groups are in one more list to minimize. The result is
the Mobile H canonical numbering and the corresponding equivalence classes, including
equivalence classes of the mobile H (and possibly negative charge) groups. Mobile groups
that have only negative charges are not included in this process.
 (m-c) Add isotopic list (see Major Step C below) to the number of lists to be minimized. Do
not include in it the exchangeable isotopic atoms H. The result of the minimization is the
Mobile H canonical numbering and equivalence classes for the isotopic structure.
 (f-a) For the fixed mobile H (FixedH option) start with the results of (m-a) and add a list
of the fixed positions of the mobile H (colors of the atoms where these H reside) and
numbers of these atoms H. The result of the minimization is the Fixed-H canonical numbering
and equivalence classes.
 (f-b) Add isotopic list (see Major Step C below). The minimization result is the Fixed-H
canonical numbering and equivalence classes for the isotopic Fixed-H structure.]&lt;/p&gt;

&lt;p&gt;Use the equivalence classes previously obtained at Step A.9 and use the previously obtained
minimal connection table for the comparison. Run Steps A.3-8 with the following difference:
each time the connection tables are compared at Step A.8, in case of identical connection
tables also compare the list of terminal H atoms in the following form:
 1, number_of_H(1), 2, number_of_H(2), ..., n, number_of_H(n)
 where number_of_H(c) is the number of terminal H atoms attached to the atom that has color
c; n = number of atoms.
 Save the minimal list of terminal H atoms found this way together with the assignment of
the colors to the atoms. Also obtain the equivalence classes as was done earlier.&lt;/p&gt;

&lt;p&gt;We now have the canonical colors (numbering) of the non-isotopic non-tautomeric structure.&lt;/p&gt;
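
&lt;p&gt;The extra comparison in Major Step B can be sketched as follows. The propene example and the method names are illustrative assumptions, not taken from the original messages: propene's two end carbons are equivalent in the hydrogenless graph, so both colorings give the same connection table and the list of terminal H atoms decides between them.&lt;/p&gt;

```ruby
# Build the list 1, number_of_H(1), 2, number_of_H(2), ..., n, number_of_H(n)
# from per-atom colors and per-atom terminal H counts.
def hydrogen_list(colors, number_of_h)
  colors.zip(number_of_h).sort.flatten
end

# Pick the coloring whose (connection table, H list) pair is lexicographically
# smallest. All candidates here share one connection table, so the H list
# is what breaks the tie.
def best_coloring(colorings, connection_table, number_of_h)
  colorings.min_by do |colors|
    [connection_table, hydrogen_list(colors, number_of_h)]
  end
end

# Propene, CH3-CH=CH2, atoms in formula order.
NUMBER_OF_H      = [3, 1, 2]
CONNECTION_TABLE = [1, 2, 3, 1, 2]          # 1; 2; 3:1,2 for either coloring
CANDIDATES       = [[1, 3, 2], [2, 3, 1]]   # the two ways to color the ends

p hydrogen_list([1, 3, 2], NUMBER_OF_H)   # prints [1, 3, 2, 2, 3, 1]
p hydrogen_list([2, 3, 1], NUMBER_OF_H)   # prints [1, 2, 2, 3, 3, 1]
p best_coloring(CANDIDATES, CONNECTION_TABLE, NUMBER_OF_H)   # prints [2, 3, 1]
```

&lt;p&gt;In this sketch the CH2 carbon ends up with color 1 because its smaller H count makes the list smaller at the first position where the two lists differ.&lt;/p&gt;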

&lt;h4&gt;Major Step C. Add isotopic composition to the structure&lt;/h4&gt;

&lt;p&gt;If the structure is isotopic, then add one more list to compare if the connection tables
and the lists of terminal H atoms are the same:
 1, iso_weight(1), 2, iso_weight(2), ..., n, iso_weight(n)&lt;/p&gt;

&lt;p&gt;where iso_weight(c) is the "isotopic weight" of the atom to which the color c was assigned.
For each atom the isotopic weight is calculated according to the formula:&lt;/p&gt;

&lt;p&gt;iso_weight = nH1 + 32*(nH2 + 32*(nH3 + 32*shift))&lt;/p&gt;

&lt;p&gt;nH1 = number of terminal atoms of protium attached to the atom
 nH2 = number of terminal atoms of deuterium attached to the atom
 nH3 = number of terminal atoms of tritium attached to the atom
 shift = [(integral) mass of the isotopic atom] - [rounded average atomic mass]&lt;/p&gt;

&lt;p&gt;Note: hydrogen H is treated differently from its isotope protium: H has "natural" isotopic
composition while protium is treated as an isotopic atom.&lt;/p&gt;

&lt;p&gt;For a non-isotopic atom, shift = 0 by definition.
 If the atom is isotopic and its mass number is greater than or equal to the rounded average
atomic mass (that is, shift is not negative), then the shift is incremented to avoid shift=0
for isotopes.&lt;/p&gt;

&lt;p&gt;If the formula produces iso_weight=0 (the atom and the attached H are not isotopic), then
iso_weight(c) is set equal to LONG_MAX from the include file limits.h (for 32-bit systems it
is usually 2,147,483,647, greater than any iso_weight). This forces isotopic atoms to assume
the least possible canonical numbers.&lt;/p&gt;
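
&lt;p&gt;The isotopic weight rules above can be written out as a small Ruby sketch. The method and constant names are illustrative, and passing &lt;code&gt;nil&lt;/code&gt; as the mass number of a non-isotopic atom is a convention of this sketch rather than anything in the InChI code.&lt;/p&gt;

```ruby
# Value of LONG_MAX on typical 32-bit systems, as quoted in the text.
LONG_MAX_32 = 2_147_483_647

# shift = mass number minus rounded average atomic mass, incremented by one
# when it is not negative so that a real isotope never gets shift 0.
# A nil mass number marks a non-isotopic atom (shift 0 by definition).
def isotopic_shift(mass_number, rounded_average_mass)
  return 0 if mass_number.nil?
  shift = mass_number - rounded_average_mass
  shift >= 0 ? shift + 1 : shift
end

# iso_weight = nH1 + 32*(nH2 + 32*(nH3 + 32*shift)); a result of 0 means
# nothing isotopic is present and is replaced by LONG_MAX_32.
def iso_weight(n_h1, n_h2, n_h3, shift)
  weight = n_h1 + 32 * (n_h2 + 32 * (n_h3 + 32 * shift))
  weight.zero? ? LONG_MAX_32 : weight
end

# Carbon-13 with no isotopic hydrogens: shift is 13 - 12 = 1, incremented to 2.
p iso_weight(0, 0, 0, isotopic_shift(13, 12))    # prints 65536
# Ordinary carbon carrying three deuterium atoms.
p iso_weight(0, 3, 0, isotopic_shift(nil, 12))   # prints 96
# Nothing isotopic at all.
p iso_weight(0, 0, 0, isotopic_shift(nil, 12))   # prints 2147483647
```

&lt;p&gt;Because non-isotopic atoms receive the largest possible value, any atom with isotopic content sorts ahead of them when the list is minimized.&lt;/p&gt;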

&lt;p&gt;Repeat Major Step B, adding the list of isotopic weights to those already minimized.&lt;/p&gt;

&lt;p&gt;At this point we are finished with the modified version of B. D. McKay's algorithm.&lt;/p&gt;

&lt;p&gt;It should be pointed out that, for the sake of simplicity, independence from the
hardware or operating system, and the possibility of reproducing the results "by hand", the
efficiency of the original B. D. McKay algorithm has been reduced. The greatest impact is
due to abandoning hashing for the connection table comparison and introducing lists, in addition to
the connection table, to be minimized. Also, the implemented algorithm for calculating
the equitable partition from the given colors is less efficient than the one suggested in
Ref. 5. None of the further improvements that B. D. McKay introduced in his
famous Nauty program after publishing Ref. 5 are implemented in the InChI code.&lt;/p&gt;

&lt;h4&gt;Major Step D. Stereochemistry&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;For the found canonical colors (numbers) calculate double bond (&gt;X=Y&amp;lt;) and cumulene (&gt;W=X=Y=Z&amp;lt;) parities:
For each atom at the ends of the double bond or cumulene, find the atom connected to it by a single
bond that has the larger canonical number. If these two atoms are in "cis" positions,
then the parity is (-), otherwise the parity is (+).
Save parities list:
c1[1],c2[1],p[1],c1[2],c2[2],p[2],...,c1[n1],c2[n1],p[n1]
arranged in ascending order of (c1[i],c2[i]) pairs
where
n1=number of possibly stereogenic double bonds and cumulenes
c1[i]&gt;c2[i] are colors of the atoms at the end of a double bond or cumulene
p[i] is the parity ("u" &gt; "?" &gt; "+" &gt; "-")
Order: let a1&gt;a2 and b1&gt;b2 be the colors of the atoms for two double bonds, (a1,a2) and
(b1,b2).
(a1,a2) &gt; (b1,b2) if and only if ((a1 &gt; b1) || (a1==b1 &amp;amp;&amp;amp; a2 &gt; b2))
(here the C programming language notation is used)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each allene &gt;X=Y=Z&amp;lt; consider a tetrahedron that has as its apices the four atoms
connected by single bonds to the allene atoms X and Z. If you look at other apices from the
apex that has the smallest canonical number and see canonical numbers of these three other
apices arranged in ascending order clockwise then the parity is (+), otherwise it is (-).
Save parities list:
c[1],p[1],c[2],p[2],...,c[n2],p[n2]
arranged in ascending order of c[i]
where
n2=number of possibly stereogenic allenes
c[i] are the colors of atoms Y
p[i] are the parities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each possibly stereogenic atom consider a tetrahedron that has as its apices the
four atoms connected to this possibly stereogenic atom. If you look at the other apices from the
apex that has the smallest canonical number and see canonical numbers of these three other
apices arranged in ascending order clockwise then the parity is (+), otherwise it is (-).
Save parities list:
c[1],p[1],c[2],p[2],...,c[n3],p[n3]
arranged in ascending order of c[i]
where
n3=number of possibly stereogenic atoms
c[i] are the colors of the atoms
p[i] are the parities&lt;/p&gt;

&lt;p&gt;Note. Terminal hydrogen atoms do not have colors (canonical numbers). In parity
calculations, hydrogen atoms are assumed to have colors less than the smallest color of
other atoms, that is, less than 1. The values of their colors c are assumed to be:
c[H] &amp;lt; c[protium] &amp;lt; c[deuterium] &amp;lt; c[tritium] &amp;lt; 1
In a special case of all four connected to the same atom the atom is not stereogenic.
In case of a tetrahedral atom that has only 3 bonds (for example, &gt;S=O or &gt;N-) the
direction of the lone electron pair is used as one more bond;
c[lone pair] &amp;lt; c[H].&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Repeat steps 1-3 for all other mappings of the canonical numbers on the atoms that
produce the same results as in Major Step B or C and find the mapping(s) that produce the
lexicographically smallest result in this order of the lists: D.1, D.2, D.3.
To each result apply a heuristic to detect possibly stereogenic elements that in reality
are not stereogenic; if such elements have been found then remove their parities and repeat
D.1-4.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Repeat steps 1-4 for the spatially inverted structure. Accept the one that has the smaller
stereo (D.1 stereo should be the same). Set the "inverted" flag if the inverted stereo was selected.&lt;/p&gt;

&lt;p&gt;This description refers to a single component.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;</description>
      <pubDate>Sat, 12 Aug 2006 17:38:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b703aac3-cfcf-4565-af96-68ce2988348f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/08/12/inchi-canonicalization-algorithm</link>
      <category>Tools</category>
      <category>inchi</category>
      <category>canonicalization</category>
    </item>
  </channel>
</rss>
