<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag smiles</title>
    <link>http://depth-first.com/articles/tag/smiles</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Science Blogging Anthology Now in Print</title>
      <description>&lt;p&gt;&lt;a href="http://www.lulu.com/content/1869828"&gt;&lt;img src="http://depth-first.com/demo/20080115/openlab.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The science blogging anthology &lt;em&gt;The Open Laboratory 2007&lt;/em&gt; is now &lt;a href="http://www.lulu.com/content/1869828"&gt;available for purchase&lt;/a&gt;. As &lt;a href="http://depth-first.com/articles/2008/01/07/depth-first-article-to-appear-in-science-blogging-anthology"&gt;mentioned earlier&lt;/a&gt;, &lt;em&gt;The Open Laboratory&lt;/em&gt; was created to promote the &lt;a href="http://wiki.scienceblogging.com/scienceblogging/"&gt;2008 North Carolina Science Blogging Conference&lt;/a&gt; to be held on January 19, 2008. Chapter 4.3 contains the article &lt;a href="http://depth-first.com/articles/2007/11/28/smiles-and-aromaticity-broken"&gt;"SMILES and Aromaticity: Broken?"&lt;/a&gt;, which originally appeared last year on Depth-First. Details are available in &lt;a href="http://scienceblogs.com/clock/2008/01/open_lab_2007_up_for_sale.php"&gt;the original announcement&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Open Laboratory's&lt;/em&gt; publisher is remarkable. &lt;a href="http://www.lulu.com/"&gt;Lulu&lt;/a&gt; is a service that lets people of average means publish and sell their own books. The key to the entire operation is that rather than being printed in large batches, books are printed on demand.&lt;/p&gt;

&lt;p&gt;Got a great idea for a book that will likely have a devoted but small audience? You too can publish a high-quality product and sell it through an established, worldwide distribution network. No contracts, no agents, no years of trying to find a publisher. Just do it.&lt;/p&gt;

&lt;p&gt;Consider these chemistry-related titles currently offered by Lulu, none of which has the mass market needed to get a major publisher to back it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.lulu.com/content/913076"&gt;&lt;em&gt;Basic Meat Chemistry&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.lulu.com/content/216421"&gt;&lt;em&gt;The Psychonomicon&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.lulu.com/content/1003569"&gt;&lt;em&gt;The Chemistry of Autumn Colors&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.lulu.com/content/66582"&gt;&lt;em&gt;The Chemy Called Al&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Having bought one Lulu title recently, &lt;a href="http://www.lulu.com/content/120769"&gt;&lt;em&gt;Desktop Java Live&lt;/em&gt;&lt;/a&gt;, I can say that both the experience and finished product are nearly indistinguishable from buying books at Amazon.&lt;/p&gt;

&lt;p&gt;Let's hear it for &lt;a href="http://www.thelongtail.com/"&gt;The Long Tail&lt;/a&gt;!&lt;/p&gt;</description>
      <pubDate>Tue, 15 Jan 2008 09:31:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:c3a95989-c4a2-468e-b421-af19e04687b7</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/01/15/science-blogging-anthology-now-in-print</link>
      <category>Meta</category>
      <category>thelongtail</category>
      <category>openlaboratory</category>
      <category>scienceblogging</category>
      <category>smiles</category>
      <category>lulu</category>
    </item>
    <item>
      <title>Depth-First Article to Appear in Science Blogging Anthology</title>
      <description>&lt;p&gt;&lt;a href="http://wiki.scienceblogging.com/scienceblogging/"&gt;&lt;img src="http://depth-first.com/demo/20080107/2008NCSBClogo200.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A recent Depth-First article titled &lt;a href="http://depth-first.com/articles/2007/11/28/smiles-and-aromaticity-broken"&gt;"SMILES and Aromaticity: Broken?"&lt;/a&gt; has been &lt;a href="http://scienceblogs.com/clock/2008/01/open_lab_2007_the_winning_entr.php"&gt;selected to appear&lt;/a&gt; in the science blogging anthology "The Open Laboratory 2007." This article, along with the 51 other winning entries, will be published as a book that can be purchased from Amazon.com. The book, the second in a series, is aimed at promoting the &lt;a href="http://wiki.scienceblogging.com/scienceblogging/"&gt;2008 North Carolina Science Blogging Conference&lt;/a&gt; to be held on January 19, 2008.&lt;/p&gt;

&lt;p&gt;Are science blogging anthologies like Open Laboratory just a passing fad, or the beginning of something much larger? Only time will tell. What's clear is that the means of production and distribution of scientific information are getting cheaper by the year, resulting in an increasingly large range of choices for readers. If other communication-related industries such as movies, music, software, and newspapers offer any indication of what lies ahead, &lt;a href="http://sethgodin.typepad.com/seths_blog/2005/06/small_is_the_ne.html"&gt;small may well be the new big&lt;/a&gt; in scientific publication - and not a moment too soon.&lt;/p&gt;</description>
      <pubDate>Mon, 07 Jan 2008 09:10:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:b6c570ac-2b20-49c7-86c4-da78e75f2e42</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/01/07/depth-first-article-to-appear-in-science-blogging-anthology</link>
      <category>Meta</category>
      <category>scienceblogging</category>
      <category>openlaboratory</category>
      <category>smiles</category>
    </item>
    <item>
      <title>Run Babel Anywhere Java Runs with JBabel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A &lt;a href="http://depth-first.com/articles/tag/nestedvm"&gt;recent series of D-F articles&lt;/a&gt; has discussed the use of &lt;a href="http://nestedvm.ibex.org/"&gt;NestedVM&lt;/a&gt; to compile cheminformatics programs written in C/C++ to pure Java binaries that can be run on any system with a JVM. More specifically, an attempt to compile &lt;a href="http://openbabel.sf.net"&gt;OpenBabel's&lt;/a&gt; &lt;tt&gt;babel&lt;/tt&gt; program to bytecode was only &lt;a href="http://depth-first.com/articles/2007/11/26/compiling-open-babel-to-pure-java-bytecode-with-nestedvm-building-a-runnable-classfile-that-almost-works"&gt;partially successful&lt;/a&gt;. With the &lt;a href="http://sourceforge.net/mailarchive/forum.php?thread_name=819391.60947.qm%40web34201.mail.mud.yahoo.com&amp;amp;forum_name=openbabel-discuss"&gt;help of Geoff Hutchison&lt;/a&gt;, the problem was resolved. This article introduces JBabel, a platform-independent, pure Java implementation of OpenBabel's &lt;tt&gt;babel&lt;/tt&gt; program.&lt;/p&gt;

&lt;h4&gt;A Little About JBabel&lt;/h4&gt;

&lt;p&gt;JBabel was compiled from the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=40728&amp;amp;package_id=32894&amp;amp;release_id=521581"&gt;Open Babel 2.1.1 source release&lt;/a&gt; and can be &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=144794&amp;amp;package_id=255103"&gt;downloaded from SourceForge&lt;/a&gt;. The same jarfile was successfully tested on Linux, Windows and Mac OS X. You can verify JBabel works on your platform with the following command:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -Hsmi
smi  SMILES format
A linear text format which can describe the connectivity
and chirality of a molecule
Write Options e.g. -xt
  n no molecule name
  t molecule name only
  r radicals lower case eg ethyl is Cc
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This version of JBabel was compiled with support for three formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SMILES (smi). Non-canonical SMILES.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MDL (mol). Molfiles and SD Files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Canonical SMILES (can). Canonical SMILES implementation &lt;a href="http://depth-first.com/articles/2006/11/06/stone-soup"&gt;donated by eMolecules&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll discuss exactly how support for these formats was added in a subsequent post. More formats will be added in the future. For now, let's just try JBabel out.&lt;/p&gt;

&lt;h4&gt;Testing JBabel&lt;/h4&gt;

&lt;p&gt;One way to use JBabel is interactively from the command line - just leave out an input or output file parameter. For example, if you wanted to get the eMolecules canonical SMILES for &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=68617"&gt;sertraline&lt;/a&gt;, you might do something like this (be sure to use two returns to begin processing):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -ismi -ocan
CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl

CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
1 molecule converted
34 audit log messages
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This canonical SMILES can be converted into a molfile with the following:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -ismi -omol
CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12


 OpenBabel12090723182D

 22 24  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0

...
&lt;/pre&gt;
&lt;/div&gt;
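
&lt;p&gt;The same conversions can also be scripted. What follows is only a sketch: it assumes the jar sits in the working directory, reuses the &lt;tt&gt;-i&lt;/tt&gt; and &lt;tt&gt;-o&lt;/tt&gt; flags from the sessions above, and introduces two hypothetical helper names, &lt;tt&gt;jbabel_command&lt;/tt&gt; and &lt;tt&gt;jbabel_convert&lt;/tt&gt;, that are not part of JBabel.&lt;/p&gt;

```ruby
require 'open3'

# Jar name taken from the console sessions above; adjust to your download.
JBABEL_JAR = 'jbabel-20071209.jar'

# Hypothetical helper: builds the argument list for one JBabel run,
# e.g. input format 'smi' and output format 'can'.
def jbabel_command(input_format, output_format, jar = JBABEL_JAR)
  ['java', '-jar', jar, "-i#{input_format}", "-o#{output_format}"]
end

# Hypothetical helper: feeds one record to JBabel on standard input and
# returns whatever it writes to standard output. Needs java and the jar.
def jbabel_convert(record, input_format, output_format)
  # The interactive sessions end input with a blank line, so send one here.
  output, _status = Open3.capture2(*jbabel_command(input_format, output_format),
                                   stdin_data: "#{record}\n\n")
  output
end

# Building the command does not require java to be installed.
puts jbabel_command('smi', 'can').join(' ')
```

&lt;p&gt;With Java and the jar in place, &lt;tt&gt;jbabel_convert('CCO', 'smi', 'can')&lt;/tt&gt; would be the scripted counterpart of the first interactive session.&lt;/p&gt;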

&lt;p&gt;To convert using input and output files, we could use a medium-sized dataset such as the &lt;a href="http://rubyforge.org/frs/download.php/27768/pubchem_benzodiazepine_20071110.sdf.gz"&gt;PubChem benzodiazepine dataset&lt;/a&gt; prepared for &lt;a href="http://rbtk.rubyforge.org/"&gt;Rubidium&lt;/a&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -imol pubchem_benzodiazepine_20071110.sdf -ocan pubchem_benzodiazepine_20071110.smi
==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problems reading a MDL file
Cannot read title line

2117 molecules converted
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This test, which parses 2117 records, required four minutes forty-five seconds on my system. For comparison, the natively compiled binary did the same thing in about thirteen seconds. Clearly, the JBabel performance hit is substantial.&lt;/p&gt;
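
&lt;p&gt;To put a number on that hit, here is the arithmetic using only the figures quoted above. The native timing was described as "about thirteen seconds", so treat the resulting ratio as approximate.&lt;/p&gt;

```ruby
# Figures quoted in the text above.
records        = 2117
jbabel_seconds = 4 * 60 + 45 # four minutes forty-five seconds
native_seconds = 13          # "about thirteen seconds"

slowdown   = jbabel_seconds.to_f / native_seconds
per_record = jbabel_seconds.to_f / records

puts format('JBabel took roughly %.0f times as long as the native binary', slowdown)
puts format('JBabel needed about %.2f seconds per record', per_record)
```

&lt;p&gt;That works out to a slowdown of roughly twenty-two fold.&lt;/p&gt;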

&lt;h4&gt;Uses&lt;/h4&gt;

&lt;p&gt;Although it's very unlikely that JBabel will ever be useful in performance-critical situations, its portability makes it attractive for other uses. Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;application development in heterogeneous computing environments;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;use on systems in which native compilation may be difficult, such as those with unusual configurations or operating systems;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cases in which native binaries work poorly or not at all, such as in applets and Java applications;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;situations in which performance is a minor consideration, such as in end-user applications that process only a few molecules at a time, or during application prototyping.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;This article has described JBabel, the first portable binary version of OpenBabel's &lt;tt&gt;babel&lt;/tt&gt; molecular file format interconversion program. The next article in this series will describe in detail the steps that were used to compile it.&lt;/p&gt;</description>
      <pubDate>Mon, 10 Dec 2007 08:50:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:5d98a980-e3d6-4afd-8eb3-25769a28d13b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/12/10/run-babel-anywhere-java-runs-with-jbabel</link>
      <category>Tools</category>
      <category>jbabel</category>
      <category>babel</category>
      <category>openbabel</category>
      <category>nestedvm</category>
      <category>molfile</category>
      <category>canonicalsmiles</category>
      <category>smiles</category>
    </item>
    <item>
      <title>SMILES and Aromaticity: Broken?</title>
      <description>&lt;p&gt;&lt;a href="http://opensmiles.org"&gt;&lt;img src="http://depth-first.com/demo/20071114/osmi.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Since its introduction in 1988, the Simplified Molecular Input Line Entry System (&lt;a href="http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html"&gt;SMILES&lt;/a&gt;) has become one of the most widely-used molecular encoding systems in cheminformatics. But all technologies, no matter how widely-used, can be improved, and SMILES is no exception. This article, the first in a series, discusses a particularly thorny problem in the SMILES language.&lt;/p&gt;

&lt;h4&gt;A Little About SMILES&lt;/h4&gt;

&lt;p&gt;From the beginning, SMILES was a creative response to the complexity of the then-dominant &lt;a href="http://depth-first.com/articles/2007/07/20/everything-old-is-new-again-wiswesser-line-notation-wln"&gt;Wiswesser Line Notation&lt;/a&gt;. This can be seen perhaps nowhere more clearly than in the introduction to Weininger's &lt;a href="http://dx.doi.org/10.1021/ci00057a005"&gt;seminal paper on SMILES&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;SMILES is a chemical notation language specifically designed for computer use by chemists. ... Among several approaches to computerized chemical notation, line notation is popular because it represents molecular structure by a linear string of symbols, similar to natural language. The Wiswesser Line Notation is the most widely used representative of this method. It meets the essential requirements for a deterministic chemical notation, but it is difficult to use because many rules must be followed to generate the correct notation of a complex structure. To overcome this and other difficulties, the SMILES system was designed to be truly computer interactive.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What started out as a way for humans to more easily encode molecular structures has since evolved into a way for computers to encode molecular structures. Several factors are responsible for this shift, the biggest being the emergence of the Graphical User Interface, and with it, the &lt;a href="http://depth-first.com/articles/2007/11/27/chemwriter-chemical-structures-and-the-web"&gt;chemical structure editor&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today, few chemists know how to write SMILES by hand, nor, understandably, do they want to.&lt;/p&gt;

&lt;p&gt;But rather than dying out, SMILES found a new niche. Computers in the late '80s were mere toys; storage space was measured in kilobytes, and bandwidth was practically nonexistent. But with a few ASCII characters, the complete connection table of most organic molecules could be encoded by SMILES. Not only this, but the algorithms needed to encode and decode SMILES were easy to reduce to practice in software. Daylight's original implementation of SMILES was soon joined by many others.&lt;/p&gt;

&lt;p&gt;A de facto standard was born.&lt;/p&gt;

&lt;h4&gt;If It Ain't Broke, Don't Fix It&lt;/h4&gt;

&lt;p&gt;For the last twenty years, SMILES has been used with great success to encode and store molecular structures. In an industry with few standards, SMILES is a rare example that shows what might be possible.&lt;/p&gt;

&lt;p&gt;If SMILES has been so successful, then what's &lt;a href="http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics"&gt;broken&lt;/a&gt; that needs fixing?&lt;/p&gt;

&lt;p&gt;Over the years, a growing list of missing, inconsistent, or confusing aspects of the SMILES language has come to light. One vendor of a SMILES implementation &lt;a href="http://www.eyesopen.com/docs/html/pyprog/DaylightSMILES.html"&gt;has even cataloged some of them&lt;/a&gt;. In most cases, the various implementers of SMILES systems have done the only thing they could do under the circumstances: apply their own judgment and best guesses.&lt;/p&gt;

&lt;p&gt;The result has been the gradual introduction of subtle incompatibilities among the SMILES implementations currently in use. This is the problem that the &lt;a href="http://opensmiles.org"&gt;OpenSMILES&lt;/a&gt; group aims to address.&lt;/p&gt;

&lt;p&gt;This status quo works in an environment of information silos, proprietary code, and closed data. But as cheminformatics moves in the direction of open data and interoperability, the problems become painfully apparent.&lt;/p&gt;

&lt;p&gt;Of all the topics that have been discussed so far by the OpenSMILES group, one stands out for its level of interest, number of contributors, strong opinions, and detailed discussion: lower-case atom symbols and aromaticity.&lt;/p&gt;

&lt;h4&gt;Aromaticity in SMILES&lt;/h4&gt;

&lt;p&gt;SMILES allows two kinds of atoms to be specified: upper-case and lower-case. Lower case atoms, according to existing documentation, signify 'aromatic' atoms.&lt;/p&gt;

&lt;p&gt;Weininger made clear that the reason for introducing lower case atom symbols was to facilitate canonicalization and substructure recognition. From &lt;a href="http://dx.doi.org/10.1021/ci00057a005"&gt;the original paper&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;Aromaticity must be detected in a system that generates an unambiguous chemical nomenclature. As will be discussed in following papers, this is needed both for the generation of a unique nomenclature and for effective substructure recognition. There can be no definition of 'aromaticity' that is both rigorous and all-encompassing: the word implies something about 'reactivity' to a synthetic chemist, 'ring current' to a NMR spectroscopist, 'symmetry' to a crystallographer, and presumably 'odor' to the original user of the word. Our objective in defining aromaticity is to provide an automatic and rigorous definition for the purposes of generating an unambiguous chemical nomenclature. Although the SMILES algorithm produces results that most chemists find natural, nothing is implied by this definition about physical properties.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kekule structures, in which double bonds and single bonds alternate, make it difficult for computers to implement certain kinds of algorithms. Defining lower case atom symbols to remove artificial asymmetry eliminated these problems.&lt;/p&gt;

&lt;p&gt;Weininger's original paper then goes on to describe the criteria for aromaticity in the SMILES language. At its core, aromaticity boils down to the following definition:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... To qualify as aromatic, all atoms in the ring must be sp2 hybridized and the number of available 'excess' &amp;pi; electrons must satisfy H&amp;uuml;ckel's 4n+2 criterion. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/cb_cot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Seems simple enough, but even in 1988 things were not so clear. For just a few sentences later, Weininger continues:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... Entries of c1ccc1 and c1ccccccc1 will produce the correct &lt;strong&gt;antiaromatic&lt;/strong&gt; structures for cyclobutadiene and cyclooctatetraene, C1=CC=C1 and C1=CC=CC=CC=C1, respectively. ... [emphasis added]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;How are we to interpret this? Apparently, c1ccc1 and c1ccccccc1, neither of which obeys the 4n+2 rule, are nevertheless &lt;em&gt;valid&lt;/em&gt; SMILES. We can even use &lt;a href="http://www.daylight.com/daycgi/depict"&gt;Daylight's Depict&lt;/a&gt; application to verify for ourselves that both c1ccc1 and c1ccccccc1 are read and depicted.&lt;/p&gt;
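
&lt;p&gt;The 4n+2 arithmetic itself is easy to check. The sketch below is not taken from any SMILES implementation; the method name &lt;tt&gt;huckel?&lt;/tt&gt; is made up for illustration, and the electron counts are assigned by hand on the assumption that each sp2 carbon contributes one &amp;pi; electron.&lt;/p&gt;

```ruby
# True when a pi electron count has the form 4n+2 for n = 0, 1, 2, ...
def huckel?(pi_electrons)
  return false unless pi_electrons.positive?

  ((pi_electrons - 2) % 4).zero?
end

# Hand-assigned counts: one pi electron per sp2 carbon (an assumption).
counts = {
  'benzene (c1ccccc1)'             => 6,
  'cyclobutadiene (c1ccc1)'        => 4,
  'cyclooctatetraene (c1ccccccc1)' => 8
}

counts.each do |name, pi_electrons|
  verdict = huckel?(pi_electrons) ? 'satisfies 4n+2' : 'fails 4n+2'
  puts "#{name}: #{pi_electrons} pi electrons, #{verdict}"
end
```

&lt;p&gt;Only benzene passes, yet all three lower-case SMILES are accepted.&lt;/p&gt;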

&lt;p&gt;Perhaps the concept of "antiaromaticity" (in contrast to "non-aromaticity") holds a special place in the SMILES language. If so, this distinction has never been clarified.&lt;/p&gt;

&lt;p&gt;While puzzling over the apparent contradiction, we later read that:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... For example, quinone is nonaromatic, with only four excess electrons.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/quinone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Weininger goes on to imply that the only correct way to represent quinone in SMILES is without lower case atom symbols, for example:&lt;/p&gt;

&lt;p&gt;O=C1C=CC(=O)C=C1&lt;/p&gt;

&lt;p&gt;And still later:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... For example, if one of the benzene ring's electrons is removed to form c1ccc[cH+]1, this ion is not aromatic because there are only five &amp;pi; electrons. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ambiguity makes it impossible to write standardized software: either 4n+2 is the rule for triggering the aromatic flag, and therefore lower case atom symbols, or it is not. If exceptions to this rule are needed, they must be specified in enough detail to be reduced to practice. To my knowledge, no documentation written in 1988 or since then has provided the necessary guidance.&lt;/p&gt;

&lt;p&gt;We can't have it both ways.&lt;/p&gt;

&lt;h4&gt;More Brokenness&lt;/h4&gt;

&lt;p&gt;Next, consider some of the examples left out of the original SMILES description. What about oligocyclic aromatics?&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/fluorenone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10241"&gt;Fluorenone&lt;/a&gt;, according to the SMILES electron counting rules, has twelve &amp;pi; electrons and is therefore not aromatic. Strictly speaking, a SMILES like this:&lt;/p&gt;

&lt;p&gt;O=c2c1ccccc1c3ccccc23&lt;/p&gt;

&lt;p&gt;in which the carbonyl carbon is represented with a lower case atom symbol, should be considered invalid. Not just undesirable, but &lt;em&gt;verboten&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Yet Daylight's own Depict program, and other SMILES implementations, treat it as valid.&lt;/p&gt;

&lt;p&gt;Despite the lack of an aromatic tricyclic ring system, we may nevertheless want (or need) to represent fluorenone using lower case atom symbols. After all, canonicalization and substructure searches are very difficult otherwise.&lt;/p&gt;

&lt;p&gt;So any software we write needs to peel back layers of the tricyclic ring system in a quest for isolated aromatic rings. This exercise is clearly chemically meaningless as all atoms are coplanar and sp2 hybridized, and therefore interact. The counterargument is that the SMILES aromaticity model has no basis in reality - it's just a convention. So we press on.&lt;/p&gt;

&lt;p&gt;We eventually end up with a SMILES like this:&lt;/p&gt;

&lt;p&gt;O=C2c1ccccc1c3ccccc23&lt;/p&gt;
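
&lt;p&gt;To make the "peeling back" concrete, here is a rough sketch that applies the 4n+2 test first to the whole ring system and then to each ring separately. Everything in it is assigned by hand for this one molecule: atoms are numbered by their position in the SMILES above, the ring membership lists are my own reading of that string, and the assumed electron-counting convention is that the carbonyl carbon contributes zero &amp;pi; electrons and every other ring carbon contributes one. None of the names come from a real toolkit.&lt;/p&gt;

```ruby
# True when a pi electron count has the form 4n+2 for n = 0, 1, 2, ...
def huckel?(pi_electrons)
  return false unless pi_electrons.positive?

  ((pi_electrons - 2) % 4).zero?
end

# Pi electrons per ring atom, keyed by position in O=C2c1ccccc1c3ccccc23.
# Position 0 is the oxygen, which is not in a ring. Assumed convention:
# the carbonyl carbon (position 1) gives 0, every other ring carbon gives 1.
PI_ELECTRONS = { 1 => 0 }.merge((2..13).map { |i| [i, 1] }.to_h)

# Ring membership read off the SMILES by hand.
RINGS = {
  'left benzo ring'  => [2, 3, 4, 5, 6, 7],
  'right benzo ring' => [8, 9, 10, 11, 12, 13],
  'central ring'     => [1, 2, 7, 8, 13]
}

def pi_count(atoms)
  atoms.sum { |i| PI_ELECTRONS[i] }
end

total = pi_count(PI_ELECTRONS.keys)
puts "whole ring system: #{total} pi electrons, 4n+2? #{huckel?(total)}"

RINGS.each do |name, atoms|
  count = pi_count(atoms)
  puts "#{name}: #{count} pi electrons, 4n+2? #{huckel?(count)}"
end
```

&lt;p&gt;Under these assumptions the ring system as a whole fails with twelve electrons, the two benzo rings pass with six each, and the central ring fails with four, which is exactly the split between lower case and upper case atoms in the SMILES above.&lt;/p&gt;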

&lt;p&gt;The larger problem is making it clear when a reader or writer is and isn't allowed to perform this peeling back operation in search of aromaticity. Does the above SMILES match the SMILES definition of aromaticity or does it not? Are we allowed to peel back ring systems looking for imaginary 'embedded' aromatic ring systems or are we not?&lt;/p&gt;

&lt;p&gt;The answer may exist somewhere, just not in the documentation I have access to.&lt;/p&gt;

&lt;p&gt;The pragmatic approach, and the one taken by some implementations, is to simply ignore the whole question, forget about 4n+2, and call everything that 'looks' aromatic, like the fluorenone carbonyl carbon, 'aromatic.'&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/acenaphthalene.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;As another example, consider &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9161"&gt;acenaphthylene&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;c1cc2cccc3ccc(c1)c23&lt;/p&gt;

&lt;p&gt;Based on the published 4n+2 rules for SMILES aromaticity detection, acenaphthylene's twelve &amp;pi; electrons mean that it can't be represented in the aromatic form. It's not just discouraged - it's not allowed. Yet the Daylight Depict program, and a few other SMILES implementations, will accept this input as valid.&lt;/p&gt;

&lt;p&gt;The only way we can take advantage of the symmetrization afforded by lower case atom labels is to go hunting for isolated benzene rings. Upon doing so, we arrive at the following SMILES:&lt;/p&gt;

&lt;p&gt;c1cc2C=Cc3cccc(c1)c23&lt;/p&gt;

&lt;p&gt;Once again, we've more or less made an arbitrary distinction, assigning one set of carbons as aromatic and the other, fully coplanar, conjugated, and sp2-hybridized set as non-aromatic. Does the SMILES language allow us to do this? Again, the answer may exist somewhere, but not in any material I've been able to find.&lt;/p&gt;

&lt;p&gt;To put it simply, where in the SMILES documentation are we told which atoms in a coplanar, fully conjugated, and sp2 hybridized ring system can be excluded from the 4n+2 test?&lt;/p&gt;

&lt;p&gt;For that matter, how do we know that oligocyclic aromatic ring systems are supported at all? Maybe only isolated five- and six-membered rings should be evaluated.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/pyrrolopyridine.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Consider pyrrolopyridine (depicted above):&lt;/p&gt;

&lt;p&gt;c2ccn1cccc1c2&lt;/p&gt;

&lt;p&gt;Now let's assume that the SMILES 4n+2 rule can only be applied to individual rings, not ring systems. This prevents us from writing a SMILES like the one shown above because the left-hand pyridine ring has a formal &amp;pi; electron count of 7 - two from each endocyclic double bond, two from the nitrogen atom, and one from the exocyclic double bond.&lt;/p&gt;

&lt;p&gt;The best we could do is to write a SMILES like this:&lt;/p&gt;

&lt;p&gt;c2cc1C=CC=Cn1c2&lt;/p&gt;

&lt;p&gt;The only way we can create an 'aromatic' SMILES for the 4n+2 pyrrolopyridine ring system is to combine the electron counts for both rings.&lt;/p&gt;
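
&lt;p&gt;The two ways of counting can be set side by side. The single-ring tally below follows the breakdown given above; the whole-system tally is my own, assuming one &amp;pi; electron from each of the eight sp2 carbons and two from the nitrogen. The method name &lt;tt&gt;huckel?&lt;/tt&gt; is again just illustrative.&lt;/p&gt;

```ruby
# True when a pi electron count has the form 4n+2 for n = 0, 1, 2, ...
def huckel?(pi_electrons)
  return false unless pi_electrons.positive?

  ((pi_electrons - 2) % 4).zero?
end

# Six-membered ring taken alone, using the breakdown in the text:
# two endocyclic double bonds, the nitrogen, and the exocyclic double bond.
six_ring_alone = 2 * 2 + 2 + 1

# Both rings combined (assumed tally): eight sp2 carbons plus the nitrogen.
whole_system = 8 * 1 + 2

puts "six-membered ring alone: #{six_ring_alone}, 4n+2? #{huckel?(six_ring_alone)}"
puts "whole ring system: #{whole_system}, 4n+2? #{huckel?(whole_system)}"
```

&lt;p&gt;Seven fails the test and ten passes it, so the fully lower-case SMILES is only justified if the rule is applied to the ring system rather than to the individual ring.&lt;/p&gt;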

&lt;p&gt;But Daylight's own Depict system, and I suspect many others, imply that the fully aromatic version of the pyrrolopyridine SMILES is valid.&lt;/p&gt;

&lt;p&gt;Once again, we can't have it both ways. If full ring systems need to be perceived and tested for 4n+2 &amp;pi; electrons, then consistency requires it also be done for acenaphthylene, fluorenone, and countless others for which space and time prevent discussion. If particular ring systems are exempt, then the SMILES language documentation should specify in detail how to tell the difference.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Given the problems in combining SMILES' symmetrization capability and lower-case atom symbols with the overloaded concept of aromaticity, one has to wonder - is it worth the trouble? Given the disregard for these rules by working third-party code, by Daylight, and by the original SMILES documentation, how reasonable is it to continue to use 4n+2 as the rule? What does the resulting confusion really buy?&lt;/p&gt;

&lt;p&gt;There is a simple way to resolve the issue, but you're probably not going to like it - at least not at first. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Wed, 28 Nov 2007 09:43:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:46e56185-51ea-466b-b4bc-c9edfc28b489</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/28/smiles-and-aromaticity-broken</link>
      <category>Tools</category>
      <category>smiles</category>
      <category>opensmiles</category>
      <category>aromaticity</category>
      <category>broken</category>
    </item>
    <item>
      <title>Making the Case: OpenSMILES</title>
      <description>&lt;p&gt;&lt;a href="http://www.opensmiles.org/"&gt;&lt;img src="http://depth-first.com/demo/20071114/osmi.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html"&gt;SMILES&lt;/a&gt; is one of the most widely-used line notations in cheminformatics. Yet until very recently, there has been no concerted attempt to develop open SMILES encoding standards.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.opensmiles.org/"&gt;OpenSMILES&lt;/a&gt; aims to change that. By providing a forum in which concerns from the SMILES user community can be voiced, peer-reviewed, and addressed, OpenSMILES introduces a new way for the SMILES language to become better.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.opensmiles.org/spec/open-smiles.html"&gt;A draft OpenSMILES specification&lt;/a&gt; is now available for review. For now, the best way to raise issues and otherwise get involved is through the &lt;a href="https://lists.sourceforge.net/lists/listinfo/blueobelisk-smiles"&gt;OpenSMILES mailing list&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Wed, 14 Nov 2007 09:42:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:cf48057b-4988-415c-8d7b-fc13c0347d24</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/14/making-the-case-opensmiles</link>
      <category>Open X</category>
      <category>smiles</category>
      <category>opensmiles</category>
    </item>
    <item>
      <title>Easily Convert IUPAC Nomenclature to SMILES, InChI, or Molfile with Rubidium</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20071015/rubidium.png" align="right"&gt;&lt;/img&gt;A recent article &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;introduced Rubidium&lt;/a&gt;, a cheminformatics toolkit written in Ruby. One of Ruby's strengths is the speed with which it enables disparate pieces of code to be glued together - even if they're written in different programming languages. In this article, we'll see how Rubidium can be extended to provide support for converting IUPAC nomenclature into SMILES, InChI, or Molfile formats.&lt;/p&gt;

&lt;h4&gt;About Rubidium&lt;/h4&gt;

&lt;p&gt;Rubidium is a cheminformatics toolkit written in Ruby. Rubidium is currently configured to run on &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt;, although future versions may also work with &lt;a href="http://en.wikipedia.org/wiki/Ruby_(programming_language)"&gt;Matz' Ruby Implementation&lt;/a&gt; (MRI) via &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rubidium will eventually be packaged as a &lt;a href="http://www.rubygems.org/"&gt;RubyGem&lt;/a&gt; and hosted on &lt;a href="http://rubyforge.org"&gt;RubyForge&lt;/a&gt;. For now, the toolkit consists of a running library that will be updated and documented on this blog.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;The library extends the CDK module presented in the &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;previous article in this series&lt;/a&gt;. The main change is the addition of an &lt;tt&gt;IUPACReader&lt;/tt&gt; class, based on Peter Corbett's excellent &lt;a href="http://depth-first.com/articles/2007/10/12/jruby-for-cheminformatics-parsing-iupac-nomenclature-with-opsin"&gt;OPSIN library&lt;/a&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;IUPACReader&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.io.StringReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uk.ac.cam.ch.wwmm.opsin.NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.io.CMLReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.ChemFile&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@iupac_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;CMLReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read&lt;/span&gt; &lt;span class="ident"&gt;name&lt;/span&gt;
    &lt;span class="ident"&gt;cml&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@iupac_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse_to_cml&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Could not parse '&lt;span class="expr"&gt;#{name}&lt;/span&gt;'.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="ident"&gt;cml&lt;/span&gt;

    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_reader&lt;/span&gt; &lt;span class="constant"&gt;StringReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_xml&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt; &lt;span class="constant"&gt;ChemFile&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;chem_sequence&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;chem_model&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;molecule_set&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Using this additional functionality requires nothing more than copying the &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;OPSIN jarfile&lt;/a&gt; into the &lt;strong&gt;lib&lt;/strong&gt; directory of your JRuby installation. You'll also need to place the &lt;a href="http://downloads.sourceforge.net/cdk/cdk-1.0.1.jar?modtime=1182877138&amp;amp;big_mirror=0"&gt;CDK jarfile&lt;/a&gt; in this directory if you haven't done so already.&lt;/p&gt;

&lt;p&gt;The complete Rubidium library can be &lt;a href="http://depth-first.com/demo/20071019/cdk.rb"&gt;downloaded here&lt;/a&gt;.&lt;/p&gt;
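
&lt;p&gt;The downloadable library is not reproduced here, so the following is only a rough sketch of how a conversion object could dispatch on format names such as &lt;tt&gt;'iupac'&lt;/tt&gt;. The &lt;tt&gt;SketchConversion&lt;/tt&gt; class, its &lt;tt&gt;register_reader&lt;/tt&gt; and &lt;tt&gt;register_writer&lt;/tt&gt; methods, the &lt;tt&gt;'formula'&lt;/tt&gt; output format, and the stub lambdas are all hypothetical stand-ins written in plain Ruby, so they run without JRuby, the CDK, or OPSIN. They are not Rubidium's actual implementation.&lt;/p&gt;

```ruby
# Hypothetical sketch: a tiny conversion object that looks up a reader and a
# writer by format name. The lambdas below are stubs, not real parsers.
class SketchConversion
  def initialize
    @readers = {}
    @writers = {}
    @in_format = nil
    @out_format = nil
  end

  # Register a callable that turns input text into an in-memory "molecule".
  def register_reader(format, reader)
    @readers[format] = reader
  end

  # Register a callable that turns a "molecule" into output text.
  def register_writer(format, writer)
    @writers[format] = writer
  end

  # Select the input and output formats; fails early on unknown names.
  def set_formats(in_format, out_format)
    raise "No reader for '#{in_format}'." unless @readers.key?(in_format)
    raise "No writer for '#{out_format}'." unless @writers.key?(out_format)
    @in_format = in_format
    @out_format = out_format
  end

  # Read with the selected reader, then write with the selected writer.
  def convert(input)
    raise 'Call set_formats first.' unless @in_format
    molecule = @readers[@in_format].call(input)
    @writers[@out_format].call(molecule)
  end
end

# Stub data: a one-entry lookup table instead of a real name parser.
NAMES = { '1,4-dichlorobenzene' => { formula: 'C6H4Cl2' } }.freeze

c = SketchConversion.new
c.register_reader('iupac', lambda { |name| NAMES.fetch(name) })
c.register_writer('formula', lambda { |mol| mol[:formula] })
c.set_formats('iupac', 'formula')
puts c.convert('1,4-dichlorobenzene')
```

&lt;p&gt;With a layout like this, supporting a new input language only means registering one more reader; the writers stay untouched.&lt;/p&gt;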

&lt;h4&gt;A Test&lt;/h4&gt;

&lt;p&gt;We can test Rubidium's IUPAC nomenclature parsing abilities with &lt;tt&gt;jirb&lt;/tt&gt;. For example, to convert from name to SMILES:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'cdk'
=&gt; true
irb(main):002:0&gt; c=CDK::Conversion.new
=&gt; #&amp;lt;CDK::Conversion:0x46ca65 ... &amp;gt;
irb(main):003:0&gt; c.set_formats 'iupac', 'smi'
=&gt; "smi"
irb(main):004:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "C=1C=C(C=CC=1Cl)Cl"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To convert from name to InChI (in the same &lt;tt&gt;jirb&lt;/tt&gt; session):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):005:0&gt; c.set_out_format 'inchi'
=&gt; "inchi"
irb(main):006:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "InChI=1/C6H4Cl2/c7-5-1-2-6(8)4-3-5/h1-4H"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;And to convert from name to Molfile (also in the same &lt;tt&gt;jirb&lt;/tt&gt; session):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):007:0&gt; c.set_out_format 'mol'
=&gt; "mol"
irb(main):008:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "\n  CDK    10/19/07,7:59\n\n  8  8  0  0  0  0  0  0  0  0999 V2000\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n  1  2  2  0  0  0  0 \n  2  3  1  0  0  0  0 \n  3  4  2  0  0  0  0 \n  4  5  1  0  0  0  0 \n  5  6  2  0  0  0  0 \n  6  1  1  0  0  0  0 \n  7  1  1  0  0  0  0 \n  8  4  1  0  0  0  0 \nM  END\n"
&lt;/pre&gt;
&lt;/div&gt;
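
&lt;p&gt;Note that every atom in the Molfile above sits at the origin. As a quick check, here is a minimal plain-Ruby sketch that reads the counts line and atom block of that output. The fixed column offsets assume the usual V2000 layout, &lt;tt&gt;read_atoms&lt;/tt&gt; is a hypothetical helper rather than part of Rubidium, and the bond block is left out because the sketch never reads it.&lt;/p&gt;

```ruby
# Header and atom block of the Molfile shown above (bond block omitted).
carbon   = '    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0'
chlorine = '    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0'
lines = ['', '  CDK    10/19/07,7:59', '', '  8  8  0  0  0  0  0  0  0  0999 V2000'] +
        [carbon] * 6 + [chlorine] * 2

# Hypothetical helper: pull the symbol and coordinates out of each atom line.
def read_atoms(lines)
  # The fourth line is the counts line; its first three columns hold the atom count.
  atom_count = lines[3][0, 3].to_i
  lines[4, atom_count].map do |line|
    # x, y, and z each occupy ten columns; the symbol starts at column 32.
    xyz = [line[0, 10], line[10, 10], line[20, 10]].map { |field| field.to_f }
    { symbol: line[31, 3].strip, xyz: xyz }
  end
end

atoms = read_atoms(lines)
puts atoms.length
puts atoms.map { |atom| atom[:symbol] }.join(' ')
# True only when every coordinate of every atom is zero.
puts atoms.all? { |atom| atom[:xyz].all? { |v| v.zero? } }
```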

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;By re-using a simple conversion API together with another Java library, we've given Rubidium the ability to translate IUPAC nomenclature into other molecular languages. The additional code was both easy to write and easy to test. Future articles will discuss the packaging, distribution, and further elaboration of Rubidium.&lt;/p&gt;</description>
      <pubDate>Fri, 19 Oct 2007 10:05:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:1b7e76b3-93a7-4372-982f-cd60c9ed40d0</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/19/easily-convert-iupac-nomenclature-to-smiles-inchi-or-molfile-with-rubidium</link>
      <category>Tools</category>
      <category>rubidium</category>
      <category>iupac</category>
      <category>smiles</category>
      <category>inchi</category>
      <category>molfile</category>
    </item>
    <item>
      <title>JRuby for Cheminformatics: Parsing SMILES Simply</title>
      <description>&lt;p&gt;&lt;a href="http://cdk.sf.net"&gt;&lt;img src="http://depth-first.com/files/cdk_logo.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The previous article in this series outlined some &lt;a href="http://depth-first.com/articles/2007/10/08/five-reasons-to-start-using-jruby-now"&gt;reasons to consider JRuby for cheminformatics&lt;/a&gt;. Now I'll show how easy it is to get started by describing how to parse SMILES strings with the help of the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK).&lt;/p&gt;

&lt;h4&gt;What About Ruby CDK?&lt;/h4&gt;

&lt;p&gt;A number of Depth-First articles have discussed &lt;a href="http://depth-first.com/articles/2007/10/04/ruby-cdk-for-newbies"&gt;Ruby CDK&lt;/a&gt;. This library runs on top of C-Ruby, otherwise known as Matz' Ruby Implementation (MRI). &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt; connects MRI to a Java Virtual Machine under Ruby CDK.&lt;/p&gt;

&lt;p&gt;This article, and the others to follow, will instead discuss the use of the CDK and other Java libraries from JRuby. In contrast to MRI, JRuby is a pure Java implementation of the Ruby language. This approach offers some important advantages which will be highlighted along the way.&lt;/p&gt;

&lt;h4&gt;Installing JRuby&lt;/h4&gt;

&lt;p&gt;JRuby is not difficult to install. On Linux, the steps are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Install &lt;a href="http://java.sun.com"&gt;JDK Version 1.4 or higher&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download and unpack the most recent JRuby release - at the time of this writing, &lt;a href="http://dist.codehaus.org/jruby"&gt;version 1.0.1&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the JRuby &lt;tt&gt;bin&lt;/tt&gt; directory to your path.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is no Step 4. ;-)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Installing CDK for JRuby&lt;/h4&gt;

&lt;p&gt;Installing CDK so that it works on JRuby is similarly quite simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Download the most recent CDK jarfile - at the time of this writing, &lt;a href="http://downloads.sourceforge.net/cdk/cdk-1.0.1.jar?modtime=1182877138&amp;amp;big_mirror=0"&gt;version 1.0.1&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move the CDK jarfile to your JRuby &lt;tt&gt;lib&lt;/tt&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Testing CDK for JRuby&lt;/h4&gt;

&lt;p&gt;You can verify that your new CDK for JRuby installation works with &lt;tt&gt;jirb&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'java'
=&gt; true
irb(main):002:0&gt; include_class 'org.openscience.cdk.smiles.SmilesParser'
=&gt; ["org.openscience.cdk.smiles.SmilesParser"]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You should notice that &lt;tt&gt;jirb&lt;/tt&gt; takes a few seconds to initialize the JVM, whereas &lt;tt&gt;irb&lt;/tt&gt; starts almost instantly.&lt;/p&gt;

&lt;h4&gt;A Library to Read SMILES&lt;/h4&gt;

&lt;p&gt;We can write a short library to read SMILES strings using the CDK:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include_class&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.smiles.SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Daylight&lt;/span&gt;
  &lt;span class="attribute"&gt;@@smiles_parser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="attribute"&gt;@@smiles_parser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Notice the use of the Rubyesque method name &lt;tt&gt;parse_smiles&lt;/tt&gt; rather than &lt;tt&gt;parseSmiles&lt;/tt&gt;. This is just one of the built-in conveniences offered by JRuby.&lt;/p&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;Saving the library as a file called &lt;strong&gt;daylight.rb&lt;/strong&gt; lets us test it using interactive JRuby:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'daylight'
=&gt; true
irb(main):002:0&gt; include Daylight
=&gt; Object
irb(main):003:0&gt; mol = read_smiles 'c1ccccc1'
=&gt; #&amp;lt;Java::OrgOpenscienceCdk:: [truncated] ...&amp;gt;
irb(main):004:0&gt; mol.atom_count
=&gt; 6
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see, the benzene SMILES has been parsed correctly. Again, notice the use of the Rubyesque method name &lt;tt&gt;atom_count&lt;/tt&gt;, rather than the CDK Java bean convention method name &lt;tt&gt;getAtomCount&lt;/tt&gt;. This feature makes it easy to ignore the fact you're using a Java library and get on with writing your Ruby code. Brilliant!&lt;/p&gt;
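
&lt;p&gt;The naming convention itself is easy to picture. The following plain-Ruby sketch maps a Java-style method name onto its Rubyesque counterpart. The &lt;tt&gt;rubyesque&lt;/tt&gt; helper is a hypothetical name used only for illustration; it is not a JRuby API, and JRuby's real aliasing rules cover more cases than this.&lt;/p&gt;

```ruby
# Rough sketch of the naming idea only (hypothetical helper, not a JRuby API).
def rubyesque(java_name)
  # Insert an underscore at each lower-to-upper boundary, then downcase.
  snake = java_name.gsub(/([a-z0-9])([A-Z])/) { "#{$1}_#{$2}" }.downcase
  # Bean-style getters lose their "get_" prefix.
  snake.sub(/\Aget_/, '')
end

puts rubyesque('parseSmiles')   # parse_smiles
puts rubyesque('getAtomCount')  # atom_count
```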

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;This article has shown how to install JRuby and begin to write some simple cheminformatics programs with a distinctive Ruby flavor. Although the focus was on SMILES parsing, there's much more functionality to be found within the CDK and other cheminformatics libraries written in Java. Future articles will outline some of the possibilities.&lt;/p&gt;</description>
      <pubDate>Tue, 09 Oct 2007 08:40:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9007f034-5aa0-458c-b4e1-f9dc182d19be</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/09/jruby-for-cheminformatics-parsing-smiles-simply</link>
      <category>Tools</category>
      <category>jruby</category>
      <category>java</category>
      <category>ruby</category>
      <category>rubidium</category>
      <category>cdk</category>
      <category>smiles</category>
    </item>
    <item>
      <title>Ruby CDK One-Liners: Create a Molfile With 2D Atom Coordinates From Arbitrary SMILES Strings</title>
      <description>&lt;p&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;A very common operation in cheminformatics is the interconversion of molfiles and SMILES strings. Usually, converting from SMILES gives a molfile in which all atoms have coordinates of (0,0,0). Sometimes you just need more than that. The following &lt;a href="http://depth-first.com/articles/tag/rcdk"&gt;Ruby CDK&lt;/a&gt; code will accept an arbitrary SMILES string and return a molfile with fully-assigned 2D atom coordinates:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;

&lt;span class="constant"&gt;XY&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;coordinate_molfile&lt;/span&gt; &lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;smiles_to_molfile&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;c1ccccc1&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Looking at it this way, those four lines of require/include statements seem pretty darn verbose.&lt;/p&gt;</description>
      <pubDate>Thu, 20 Sep 2007 14:18:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:8c347f16-8a0c-4d35-a02c-a2560fdc5f79</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/20/ruby-cdk-one-liners-create-a-molfile-with-2d-atom-coordinates-from-arbitrary-smiles-strings</link>
      <category>Tools</category>
      <category>rubycdk</category>
      <category>rcdk</category>
      <category>smiles</category>
      <category>molfile</category>
      <category>interconversion</category>
      <category>sdg</category>
      <category>coordinates</category>
    </item>
    <item>
      <title>Everything Old is New Again: Wiswesser Line Notation (WLN)</title>
      <description>&lt;p&gt;Sometimes, searching through the attic of scientific ideas turns up unexpected treasures. Like old clothing styles that suddenly become fashionable again, the passage of time has a way of making old ideas relevant by supplying new context. Those ideas that once enjoyed widespread popularity followed by complete obscurity are especially interesting. This article talks about one of them and why it may matter again.&lt;/p&gt;

&lt;h4&gt;Some History&lt;/h4&gt;

&lt;p&gt;Wiswesser Line-Formula Chemical Notation (WLN) was the most popular of perhaps a dozen actively-used line notation systems during the 1960s and 1970s. Developed by William J. Wiswesser over a period of many years starting in the 1940s, WLN contains a surprising number of modern ideas about chemistry and information. At one point a serious contender for the position now held by IUPAC nomenclature, WLN has become so obscure that few chemists have even heard of it and no modern software can manipulate it. Even finding information on the basic grammar of WLN is difficult: almost all of this documentation is contained in out-of-print books.&lt;/p&gt;

&lt;h4&gt;A Guide&lt;/h4&gt;

&lt;p&gt;To my surprise, WLN is both easy to understand and easy to use. As far as canonicalized line notations go, WLN is far easier to comprehend than either &lt;a href="http://depth-first.com/articles/tag/inchi"&gt;InChI&lt;/a&gt; or &lt;a href="http://depth-first.com/articles/2007/04/03/creating-canonical-smiles-with-ruby-open-babel"&gt;Canonical SMILES&lt;/a&gt;. Even more surprisingly, WLN actually meets more than a few of the requirements for the &lt;a href="http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web"&gt;ideal line notation for the Web&lt;/a&gt;. I was always struck by claims that high school graduates with little chemistry background could be trained to encode WLN in a few weeks; this now seems very plausible.&lt;/p&gt;

&lt;p&gt;My guide is Elbert Smith's short 1968 book &lt;em&gt;The Wiswesser Line-Formula Chemical Notation&lt;/em&gt;. I was able to pick up a used copy in excellent condition for under $30.00 from Amazon.&lt;/p&gt;

&lt;h4&gt;Some Examples&lt;/h4&gt;

&lt;p&gt;Functional groups, carbon chains, and rings play central roles in WLN. Unlike modern line notations that emphasize atoms, WLN is designed to mirror the way that chemists actually think about chemistry.&lt;/p&gt;

&lt;p&gt;Consider acetone:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/acetone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;1V1&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The two "1"s stand for saturated one-carbon chains, i.e. methyl groups. The "V" stands for a carbon doubly-bonded to oxygen.&lt;/p&gt;

&lt;p&gt;Given nothing more than the above example, the encoding of diethyl ether should be completely clear:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/ether.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;2O2&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;"O" simply stands for a divalent oxygen atom.&lt;/p&gt;

&lt;p&gt;The benzene ring is one of the most ubiquitous functional groups in organic chemistry. Wiswesser knew this and wanted to make it easy to encode aromatic compounds. His solution is simplicity itself. Consider acetophenone:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/acetophenone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;1VR&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The "R" stands for a benzene ring. WLN canonicalization gives it the lowest priority and this is why it appears last.&lt;/p&gt;

&lt;p&gt;What about disubstituted aromatics? Consider 4-chloroacetophenone:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/4-chloroacetophenone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;GR DV1&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The "G" symbol stands for chlorine. The " DV1" stands for the 4-acyl substituent. Here, the "D" denotes the 4-position. The 3-position would result in " CV1", and the 2-position would give " BV1". The space character means that the character following it should be interpreted as a ring locant.&lt;/p&gt;

&lt;p&gt;WLN uses a very simple system of canonicalization based on alphanumeric order. Priority increases in the direction: (1) symbols; (2) numbers in numerical order; and (3) letters in alphabetical order (with the exception of R which has lower priority than symbols). Coding generally begins at the substituent assigned the highest priority. This explains why 4-chloroacetophenone is not coded as "1VR DG".&lt;/p&gt;
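
&lt;p&gt;The ordering just described can be sketched in a few lines of plain Ruby. This is only an illustration of the single-character priority rule as stated here, not a WLN canonicalizer, and the helper names &lt;tt&gt;wln_rank&lt;/tt&gt; and &lt;tt&gt;start_symbol&lt;/tt&gt; are hypothetical.&lt;/p&gt;

```ruby
# Sketch of the ordering described above, for single characters only.
# Returns a sortable rank: R lowest, then symbols, digits, and letters.
def wln_rank(ch)
  case ch
  when 'R'     then [0, 0]
  when /[0-9]/ then [2, ch.to_i]
  when /[A-Z]/ then [3, ch.ord]
  else              [1, ch.ord]
  end
end

# Pick the higher-priority of two candidate starting symbols.
def start_symbol(a, b)
  [a, b].max_by { |ch| wln_rank(ch) }
end

puts start_symbol('1', 'G')  # G
```

&lt;p&gt;Because "G" outranks "1", coding starts at the chloro end of the molecule.&lt;/p&gt;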

&lt;h4&gt;Advantages of WLN&lt;/h4&gt;

&lt;p&gt;WLN is remarkably compact, especially when compared to SMILES and InChI. For example, consider the InChI for 4-chloroacetophenone, which is roughly eight times as long as the corresponding WLN:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_inchi "&gt;InChI=1/C8H7ClO/c1-6(10)7-2-4-8(9)5-3-7/h2-5H,1H3&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Additionally, it's readily apparent to a human observer when a WLN is not properly coded - after all, the language was designed to be both read and written by humans rather than machines. Anyone can look at "GR DV1" and deduce almost instantly that it contains a carbonyl group (V), a phenyl group (R), a chloro group (G), and a methyl group (1).&lt;/p&gt;

&lt;p&gt;And if this functional group recognition is easy for humans, it's orders of magnitude easier for machines. It's not difficult at all to imagine very sophisticated and fast molecular query systems that do nothing more than simple processing of the ASCII text contained within WLN strings.&lt;/p&gt;
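
&lt;p&gt;As a rough illustration of how little processing this takes, the sketch below recognizes groups in a WLN string with nothing more than a lookup table. The table holds only the symbols used in this article, and &lt;tt&gt;WLN_GROUPS&lt;/tt&gt; and &lt;tt&gt;wln_groups&lt;/tt&gt; are hypothetical names. A real reader would need the full symbol table, multi-digit chain lengths, and ring syntax.&lt;/p&gt;

```ruby
# Symbol meanings taken from the examples in this article only.
WLN_GROUPS = {
  '1' => 'one-carbon chain',
  '2' => 'two-carbon chain',
  'V' => 'carbonyl',
  'O' => 'divalent oxygen',
  'R' => 'benzene ring',
  'G' => 'chloro'
}.freeze

def wln_groups(wln)
  # A space marks the next character as a ring locant, so drop both.
  symbols = wln.gsub(/ [A-Z]/, '')
  symbols.chars.map { |ch| WLN_GROUPS.fetch(ch, "unknown (#{ch})") }
end

puts wln_groups('GR DV1').join(', ')
# chloro, benzene ring, carbonyl, one-carbon chain
```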

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;It's very unlikely that WLN will ever be resurrected for the purpose of replacing existing line notations. On the other hand, WLN offers many potentially useful concepts for those creating new line notations. As they say, history doesn't repeat itself, but it frequently rhymes.&lt;/p&gt;</description>
      <pubDate>Fri, 20 Jul 2007 08:46:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:d729733e-ad5a-4895-b3e4-4ebd5b46740c</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/07/20/everything-old-is-new-again-wiswesser-line-notation-wln</link>
      <category>Tools</category>
      <category>wln</category>
      <category>smiles</category>
      <category>inchi</category>
      <category>linenotation</category>
    </item>
    <item>
      <title>Interconvert (Almost) Any SMILES and InChI with Ruby Open Babel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;SMILES and InChI are the two most widely-used &lt;a href="http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web"&gt;line notations&lt;/a&gt; in cheminformatics. Not surprisingly, there are many situations in which it's useful to interconvert the two. This article shows a simple method for doing so using &lt;a href="http://depth-first.com/articles/tag/rubyopenbabel"&gt;Ruby Open Babel&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Parsing InChIs&lt;/h4&gt;

&lt;p&gt;Version 1.01 of the IUPAC/NIST C InChI toolkit introduced the ability to parse InChIs. This capability has subsequently been incorporated into &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt;, and by extension, Ruby Open Babel. It's this capability that we'll take advantage of.&lt;/p&gt;

&lt;h4&gt;A Simple Library&lt;/h4&gt;

&lt;p&gt;The following library provides everything we need to convert between SMILES and InChI via Ruby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;openbabel&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;InChI&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBConversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBConversion&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_in_and_out_formats&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;inchi&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_in_and_out_formats&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;inchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;inchi_to_smiles&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBMol&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;or&lt;/span&gt; &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Can't parse InChI: &lt;span class="expr"&gt;#{inchi}&lt;/span&gt;.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="attribute"&gt;@@to_smiles&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;strip&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;smiles_to_inchi&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;OpenBabel&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;OBMol&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;or&lt;/span&gt; &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Can't parse SMILES &lt;span class="expr"&gt;#{smiles}&lt;/span&gt;.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="attribute"&gt;@@to_inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_string&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;strip&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;After saving the above code to a file named &lt;strong&gt;inchi.rb&lt;/strong&gt;, we can interactively convert SMILES and InChIs:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles "InChI=1/C14H12/c1-3-7-13(8-4-1)11-12-14-9-5-2-6-10-14/h1-12H/b12-11-"
=&gt; "c1ccc(cc1)C(/[H])=C(/[H])c1ccccc1"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
=&gt; "InChI=1/C14H12/c1-3-7-13(8-4-1)11-12-14-9-5-2-6-10-14/h1-12H/b12-11-"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In the above test, the InChI for &lt;em&gt;cis&lt;/em&gt;-stilbene is converted into a SMILES string which is then converted back to InChI form with complete fidelity, including alkene geometry. Note that this would not have been possible using the approach that was &lt;a href="http://depth-first.com/articles/2006/09/19/decoding-inchis-with-rino"&gt;previously discussed&lt;/a&gt;, in which molfiles were used as intermediate data structures.&lt;/p&gt;

&lt;p&gt;What about chiral centers? Here the results are mixed. For example, when the round-trip conversion is applied to propranolol (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=21138"&gt;PubChem&lt;/a&gt;, &lt;a href="http://60minutes.yahoo.com/segment/21/memory_drug"&gt;Video&lt;/a&gt;), the configuration of the stereocenter is &lt;em&gt;inverted&lt;/em&gt;.&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles "InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m1/s1"
=&gt; "CC(C)NC[C@@H](COc1cccc2ccccc12)O"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
=&gt; "InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m0/s1"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;However, the same round-trip conversion of 1-phenylethanol works without inversion of stereochemistry:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles " InChI=1/C8H10O/c1-7(9)8-5-3-2-4-6-8/h2-7,9H,1H3/t7-/m0/s1"
=&gt; "C[C@@H](c1ccccc1)O"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
=&gt; "InChI=1/C8H10O/c1-7(9)8-5-3-2-4-6-8/h2-7,9H,1H3/t7-/m0/s1"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The most likely explanation is that under certain conditions, Open Babel incorrectly interprets and/or writes stereo parities.&lt;/p&gt;
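
&lt;p&gt;To pin down exactly what changed in a round trip such as the propranolol one, the two InChIs can be compared layer by layer. The following is a minimal sketch in plain Ruby, with no Open Babel dependency. The helper names &lt;tt&gt;inchi_layers&lt;/tt&gt; and &lt;tt&gt;inchi_layer_diff&lt;/tt&gt; are hypothetical and not part of &lt;strong&gt;inchi.rb&lt;/strong&gt;, and the sketch assumes that each layer prefix occurs only once, which holds for the simple InChIs shown here but not for every InChI.&lt;/p&gt;

```ruby
# Hypothetical helpers, not part of inchi.rb or Open Babel.
# Split an InChI into its slash-separated layers, keyed by layer prefix.
# Simplification: assumes each prefix (c, h, b, t, m, s, ...) appears once.
def inchi_layers(inchi)
  version, formula, *rest = inchi.strip.split("/")
  layers = {}
  layers["version"] = version
  layers["formula"] = formula
  # Every later layer starts with a one-letter prefix; the rest is its value.
  rest.each { |layer| layers[layer[0, 1]] = layer[1..-1] }
  layers
end

# Return a hash of the layers that differ, as [before, after] pairs.
def inchi_layer_diff(before, after)
  a = inchi_layers(before)
  b = inchi_layers(after)
  diff = {}
  (a.keys | b.keys).each do |key|
    diff[key] = [a[key], b[key]] unless a[key] == b[key]
  end
  diff
end

# The propranolol InChIs from the round-trip test above.
before = "InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m1/s1"
after  = "InChI=1/C16H21NO2/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16/h3-9,12,14,17-18H,10-11H2,1-2H3/t14-/m0/s1"

p inchi_layer_diff(before, after)
```

&lt;p&gt;For the propranolol pair, the only entry reported is the &lt;tt&gt;m&lt;/tt&gt; layer, with the value 1 before and 0 after. The connectivity, hydrogen, and &lt;tt&gt;t&lt;/tt&gt; layers are identical.&lt;/p&gt;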

&lt;h4&gt;One More Gotcha&lt;/h4&gt;

&lt;p&gt;On my system (Mandriva Linux 2007.1), the round-trip test on glucose reproducibly ended in a segfault:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'inchi'
=&gt; true
irb(main):002:0&gt; include InChI
=&gt; Object
irb(main):003:0&gt; smiles = inchi_to_smiles "InChI=1/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3-,4+,5-,6?/m1/s1"
=&gt; "C([C@H]1[C@H]([C@@H]([C@H](C(O)O1)O)O)O)O"
irb(main):004:0&gt; inchi = smiles_to_inchi smiles
./inchi.rb:20: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i686-linux]

Aborted
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The same segfault was obtained when using the &lt;tt&gt;babel&lt;/tt&gt; command-line utility:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ babel -ismi -oinchi
C([C@H]1[C@H]([C@@H]([C@H](C(O)O1)O)O)O)O
[Return]
Segmentation fault
&lt;/pre&gt;
&lt;/div&gt;
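
&lt;p&gt;Because the crash happens in native code, it takes the whole Ruby interpreter down with it. One possible way to keep an &lt;tt&gt;irb&lt;/tt&gt; session or batch job alive is to run each risky conversion in a child process. The sketch below is an assumption-laden illustration rather than part of &lt;strong&gt;inchi.rb&lt;/strong&gt;: the helper name &lt;tt&gt;run_isolated&lt;/tt&gt; is hypothetical, it relies on &lt;tt&gt;fork&lt;/tt&gt; and therefore on a Unix-like system, and it uses a self-inflicted kill signal as a stand-in for the real segfault.&lt;/p&gt;

```ruby
# Hypothetical helper: run a block in a forked child process so that a crash
# inside the block cannot take down the parent interpreter (Unix only).
# Returns the block's result as a string, or nil if the child did not exit
# cleanly.
def run_isolated
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield.to_s)  # send the result back to the parent
    writer.close
    exit!(0)                  # leave the child without running at_exit hooks
  end
  writer.close
  output = reader.read        # returns once the child closes its end or dies
  reader.close
  _pid, status = Process.wait2(pid)
  status.success? ? output : nil
end

# Stand-in for a crashing native call: the child kills itself.
crashed = run_isolated { Process.kill("KILL", Process.pid); sleep }
puts crashed.nil? ? "conversion crashed" : crashed

# With inchi.rb loaded, a real call might look like this (untested assumption):
#   inchi = run_isolated { smiles_to_inchi smiles }
```

&lt;p&gt;The trade-off is one extra process per conversion, which is slow for large batches but keeps a single bad SMILES string from aborting the entire run.&lt;/p&gt;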

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;As you can see, Ruby Open Babel makes short work of interconverting SMILES and InChIs. Despite the problems with stereochemical configuration and the segfault on certain SMILES strings, the approach outlined here offers a quick and economical way to convert between the two formats for a wide variety of structures.&lt;/p&gt;</description>
      <pubDate>Mon, 25 Jun 2007 08:45:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:08b043b0-d9c9-4de9-bc51-c20b4f94c306</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel</link>
      <category>Tools</category>
      <category>inchi</category>
      <category>smiles</category>
      <category>rubyopenbabel</category>
    </item>
  </channel>
</rss>
