<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag iupac</title>
    <link>http://depth-first.com/articles/tag/iupac</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Easily Convert IUPAC Nomenclature to SMILES, InChI, or Molfile with Rubidium</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20071015/rubidium.png" align="right"&gt;&lt;/img&gt;A recent article &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;introduced Rubidium&lt;/a&gt;, a cheminformatics toolkit written in Ruby. One of Ruby's strengths is the speed with which it enables disparate pieces of code to be glued together - even if they're written in different programming languages. In this article, we'll see how Rubidium can be extended to provide support for converting IUPAC nomenclature into SMILES, InChI, or Molfile formats.&lt;/p&gt;

&lt;h4&gt;About Rubidium&lt;/h4&gt;

&lt;p&gt;Rubidium is a cheminformatics toolkit written in Ruby. It is currently configured to run on &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt;, although future versions may also work with &lt;a href="http://en.wikipedia.org/wiki/Ruby_(programming_language)"&gt;Matz' Ruby Implementation&lt;/a&gt; (MRI) via &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rubidium will eventually be packaged as a &lt;a href="http://www.rubygems.org/"&gt;RubyGem&lt;/a&gt; and hosted on &lt;a href="http://rubyforge.org"&gt;RubyForge&lt;/a&gt;. For now, the toolkit consists of a running library that will be updated and documented on this blog.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;The library extends the CDK module presented in the &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;previous article in this series&lt;/a&gt;. The main change is the addition of an &lt;tt&gt;IUPACReader&lt;/tt&gt; class, based on Peter Corbett's excellent &lt;a href="http://depth-first.com/articles/2007/10/12/jruby-for-cheminformatics-parsing-iupac-nomenclature-with-opsin"&gt;OPSIN library&lt;/a&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;IUPACReader&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.io.StringReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uk.ac.cam.ch.wwmm.opsin.NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.io.CMLReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.ChemFile&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@iupac_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;CMLReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read&lt;/span&gt; &lt;span class="ident"&gt;name&lt;/span&gt;
    &lt;span class="ident"&gt;cml&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@iupac_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse_to_cml&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Could not parse '&lt;span class="expr"&gt;#{name}&lt;/span&gt;'.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="ident"&gt;cml&lt;/span&gt;

    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;set_reader&lt;/span&gt; &lt;span class="constant"&gt;StringReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_xml&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt; &lt;span class="constant"&gt;ChemFile&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;chem_sequence&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;chem_model&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;molecule_set&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Using this additional functionality requires nothing more than copying the &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;OPSIN jarfile&lt;/a&gt; into the &lt;strong&gt;lib&lt;/strong&gt; directory of your JRuby installation. You'll also need to place the &lt;a href="http://downloads.sourceforge.net/cdk/cdk-1.0.1.jar?modtime=1182877138&amp;big_mirror=0"&gt;CDK jarfile&lt;/a&gt; in this directory if you haven't done so already.&lt;/p&gt;

&lt;p&gt;The complete Rubidium library can be &lt;a href="http://depth-first.com/demo/20071019/cdk.rb"&gt;downloaded here&lt;/a&gt;.&lt;/p&gt;
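
&lt;p&gt;The downloadable library isn't reproduced in this article, so how a format name like &lt;tt&gt;'iupac'&lt;/tt&gt; gets routed to &lt;tt&gt;IUPACReader&lt;/tt&gt; is not shown. One plausible approach is a lookup table keyed by format name. The sketch below is only an illustration under that assumption: &lt;tt&gt;SketchConversion&lt;/tt&gt; and its stand-in reader and writer lambdas are hypothetical names, not Rubidium's code, and they let the example run on plain Ruby without CDK or OPSIN.&lt;/p&gt;

```ruby
# Hypothetical sketch of format-name dispatch. This is NOT the real
# CDK::Conversion class; readers and writers are plain callables.
class SketchConversion
  def initialize(readers, writers)
    @readers = readers # format name mapped to an object responding to call(text)
    @writers = writers # format name mapped to an object responding to call(molecule)
    @in_format = nil
    @out_format = nil
  end

  def set_formats(in_format, out_format)
    set_in_format in_format
    set_out_format out_format
  end

  def set_in_format(format)
    raise ArgumentError, "No reader for '#{format}'." unless @readers.key?(format)
    @in_format = format
  end

  def set_out_format(format)
    raise ArgumentError, "No writer for '#{format}'." unless @writers.key?(format)
    @out_format = format
  end

  def convert(input)
    raise 'Formats have not been set.' if @in_format.nil? || @out_format.nil?
    molecule = @readers.fetch(@in_format).call(input)
    @writers.fetch(@out_format).call(molecule)
  end
end

# Stand-ins only: they return placeholder values, not real structures.
readers = { 'iupac' => lambda { |name| { source: name } } }
writers = { 'smi' => lambda { |molecule| "SMILES for #{molecule[:source]}" } }

conversion = SketchConversion.new(readers, writers)
conversion.set_formats 'iupac', 'smi'
puts conversion.convert('1,4-dichlorobenzene')
```

&lt;p&gt;With a table like this, supporting a new input format means registering one more reader. In Rubidium itself, the &lt;tt&gt;'iupac'&lt;/tt&gt; entry would presumably delegate to &lt;tt&gt;IUPACReader#read&lt;/tt&gt;, but that wiring is an assumption here.&lt;/p&gt;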

&lt;h4&gt;A Test&lt;/h4&gt;

&lt;p&gt;We can test Rubidium's IUPAC nomenclature parsing abilities with &lt;tt&gt;jirb&lt;/tt&gt;. For example, to convert from name to SMILES:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'cdk'
=&gt; true
irb(main):002:0&gt; c=CDK::Conversion.new
=&gt; #&amp;lt;CDK::Conversion:0x46ca65 ... &amp;gt;
irb(main):003:0&gt; c.set_formats 'iupac', 'smi'
=&gt; "smi"
irb(main):004:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "C=1C=C(C=CC=1Cl)Cl"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To convert from name to InChI (in the same &lt;tt&gt;jirb&lt;/tt&gt; session):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):005:0&gt; c.set_out_format 'inchi'
=&gt; "inchi"
irb(main):006:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "InChI=1/C6H4Cl2/c7-5-1-2-6(8)4-3-5/h1-4H"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;And to convert from name to Molfile (also in the same &lt;tt&gt;jirb&lt;/tt&gt; session):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
irb(main):007:0&gt; c.set_out_format 'mol'
=&gt; "mol"
irb(main):008:0&gt; c.convert '1,4-dichlorobenzene'
=&gt; "\n  CDK    10/19/07,7:59\n\n  8  8  0  0  0  0  0  0  0  0999 V2000\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n  1  2  2  0  0  0  0 \n  2  3  1  0  0  0  0 \n  3  4  2  0  0  0  0 \n  4  5  1  0  0  0  0 \n  5  6  2  0  0  0  0 \n  6  1  1  0  0  0  0 \n  7  1  1  0  0  0  0 \n  8  4  1  0  0  0  0 \nM  END\n"
&lt;/pre&gt;
&lt;/div&gt;
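
&lt;p&gt;The escaped Molfile string is hard to check by eye. As a rough sanity check, the counts line and atom block can be read back with a few lines of plain Ruby. The helper below is a sketch and not part of Rubidium: &lt;tt&gt;molfile_summary&lt;/tt&gt; is a hypothetical name, it assumes the fixed-column V2000 layout, and the sample string is rebuilt from the output shown above so that the example runs without JRuby.&lt;/p&gt;

```ruby
# Summarize a V2000 Molfile string: atom count, bond count, element symbols.
def molfile_summary(molfile)
  lines = molfile.lines.map { |line| line.chomp }
  counts_line = lines[3] # three header lines precede the counts line
  unless counts_line.to_s.include?('V2000')
    raise ArgumentError, 'Expected a V2000 counts line on line 4.'
  end

  atoms = counts_line[0, 3].to_i # columns 1-3: number of atoms
  bonds = counts_line[3, 3].to_i # columns 4-6: number of bonds
  # In each atom line, the element symbol occupies columns 32-34.
  symbols = lines[4, atoms].map { |line| line[31, 3].strip }

  { atoms: atoms, bonds: bonds, symbols: symbols }
end

# Sample rebuilt from the jirb output for 1,4-dichlorobenzene.
carbon   = '    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0'
chlorine = '    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0'
header = ['', '  CDK    10/19/07,7:59', '', '  8  8  0  0  0  0  0  0  0  0999 V2000']
bond_block = [
  '  1  2  2  0  0  0  0 ', '  2  3  1  0  0  0  0 ',
  '  3  4  2  0  0  0  0 ', '  4  5  1  0  0  0  0 ',
  '  5  6  2  0  0  0  0 ', '  6  1  1  0  0  0  0 ',
  '  7  1  1  0  0  0  0 ', '  8  4  1  0  0  0  0 '
]
sample = (header + [carbon] * 6 + [chlorine] * 2 + bond_block + ['M  END']).join("\n") + "\n"

summary = molfile_summary(sample)
puts "#{summary[:atoms]} atoms, #{summary[:bonds]} bonds: #{summary[:symbols].join(' ')}"
```

&lt;p&gt;For the sample this reports 8 atoms and 8 bonds, with six carbons and two chlorines. Hydrogens are implicit, and every coordinate is zero because no layout was generated.&lt;/p&gt;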

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;By re-using a simple conversion API together with another Java library, we've given Rubidium the ability to translate IUPAC nomenclature into other molecular languages. The additional code was both easy to write and easy to test. Future articles will discuss the packaging, distribution, and further elaboration of Rubidium.&lt;/p&gt;</description>
      <pubDate>Fri, 19 Oct 2007 10:05:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:1b7e76b3-93a7-4372-982f-cd60c9ed40d0</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/19/easily-convert-iupac-nomenclature-to-smiles-inchi-or-molfile-with-rubidium</link>
      <category>Tools</category>
      <category>rubidium</category>
      <category>iupac</category>
      <category>smiles</category>
      <category>inchi</category>
      <category>molfile</category>
    </item>
    <item>
      <title>JRuby for Cheminformatics: Parsing IUPAC Nomenclature with OPSIN</title>
      <description>&lt;p&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Recent articles have discussed the use of &lt;a href="http://depth-first.com/articles/tag/rubidium"&gt;JRuby for cheminformatics&lt;/a&gt;. We've seen how to &lt;a href="http://depth-first.com/articles/2007/10/09/jruby-for-cheminformatics-parsing-smiles-simply"&gt;parse SMILES strings&lt;/a&gt;, and &lt;a href="http://depth-first.com/articles/2007/10/10/jruby-for-cheminformatics-reading-and-writing-inchis-via-the-java-native-interface"&gt;read or write InChIs&lt;/a&gt;. In this article, we'll see how easy it is to parse IUPAC nomenclature from JRuby using Peter Corbett's &lt;a href="http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin"&gt;OPSIN library&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Installation&lt;/h4&gt;

&lt;p&gt;After &lt;a href="http://depth-first.com/articles/2007/10/09/jruby-for-cheminformatics-parsing-smiles-simply"&gt;installing JRuby&lt;/a&gt;, simply &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;download the OPSIN jarfile&lt;/a&gt; and copy it to your JRuby &lt;tt&gt;lib&lt;/tt&gt; directory. You're done.&lt;/p&gt;

&lt;h4&gt;A Simple Library&lt;/h4&gt;

&lt;p&gt;We can write a simple library to convert an IUPAC name into a CML document:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;jruby&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uk.ac.cam.ch.wwmm.opsin.NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;IUPAC&lt;/span&gt;
  &lt;span class="attribute"&gt;@@nts&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read_name&lt;/span&gt; &lt;span class="ident"&gt;name&lt;/span&gt;
    &lt;span class="ident"&gt;cml&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@@nts&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse_to_cml&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Could not parse '&lt;span class="expr"&gt;#{name}&lt;/span&gt;'.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="ident"&gt;cml&lt;/span&gt;

    &lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_xml&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;read_name&lt;/tt&gt; method accepts an IUPAC name as a string and returns a CML document as a string. If the input can't be parsed, an exception is raised.&lt;/p&gt;
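
&lt;p&gt;Because &lt;tt&gt;raise&lt;/tt&gt; is called with only a message string, Ruby raises a &lt;tt&gt;RuntimeError&lt;/tt&gt;, which callers can rescue. The sketch below shows one way to do that when a single bad name shouldn't stop a batch. &lt;tt&gt;IUPACStub&lt;/tt&gt; and &lt;tt&gt;read_name_or_nil&lt;/tt&gt; are hypothetical names, and the stub returns a placeholder string instead of real CML so that the example runs without JRuby or OPSIN.&lt;/p&gt;

```ruby
# Stand-in for the IUPAC module: it mimics only the raise-on-failure
# behavior and returns a placeholder string, not real CML.
module IUPACStub
  KNOWN = { '4-iodobenzoic acid' => 'placeholder CML string' }.freeze

  def self.read_name(name)
    cml = KNOWN[name]
    raise "Could not parse '#{name}'." unless cml
    cml
  end
end

# Return the parsed document, or nil (with a warning) if parsing fails.
def read_name_or_nil(reader, name)
  reader.read_name(name)
rescue RuntimeError => error
  warn "Skipping: #{error.message}"
  nil
end

names = ['4-iodobenzoic acid', 'not a chemical name']
parsed = names.map { |name| read_name_or_nil(IUPACStub, name) }.compact
puts "Parsed #{parsed.size} of #{names.size} names."
```

&lt;p&gt;Returning &lt;tt&gt;nil&lt;/tt&gt; is just one choice; collecting the error messages alongside the results would work equally well.&lt;/p&gt;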

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;We can test the library by saving it as a file called &lt;strong&gt;iupac.rb&lt;/strong&gt; and invoking &lt;tt&gt;jirb&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'iupac'
=&gt; true
irb(main):002:0&gt; include IUPAC
=&gt; Object
irb(main):003:0&gt; read_name('4-iodobenzoic acid')
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This returns the XML shown below, which has been re-formatted for clarity:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;cml&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;molecule&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;m1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atomArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a3&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;3&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a6&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;6&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a8&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;O&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a9&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;O&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a10&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;I&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atomArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bondArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a1 a2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a2 a3&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a3 a4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a4 a5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a5 a6&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a6 a1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7 a1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7 a8&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7 a9&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a10 a4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;bondArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This simple Ruby library has parsed the name '4-iodobenzoic acid' and has returned a string containing the CML representation for the molecule. If we had wanted the &lt;tt&gt;read_name&lt;/tt&gt; method to return a traversable XML object model, we could have enabled that as well.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;One of the objections raised whenever the issue of "new" programming languages comes up, regardless of their merit, is the age-old refrain "Yeah, but where's the software?" With JRuby, we bypass this question altogether. We can leverage the full scope of the massive Java development effort over the last ten years, which includes several excellent cheminformatics libraries. With virtually no effort, we have a working cheminformatics platform based on a widely-used, versatile and dynamic object-oriented scripting language. Future articles will discuss extensions to this platform and some applications.&lt;/p&gt;</description>
      <pubDate>Fri, 12 Oct 2007 10:37:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:873940fd-7b22-4013-a61b-ef928eee1c8e</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/12/jruby-for-cheminformatics-parsing-iupac-nomenclature-with-opsin</link>
      <category>Tools</category>
      <category>opsin</category>
      <category>iupac</category>
      <category>nametostruct</category>
      <category>rubidium</category>
      <category>jruby</category>
      <category>ruby</category>
      <category>java</category>
    </item>
    <item>
      <title>Eleven Qualities of The Perfect Line Notation for the Web</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/wenwennie/396170719/"&gt;&lt;img src="http://depth-first.com/demo/20070314/line.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;If you had to design the perfect line notation for the Web, what would it look like? This is hardly an academic exercise given the central role played by line notations in information systems. For a variety of reasons, existing line notations may not be the right match for the Web. This article explores this question and outlines the main qualities needed by a Web-friendly line notation.&lt;/p&gt;

&lt;h4&gt;A Few Lines About Line Notations&lt;/h4&gt;

&lt;p&gt;A line notation is any system that converts a molecular structure into a single line of text. Chemists have been using line notations for over 140 years - long before the advent of computers. Because of their versatility, line notations are frequently used in situations they were not designed for. When this happens, limitations become apparent, resulting in renewed efforts to build a better system.&lt;/p&gt;

&lt;p&gt;As &lt;a href="http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;noted previously&lt;/a&gt;, the invention of new line notations is a field whose popularity ebbs and flows over time. Currently, the three most important line notations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IUPAC Nomenclature&lt;/li&gt;
&lt;li&gt;Simplified Molecular Input Line Entry System (SMILES)&lt;/li&gt;
&lt;li&gt;IUPAC International Chemical Identifier (InChI)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these systems has its own unique characteristics. &lt;a href="http://www.acdlabs.com/iupac/nomenclature/"&gt;IUPAC nomenclature&lt;/a&gt; is the oldest and most widely-used line notation. It appears in numerous contexts, including Web pages, peer-reviewed journals, reports, patents, MSDS sheets, catalogs, and reagent bottles. By comparison, &lt;a href="http://www.daylight.com/smiles/index.html"&gt;SMILES&lt;/a&gt; is a distant second in popularity. Its main role has been to facilitate machine entry of structural information by humans, &lt;a href="http://www.emolecules.com/"&gt;like this&lt;/a&gt;. &lt;a href="http://en.wikipedia.org/wiki/International_Chemical_Identifier"&gt;InChI&lt;/a&gt; is the newest of the bunch. It serves both as a line notation and as a unique identifier requiring no central authority.&lt;/p&gt;

&lt;h4&gt;The Perfect Line Notation for the Web&lt;/h4&gt;

&lt;p&gt;The emergence of the Web as a standard information delivery platform has refocused the attention of many developers on the line notation problem. With this idea in mind, here are some guesses about the qualities of the ideal Web-friendly line notation.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Readily Encodable and Decodable by Humans.&lt;/strong&gt; There's something unnerving about a line notation that can't easily be deciphered by humans. Is this really the right string? Did I copy it completely? This problem surfaces with every line notation, but some fare better than others. IUPAC nomenclature, for example, is one of the first things taught in many beginning organic chemistry classes. It's complicated, but still understandable by non-experts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Readily Encodable and Decodable by Machines.&lt;/strong&gt; It may be relatively simple for humans to read and write IUPAC nomenclature, but not so for machines. Software that reads and writes SMILES, on the other hand, is by comparison easy to write. This explains the abundance of software packages that handle SMILES and the &lt;a href="http://depth-first.com/articles/tag/opsin"&gt;scarcity&lt;/a&gt; of those that handle IUPAC nomenclature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Uses URI-Safe Characters Only.&lt;/strong&gt; A &lt;a href="http://en.wikipedia.org/wiki/Uniform_Resource_Identifier"&gt;URI&lt;/a&gt; uniquely identifies every document on the Internet. Why can't a line notation be used in combination with a URI to uniquely identify every molecule? One reason is that every line notation currently in use contains &lt;a href="http://www.freesoft.org/CIE/RFC/1738/4.htm"&gt;characters unsafe for use in URIs&lt;/a&gt;. Any line notation designed for use on the Web needs to avoid these characters in its syntax. &lt;em&gt;Update: InChI doesn't use unsafe characters, but it does use the reserved characters "=", "?", and "/". These characters may therefore &lt;a href="http://info-uri.info/registry/OAIHandler?verb=GetRecord&amp;amp;metadataPrefix=reg&amp;amp;identifier=info:inchi/"&gt;need to be escaped&lt;/a&gt;, depending on the context.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encodes All Molecules.&lt;/strong&gt; Buried within every line notation is an opinion on what chemistry is really about. To operate on the Web, these opinions need to be as closely aligned as possible with those of chemists themselves. &lt;a href="http://depth-first.com/articles/tag/flexmol"&gt;Several Depth-First articles&lt;/a&gt; have discussed the limitations of existing line notations as molecular languages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compact.&lt;/strong&gt; Nobody wants to look at or manipulate a line of text that's longer than it needs to be. Of course, the more expressive a line notation is, the more verbose it will be. In other words, qualities 4 and 5 will always be in conflict.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Canonicalizable.&lt;/strong&gt; A line notation supports canonicalization when it specifies rules that can be guaranteed to always generate the same line notation for a given molecule. This feature enables many labor-saving assumptions. For example, a canonical representation makes a great identifier in a database, reducing the cost of storing and retrieving structural information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explicit Hydrogen Atom Encoding.&lt;/strong&gt; SMILES makes few requirements regarding hydrogen atom encoding. As a result, each software implementation is left to its own devices. The resulting confusion is the price paid for the convenience (Quality 1) of a compact notation (Quality 5).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hierarchical Structure.&lt;/strong&gt; One of InChI's innovations was the introduction of a hierarchical encoding system. This system, also referred to as InChI "layers", enables a molecule to be viewed at several levels of resolution: as a molecular formula; as a network of atoms; as a network of atoms containing hydrogen atoms; as an atomic network with stereochemistry; and so on. I'm unaware of any reports in which this feature has been exploited in a practical way, although they aren't difficult to imagine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flat Structure.&lt;/strong&gt; By grouping structural features into layers (Quality 8), InChI introduces a lot of complexity that is absent in SMILES and even IUPAC nomenclature. This complexity, in part, makes it difficult for both humans and machines to properly encode InChIs (Qualities 1 and 2). Given this complexity, and the fact that the utility of hierarchical encoding has yet to be conclusively demonstrated, it may be better to avoid it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open Source Software Implementation.&lt;/strong&gt; No encoding standard in today's world stands a chance of gaining acceptance without an open source reference implementation. InChI broke new ground in this area and should serve as a model for any system that follows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unencumbered by Patents.&lt;/strong&gt; The success of molfile and SMILES as de facto standards derives partly from the decision made by their authors to refrain from patenting their languages. As a result, developers are motivated to build their own implementations, rather than invent yet another language.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
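
&lt;p&gt;The layered design described under Quality 8 can be made concrete with a short Ruby sketch. It is only an illustration, not part of any InChI toolkit: the helper names &lt;tt&gt;inchi_layers&lt;/tt&gt; and &lt;tt&gt;truncate_inchi&lt;/tt&gt; are hypothetical, ethanol is an arbitrary example, and the code assumes a simple InChI in which each layer after the formula appears once and begins with a one-letter tag.&lt;/p&gt;

```ruby
# Standard InChI for ethanol, chosen only as an illustration.
INCHI = 'InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3'

# Hypothetical helper: split a simple InChI into its version prefix,
# formula layer, and the tagged layers that follow (c, h, ...).
# Assumes each tagged layer occurs at most once.
def inchi_layers(inchi)
  prefix, formula, *rest = inchi.split('/')
  layers = { version: prefix.sub('InChI=', ''), formula: formula }
  rest.each do |layer|
    # The first character is the layer tag; the remainder is its content.
    layers[layer[0]] = layer[1..-1]
  end
  layers
end

# Hypothetical helper: keep only the first +depth+ layers after the
# prefix, giving a coarser view of the same molecule.
def truncate_inchi(inchi, depth)
  inchi.split('/').first(depth + 1).join('/')
end

layers = inchi_layers(INCHI)
puts layers[:formula]          # molecular formula layer
puts layers['c']               # connectivity layer
puts layers['h']               # hydrogen layer
puts truncate_inchi(INCHI, 1)  # formula only
puts truncate_inchi(INCHI, 2)  # formula plus connectivity
```

&lt;p&gt;Each successive layer refines the one before it, which is the sense in which a molecule can be viewed at several levels of resolution. The truncated strings are shown for illustration and should not be assumed to be valid identifiers in their own right.&lt;/p&gt;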

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;A robust and modern line notation system is a key technology for chemically enabling the Web. Existing line notations, although useful in many contexts, were not designed with this particular role in mind. The time has come to consider whether a new line notation system, designed specifically with the Web and modern chemistry in mind, might offer a better solution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo credit: &lt;a href="http://flickr.com/photos/wenwennie/"&gt;Wenwen&lt;/a&gt;  - &lt;a href="http://flickr.com"&gt;Flickr&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 14 Mar 2007 10:18:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:81f8ab71-4155-406b-adfa-2d1fde0c4f6b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web</link>
      <category>Web</category>
      <category>inchi</category>
      <category>smiles</category>
      <category>iupac</category>
      <category>linenotation</category>
      <category>web</category>
      <category>uri</category>
    </item>
    <item>
      <title>From IUPAC Name to Molecular Formula with Ruby CDK</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;Recently, a question was raised on the &lt;a href="http://tech.groups.yahoo.com/group/chemoinf/"&gt;Yahoo cheminf group list&lt;/a&gt; regarding the conversion of IUPAC names into molecular formulas. This can be done quickly with Ruby CDK, as this article will show.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;This tutorial requires &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;Ruby CDK&lt;/a&gt;, which in turn requires &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt; (RJB). A recent Depth-First article described the minimal system configuration required to run &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;RJB on Linux&lt;/a&gt;. Another article showed how to install &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;RJB on Windows&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;A Small Library&lt;/h4&gt;

&lt;p&gt;The following library will convert IUPAC nomenclature into molecular formulas with Ruby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Formulator&lt;/span&gt;
  &lt;span class="attribute"&gt;@@hydrogen_adder&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.tools.HydrogenAdder&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_formula&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iupac_name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_iupac&lt;/span&gt; &lt;span class="ident"&gt;iupac_name&lt;/span&gt;
    &lt;span class="attribute"&gt;@@hydrogen_adder&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;addExplicitHydrogensToSatisfyValency&lt;/span&gt; &lt;span class="ident"&gt;mol&lt;/span&gt;
    &lt;span class="ident"&gt;analyzer&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.tools.MFAnalyser&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;analyzer&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getMolecularFormula&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Save this code as a file named &lt;strong&gt;formulator.rb&lt;/strong&gt; in your working directory.&lt;/p&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;The Formulator library can be tested with the following code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;formulator&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include&lt;/span&gt; &lt;span class="constant"&gt;Formulator&lt;/span&gt;

&lt;span class="ident"&gt;get_formula&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;benzene&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; &amp;quot;C6H6&amp;quot;&lt;/span&gt;
&lt;span class="ident"&gt;get_formula&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;4-(3,4-dichlorophenyl)-N-methyl-1,2,3,4-tetrahydronaphthalen-1-amine&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; &amp;quot;C17H17NCl2&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Limitations&lt;/h4&gt;

&lt;p&gt;You may run across classes of structures that are not recognized by Ruby CDK. This is due to limitations of the underlying &lt;a href="http://depth-first.com/articles/tag/opsin"&gt;OPSIN library&lt;/a&gt;. For example, OPSIN does not yet recognize fused heterocycle names such as 'imidazo[2,1-b][1,3]thiazole'.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Ruby CDK makes short work of converting IUPAC names into molecular formulas. This is just one example of the kind of conversion that's possible. For example, &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;a recent article&lt;/a&gt; discussed the conversion of IUPAC names to color 2-D structures.&lt;/p&gt;

&lt;p&gt;Due to Ruby's position as both a highly functional scripting language and as the foundation for the popular Web application framework &lt;a href="http://www.rubyonrails.org/"&gt;Ruby on Rails&lt;/a&gt;, a variety of IUPAC nomenclature translation applications are just a few lines of code away.&lt;/p&gt;</description>
      <pubDate>Tue, 13 Mar 2007 10:25:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:6529cee0-0821-45b1-865a-267a3254d85a</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/13/from-iupac-name-to-molecular-formula-with-ruby-cdk</link>
      <category>Tools</category>
      <category>rubycdk</category>
      <category>rcdk</category>
      <category>iupac</category>
      <category>formula</category>
    </item>
    <item>
      <title>Google for Molecules with InChIMatic</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://inchimatic.com"&gt;&lt;img src="http://depth-first.com/demo/20070219/inchimatic_logo.png" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt; is a simple Web application that uses Google to perform exact structure searches on the Web. After drawing your structure in the editor window, click the "InChI!" button to get a link. This link takes you to a Google query that displays matches for your molecule. You'll need both Java and JavaScript enabled in your browser to use InChIMatic.&lt;/p&gt;

&lt;h4&gt;The Technical Details&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://iupac.org/dhtml_home.html"&gt;&lt;img src="http://depth-first.com/demo/20070126/iupac_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;The technology at the heart of InChIMatic is the &lt;a href="http://www.iupac.org/inchi/"&gt;IUPAC International Chemical Identifier&lt;/a&gt; (InChI). An InChI is an alphanumeric string that uniquely identifies a molecular structure. By converting molecular structures to text, InChI makes it easy to use standard Internet tools to do exact structure searches.&lt;/p&gt;

&lt;p&gt;The earliest reference in the peer-reviewed literature to using Google for searching InChIs is contained in a &lt;a href="http://dx.doi.org/10.1039/b502828k"&gt;2005 paper&lt;/a&gt;. More recently, a service called &lt;a href="http://querychem.com"&gt;QueryChem&lt;/a&gt; has taken this idea one step further by using the &lt;a href="http://code.google.com/"&gt;Google API&lt;/a&gt; to perform substructure searches based on InChI.&lt;/p&gt;

&lt;p&gt;InChIMatic works differently. Unlike a raw Google search, InChIMatic builds a Google query link for you. Unlike QueryChem, InChIMatic doesn't use the Google API and so has none of its restrictions. This does result in a limitation: InChIMatic can currently be used only for exact structure queries.&lt;/p&gt;
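
&lt;p&gt;How InChIMatic assembles its link isn't shown here, but the general idea of turning an InChI into a Google query link can be sketched in a few lines of Ruby. The method name &lt;tt&gt;google_inchi_url&lt;/tt&gt; is hypothetical, and the sketch assumes the familiar &lt;tt&gt;search?q=&lt;/tt&gt; URL form with the InChI quoted as an exact phrase. Because InChI contains reserved characters such as "=" and "/", the query is percent-encoded before it is placed in the URL.&lt;/p&gt;

```ruby
require 'uri'

# Hypothetical helper: build a Google exact-phrase search URL for an InChI.
# This is an assumption about how such a link could be built, not
# InChIMatic's actual code.
def google_inchi_url(inchi)
  # Wrap the InChI in double quotes so it is searched as one phrase,
  # then percent-encode reserved characters such as "=", "/" and ",".
  query = URI.encode_www_form_component(%("#{inchi}"))
  "http://www.google.com/search?q=#{query}"
end

# Ethanol, chosen only as an illustration.
url = google_inchi_url('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')
puts url
```

&lt;p&gt;Decoding the query with &lt;tt&gt;URI.decode_www_form_component&lt;/tt&gt; recovers the original quoted InChI, so nothing is lost by escaping it.&lt;/p&gt;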

&lt;p&gt;&lt;a href="http://rubyonrails.org"&gt;&lt;img src="http://depth-first.com/files/rails_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;The InChIMatic Web application has been discussed in greater technical detail in a &lt;a href="http://depth-first.com/articles/2006/12/15/anatomy-of-a-cheminformatics-web-application-inchimatic"&gt;previous article&lt;/a&gt;. The rapid Web application development framework &lt;a href="http://rubyonrails.com"&gt;Ruby on Rails&lt;/a&gt; made building InChIMatic a snap. InChIMatic is served by the Ruby application container &lt;a href="http://depth-first.com/articles/2007/02/05/mongrel-and-rails-its-just-not-fair"&gt;Mongrel&lt;/a&gt;, which is hosted on a Linux server running Apache. &lt;a href="http://depth-first.com/articles/tag/rino"&gt;Rino&lt;/a&gt; provided the Ruby interface to the &lt;a href="http://www.iupac.org/inchi/"&gt;IUPAC/NIST InChI toolkit&lt;/a&gt;. The 2-D structure editor is &lt;a href="http://www.molinspiration.com/jme/"&gt;Java Molecular Editor&lt;/a&gt; (JME) by Peter Ertl, which is used with his kind permission.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.opensource.org/docs/definition.php"&gt;&lt;img src="http://www.opensource.org/trademarks/opensource/web/opensource-110x95.png" align="right" alt="Open Source (OSI) Logo" border="0" width="110" height="95"&gt;&lt;/img&gt;&lt;/a&gt;Aside from JME, all components of InChIMatic, from the operating system it runs on to the InChI system itself, are &lt;a href="http://opensource.org"&gt;Open Source&lt;/a&gt; software.&lt;/p&gt;

&lt;h4&gt;Using InChI to Raise the Visibility of Your Content&lt;/h4&gt;

&lt;p&gt;InChIMatic returns many Google results for common molecules. But less common, known molecules return no hits at all. Three factors are responsible: (1) Google doesn't index all InChIs on the Internet; (2) few content providers currently use InChI; and (3) there is no standard and convenient mechanism to embed InChIs into Web pages for indexing by Google.&lt;/p&gt;

&lt;p&gt;For these reasons, I consider InChI to be bleeding edge technology. Some will find it useful, most will not. Unfortunately, this state of affairs will persist until problems (1) and (3) are solved.&lt;/p&gt;

&lt;p&gt;Nevertheless, if you're technically adventurous, InChIMatic offers a relatively painless way to begin incorporating InChIs into your content and verifying that they get indexed. There's no software to download, install, or upgrade. Forget about operating system incompatibilities (hopefully!). Just point your Java-enabled browser to &lt;a href="http://inchimatic.com"&gt;inchimatic.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although there's no standard method to encode InChIs in Web pages, some interesting ideas have been put forward. &lt;a href="http://chem-bla-ics.blogspot.com/"&gt;Egon Willighagen&lt;/a&gt; has proposed &lt;a href="http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html"&gt;a system&lt;/a&gt; based on &lt;a href="http://www.w3.org/TR/xhtml-rdfa-primer/"&gt;RDFa&lt;/a&gt;. Future iterations of InChIMatic may include support for generating scripts and/or markup for including InChIs into blogs and other online content.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;InChI is a complex new technology in need of easy-to-use tools. InChIMatic is one such tool that makes it possible to perform exact structure queries using Google.&lt;/p&gt;

&lt;p&gt;One of the exciting things about Web applications is how quickly they can evolve. If in trying out InChIMatic you find something you'd like changed or added, please feel free to &lt;a href="mailto:r_apodaca@users.sf.net"&gt;write me&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Mon, 19 Feb 2007 10:18:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:eb531bca-c3b0-4f2d-8053-4272baa8bbfb</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/19/google-for-molecules-with-inchimatic</link>
      <category>Tools</category>
      <category>inchimatic</category>
      <category>inchi</category>
      <category>google</category>
      <category>webapp</category>
      <category>opensource</category>
      <category>rails</category>
      <category>iupac</category>
    </item>
    <item>
      <title>From IUPAC Nomenclature to 2-D Structures With OPSIN</title>
      <description>&lt;p&gt;A &lt;a href="http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin"&gt;previous article&lt;/a&gt; introduced OPSIN, an Open Source Java library for decoding IUPAC chemical nomenclature. In this tutorial, you'll see how OPSIN can, when interfaced with freely-available chemical informatics software, generate 2-D structure diagrams from IUPAC names.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;This tutorial requires &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-0"&gt;Ruby CDK&lt;/a&gt; (RCDK), which in turn requires Ruby, Java, and the &lt;a href="http://rjb.rubyforge.org"&gt;Ruby Java Bridge&lt;/a&gt;. Tutorials detailing the installation of RCDK on both &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Windows&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-"&gt;Linux&lt;/a&gt; platforms are available.&lt;/p&gt;

&lt;p&gt;In addition, you'll need a copy of the standalone jarfile &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;opsin-big-0.1.0.jar&lt;/a&gt;. Future versions of RCDK will integrate the OPSIN jarfile, making this step unnecessary.&lt;/p&gt;

&lt;h4&gt;Outlining the Problem and a Solution&lt;/h4&gt;

&lt;p&gt;We'd like to create a simple Ruby class with a method that accepts an IUPAC chemical name as input and produces a PNG image of the corresponding molecule as output. OPSIN accepts IUPAC names as input, but it only produces &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; (CML) as output. The CML output lacks 2-D coordinates, and OPSIN itself has no 2-D rendering capabilities.&lt;/p&gt;

&lt;p&gt;We'll use RCDK to augment OPSIN's capabilities. Thanks to CDK's built-in CML support, RCDK can read CML and generate an &lt;tt&gt;AtomContainer&lt;/tt&gt; representation. RCDK also supports the assignment of 2-D coordinates to an &lt;tt&gt;AtomContainer&lt;/tt&gt; via CDK's &lt;tt&gt;StructureDiagramGenerator&lt;/tt&gt;. To produce the PNG image, we'll use the 2-D rendering capability made possible through &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;Structure-CDK&lt;/a&gt;, which is a built-in component of RCDK.&lt;/p&gt;

&lt;h4&gt;A Simple Ruby Library&lt;/h4&gt;

&lt;p&gt;Create a working directory and copy &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;opsin-big-0.1.0.jar&lt;/a&gt; into it. Next, create a file called &lt;strong&gt;depictor.rb&lt;/strong&gt; containing the following Ruby code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;Java&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Classpath&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;add&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;opsin-big-0.1.0.jar&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;

&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# A simple IUPAC-&amp;gt;2-D structure convertor.&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Depictor&lt;/span&gt;
  &lt;span class="attribute"&gt;@@StringReader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.io.StringReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@NameToStructure&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uk.ac.cam.ch.wwmm.opsin.NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@CMLReader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.io.CMLReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@ChemFile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.ChemFile&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@nts&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@@NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@@CMLReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Writes a &amp;lt;tt&amp;gt;width&amp;lt;/tt&amp;gt; by &amp;lt;tt&amp;gt;height&amp;lt;/tt&amp;gt; PNG to&lt;/span&gt;
  &lt;span class="comment"&gt;# &amp;lt;tt&amp;gt;filename&amp;lt;/tt&amp;gt; for the molecule described by&lt;/span&gt;
  &lt;span class="comment"&gt;# &amp;lt;tt&amp;gt;iupac_name&amp;lt;/tt&amp;gt;.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iupac_name&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;cml&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@nts&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parseToCML&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iupac_name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;raise&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Can't parse name: &lt;span class="expr"&gt;#{iupac_name}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="ident"&gt;cml&lt;/span&gt;

    &lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;cml_to_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Image&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;molfile_to_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molfile&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="ident"&gt;private&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;cml_to_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;string_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@@StringReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;toXML&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;setReader&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;string_reader&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@ChemFile&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;molecule&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;chem_file&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getChemSequence&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;getChemModel&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;getSetOfMolecules&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getMolecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;molecule&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;XY&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;coordinate_molecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Testing, Testing&lt;/h4&gt;

&lt;p&gt;A short test will demonstrate the capabilities of the &lt;tt&gt;Depictor&lt;/tt&gt; library. Add the following to a file called &lt;strong&gt;test.rb&lt;/strong&gt; in your working directory (or enter it interactively with irb):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
&lt;span class="ident"&gt;name&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;3,3-dimethyl-7-oxo-6-[(2-phenylacetyl)amino]-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylic acid&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;#Penicillin G&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;out.png&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running this test produces a 300x300 PNG image of Penicillin G, named &lt;strong&gt;out.png&lt;/strong&gt;, in your working directory:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061017/out.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, this simple library and test code have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;correctly parsed the rather complex IUPAC name (3,3-dimethyl-7-oxo-6-[(2-phenylacetyl)amino]-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylic acid) to a valid CML representation&lt;/li&gt;
&lt;li&gt;converted this representation to a CDK &lt;tt&gt;AtomContainer&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;assigned 2-D coordinates&lt;/li&gt;
&lt;li&gt;rendered a PNG image in color&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice how the thiaazabicyclo[3.2.0] system, complete with properly placed substituents, was flawlessly identified and parsed.&lt;/p&gt;

&lt;p&gt;If you entered the above test code interactively via IRB, you may have noticed a multi-second delay in instantiating &lt;tt&gt;Depictor&lt;/tt&gt;. This latency results from a sluggish &lt;tt&gt;NameToStructure&lt;/tt&gt; constructor in OPSIN. A similar delay also occurs in OPSIN's pure-Java unit tests. Once &lt;tt&gt;Depictor&lt;/tt&gt; is instantiated, however, image generation occurs relatively quickly.&lt;/p&gt;
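
&lt;p&gt;Because the cost is paid in the constructor, it makes sense to create one &lt;tt&gt;Depictor&lt;/tt&gt; and reuse it for every name in a batch. The following is only a minimal sketch of that pattern: &lt;tt&gt;SlowDepictor&lt;/tt&gt; is a hypothetical stand-in whose constructor simply sleeps and whose &lt;tt&gt;depict_png&lt;/tt&gt; returns a description instead of writing an image, so the snippet runs without Java or OPSIN.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;require 'benchmark'

# Hypothetical stand-in for Depictor: slow to construct, cheap to call.
class SlowDepictor
  def initialize
    sleep 0.2 # simulates the sluggish NameToStructure constructor
  end

  # Returns a description instead of writing an image file.
  def depict_png(name, path, width, height)
    "#{name} to #{path} (#{width}x#{height})"
  end
end

names = ['benzene', 'pyridine', 'caffeine']

# Pay the construction cost once...
depictor = nil
setup_time = Benchmark.realtime { depictor = SlowDepictor.new }

# ...then reuse the same instance for every name.
results = nil
batch_time = Benchmark.realtime do
  results = names.each_with_index.map do |name, i|
    depictor.depict_png(name, "out#{i}.png", 300, 300)
  end
end

puts format('setup: %.2f s, %d images: %.2f s', setup_time, results.size, batch_time)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With the real &lt;tt&gt;Depictor&lt;/tt&gt;, the same loop would call &lt;tt&gt;depict_png&lt;/tt&gt; on a single shared instance rather than constructing a new one per image.&lt;/p&gt;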

&lt;p&gt;The unusual orientation of the beta-lactam carbonyl group is determined by CDK's &lt;tt&gt;StructureDiagramGenerator&lt;/tt&gt;. The source of this behavior will be explored in a future article.&lt;/p&gt;

&lt;h4&gt;More Examples&lt;/h4&gt;

&lt;p&gt;To illustrate some of the capabilities of the OPSIN-RCDK combination, a few more examples are provided below.&lt;/p&gt;

&lt;p&gt;One of OPSIN's more surprising features is how well it handles heterocycles. For example, the IUPAC name for caffeine (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2519"&gt;1,3,7-trimethylpurine-2,6-dione&lt;/a&gt;) is translated to:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/caffeine.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;As another example, consider the tetrazole (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=180603"&gt;1-[2-hydroxy-3-propyl-4-[3-(2H-tetrazol-5-yl)propoxy]phenyl]ethanone&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/180603.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Highly substituted benzene rings and carboxylic acids are also translated accurately, as in &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2528"&gt;3-acetamido-5-(acetyl-methyl-amino)-2,4,6-triiodo-benzoic acid&lt;/a&gt; (Metrizoate):&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/metrizoate.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;How about a hairy-looking macrocycle name with multiple levels of morpheme nesting (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2547"&gt;3,6-diamino-N-[[15-amino-11-(2-amino-3,4,5,6-tetrahydropyrimidin-4-yl)-8-[(carbamoylamino)methylidene]-2-(hydroxymethyl)-3,6,9,12,16-pentaoxo-1,4,7,10,13-pentazacyclohexadec-5-yl]methyl]hexanamide&lt;/a&gt;)? Not a problem:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/2547.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h4&gt;Limitations&lt;/h4&gt;

&lt;p&gt;In my tests of the OPSIN library, one structure appeared to be incorrectly parsed - &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=180591"&gt;N-(5-chloro-2-methyl-phenyl)-2-methoxy-N-(2-oxooxazolidin-3-yl)acetamide&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/180591.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;There are actually two problems with the output. First, an oxygen atom and a methyl group overlap near the top of the diagram. This cosmetic issue is related to CDK's &lt;tt&gt;StructureDiagramGenerator&lt;/tt&gt;. Second, the oxazolidine nitrogen atom is misplaced by OPSIN. The correct 2-D image of this molecule, obtained from PubChem, is shown below:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/180591_pc.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;It's not common to find an early-development Open Source project with the sophistication of OPSIN. The smooth handling of nested morphemes, aromatic heterocycles, macrocycles, and a good fraction of everything else I threw at it leads me to believe that a well-designed and extensible nomenclature parsing engine lies at OPSIN's core. More on that later, though.&lt;/p&gt;

&lt;p&gt;What could you do with a powerful Open Source IUPAC nomenclature parser? The answer to that one question could fill a three-volume series. Suffice it to say that OPSIN, in combination with other Open Source software, offers virtually limitless potential for indexing, collecting, repackaging, reprocessing, and mashing up vast amounts of chemical information. Because of its Open Source license, OPSIN can be extended and otherwise modified to fit your particular needs. Future articles will highlight some of the possibilities.&lt;/p&gt;</description>
      <pubDate>Tue, 17 Oct 2006 13:57:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:fd6de2ae-23c8-4e50-9765-344e9a7a9545</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/10/17/from-iupac-nomenclature-to-2-d-structures-with-opsin</link>
      <category>Graphics</category>
      <category>opsin</category>
      <category>nametostruct</category>
      <category>iupac</category>
      <category>rcdk</category>
      <category>structure</category>
      <category>cdk</category>
      <category>integration</category>
      <category>mashup</category>
    </item>
    <item>
      <title>Decoding IUPAC Names With OPSIN</title>
      <description>&lt;p&gt;IUPAC chemical nomenclature is everywhere. It can be found in journal articles, both new and old, on the Web, in databases, on Material Safety Data Sheets (MSDS), in chemical catalogs, and just about anywhere chemical information is found. The rules of this nomenclature are one of the first things taught in Organic Chemistry classes, and entire books are devoted to the subject. Although software for IUPAC nomenclature translation has been researched &lt;a href="http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation"&gt;since the 1970s&lt;/a&gt;, it has only become widespread within the last ten years. As is typical, IUPAC nomenclature developer toolkits are closed, proprietary, very expensive, and not customizable - &lt;a href="http://depth-first.com/articles/2006/09/11/visualizing-iupac-names-with-chemnomparse"&gt;with one notable exception&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A little software package called OPSIN may be set to change this. Read on to see how you can use OPSIN to begin programmatically decoding IUPAC chemical nomenclature today.&lt;/p&gt;

&lt;h4&gt;Meet OPSIN&lt;/h4&gt;

&lt;p&gt;OPSIN is an Open Source Java library for parsing IUPAC nomenclature. Despite its early development status, OPSIN can decode a variety of difficult features in basic IUPAC nomenclature, including bicyclo systems, nested substitution, saturated heterocycles, and a variety of arenes and heteroarenes. OPSIN currently doesn't handle stereochemistry, organometallics, or a variety of other advanced IUPAC nomenclature features.&lt;/p&gt;

&lt;h4&gt;Brief Background&lt;/h4&gt;

&lt;p&gt;OPSIN was written by &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett&lt;/a&gt; at the University of Cambridge. Until recently, OPSIN was an integral part of the innovative chemical data checker &lt;a href="http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/index.asp"&gt;OSCAR&lt;/a&gt;. One of the exciting uses of OSCAR is in the &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=59"&gt;automated validation&lt;/a&gt; of experimental data.&lt;/p&gt;

&lt;h4&gt;Getting OPSIN&lt;/h4&gt;

&lt;p&gt;Recently, OPSIN was factored out of OSCAR. It can now be downloaded as two standalone packages from SourceForge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin_0.1.0.zip?download"&gt;Source Distribution&lt;/a&gt;: Contains the complete OPSIN source code, all library dependencies, all datasets, and an Ant build script.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;Jarfile&lt;/a&gt;: A standalone jarfile containing all library dependencies and data files.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;What OPSIN Does&lt;/h4&gt;

&lt;p&gt;OPSIN accepts an IUPAC name, encoded as a &lt;tt&gt;String&lt;/tt&gt; object, as input and provides a &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; (CML) document object model as output. The main point of entry into the library is the &lt;tt&gt;NameToStructure&lt;/tt&gt; class and its two overloaded &lt;tt&gt;parseToCML&lt;/tt&gt; methods.&lt;/p&gt;

&lt;p&gt;OPSIN's output is the root node in a &lt;a href="http://www.xom.nu/"&gt;XOM&lt;/a&gt; XML &lt;tt&gt;Element&lt;/tt&gt; hierarchy. XOM's &lt;tt&gt;Element&lt;/tt&gt; class provides a convenience method, &lt;tt&gt;toXML&lt;/tt&gt;, that returns the text-based XML representation of the element and all &lt;tt&gt;Elements&lt;/tt&gt; below it.&lt;/p&gt;

&lt;p&gt;Because its output is pure XML, OPSIN does not depend on any chemical informatics toolkit to do its job. This makes OPSIN ideal for use within larger chemical informatics systems. Provided your software can interpret CML, you should be able to manipulate OPSIN's output in a variety of useful ways.&lt;/p&gt;
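
&lt;p&gt;As a rough illustration of what interpreting CML can look like, the sketch below reads atoms and bonds with REXML, the XML parser that ships with Ruby. The molecule is a hand-built stand-in (the heavy atoms of ethanol), not real OPSIN output. The element and attribute names (&lt;tt&gt;atomArray&lt;/tt&gt;, &lt;tt&gt;elementType&lt;/tt&gt;, &lt;tt&gt;atomRefs2&lt;/tt&gt;) follow common CML usage, and it is an assumption that OPSIN's output has the same shape; real output may also declare an XML namespace, which would change the paths used here.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;require 'rexml/document'

# Hand-built stand-in for OPSIN output: the heavy atoms of ethanol.
# Names follow common CML usage; real OPSIN output may differ.
molecule = REXML::Element.new('molecule')

atoms = molecule.add_element('atomArray')
[%w[a1 C], %w[a2 C], %w[a3 O]].each do |id, element|
  atom = atoms.add_element('atom')
  atom.add_attribute('id', id)
  atom.add_attribute('elementType', element)
end

bonds = molecule.add_element('bondArray')
[['a1 a2', '1'], ['a2 a3', '1']].each do |refs, order|
  bond = bonds.add_element('bond')
  bond.add_attribute('atomRefs2', refs)
  bond.add_attribute('order', order)
end

# Serialize and re-parse, as you would with text obtained from toXML.
doc = REXML::Document.new(molecule.to_s)

# Tally element symbols.
element_counts = Hash.new(0)
doc.elements.each('molecule/atomArray/atom') do |atom|
  element_counts[atom.attributes['elementType']] += 1
end

# Collect bonded atom id pairs.
bond_pairs = doc.elements.to_a('molecule/bondArray/bond').map do |bond|
  bond.attributes['atomRefs2'].split
end

puts element_counts.inspect
puts bond_pairs.inspect&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nothing in this snippet depends on a cheminformatics toolkit, which is the point: any XML-aware language can take the output from here.&lt;/p&gt;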

&lt;h4&gt;What's Next?&lt;/h4&gt;

&lt;p&gt;Future articles will discuss OPSIN's capabilities and limitations in more detail. As has become customary for Depth-First's tutorials, &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt; and the excellent &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Ruby Java Bridge&lt;/a&gt; will be used to illustrate the important points.&lt;/p&gt;</description>
      <pubDate>Sat, 14 Oct 2006 14:39:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:2550a5cb-baf7-419b-af18-338272f3bb59</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin</link>
      <category>Tools</category>
      <category>opsin</category>
      <category>nametostruct</category>
      <category>iupac</category>
      <category>oscar</category>
      <category>xom</category>
      <category>cml</category>
    </item>
    <item>
      <title>Visualizing IUPAC Names with ChemNomParse</title>
      <description>&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation"&gt;Nomenclature translation&lt;/a&gt; is the process of converting a human-readable chemical name into a machine-readable notational scheme such as a connection table. It plays a key role in linking the &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;older chemical literature&lt;/a&gt; to modern information technologies, such as the Internet.&lt;/p&gt;

&lt;p&gt;Buried deep within the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK) is a library for nomenclature translation called &lt;a href="http://chemnomparse.sourceforge.net/"&gt;ChemNomParse&lt;/a&gt;. At the heart of ChemNomParse is a remarkable piece of software called the &lt;a href="https://javacc.dev.java.net/"&gt;Java Compiler Compiler&lt;/a&gt; (JavaCC), a parser generator and lexical analyzer generator for Java. A FAQ on JavaCC is available &lt;a href="http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This tutorial demonstrates how freely-available, open source tools can be used to parse an IUPAC chemical name and generate its corresponding 2-D structure rendering. A &lt;a href="http://depth-first.com/articles/2006/09/02/humanizing-line-notations"&gt;closely-related tutorial&lt;/a&gt; on generating 2-D structures from SMILES strings may be helpful as background.&lt;/p&gt;

&lt;h4&gt;Ingredients&lt;/h4&gt;

&lt;p&gt;This tutorial uses Arton's &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt;, the installation and use of which has been outlined &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;previously&lt;/a&gt;. In addition, you'll need to download &lt;a href="http://prdownloads.sourceforge.net/structure/structure-cdk-0.1.2.zip?download"&gt;Structure-CDK v0.1.2&lt;/a&gt;, also &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;previously discussed&lt;/a&gt;. Be sure to download v0.1.2, as two upgrades have been released since the package was originally described. This tutorial has been tested on Mandriva Linux 2006.&lt;/p&gt;

&lt;p&gt;Create a working directory called &lt;strong&gt;nom&lt;/strong&gt;. From the &lt;strong&gt;lib&lt;/strong&gt; directory of the Structure-CDK distribution, copy &lt;strong&gt;cdk-20060714.jar&lt;/strong&gt; and &lt;strong&gt;structure-cdk-0.1.2.jar&lt;/strong&gt; into your &lt;strong&gt;nom&lt;/strong&gt; working directory.&lt;/p&gt;

&lt;h4&gt;Code&lt;/h4&gt;

&lt;p&gt;Create a file called &lt;strong&gt;nom.rb&lt;/strong&gt; and copy the following code into it:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="constant"&gt;ENV&lt;/span&gt;&lt;span class="punct"&gt;['&lt;/span&gt;&lt;span class="string"&gt;CLASSPATH&lt;/span&gt;&lt;span class="punct"&gt;']&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;./cdk-20060714.jar:./structure-cdk-0.1.2.jar&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;NomParser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.iupac.parser.NomParser&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;StructureDiagramGenerator&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.layout.StructureDiagramGenerator&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;ImageKit&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net.sf.structure.cdk.util.ImageKit&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Depictor&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;StructureDiagramGenerator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_png&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;ImageKit&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;writePNG&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;),&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_png&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_svg&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_svg&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;ImageKit&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;writeSVG&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;),&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_svg&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="ident"&gt;private&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;nom_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;setMolecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;NomParser&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;generate&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;))&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;generateCoordinates&lt;/span&gt;

    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getMolecule&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After you save this file, you'll need to set your &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; on Unix (or the equivalent on another OS):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$LD_LIBRARY_PATH
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This tells RJB where to find Java's native libraries. Because of RJB's current design, &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; needs to be set from the command line, rather than from within a Ruby process.&lt;/p&gt;
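
&lt;p&gt;Since the variable has to be in place before Ruby starts, a forgotten &lt;tt&gt;export&lt;/tt&gt; is easy to miss. One option is a small guard at the top of a script, run before RJB is loaded. The sketch below is a heuristic only, and the method name &lt;tt&gt;java_libs_on_path?&lt;/tt&gt; is made up for illustration: it checks whether &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; lists any directory under &lt;tt&gt;JAVA_HOME&lt;/tt&gt;, and assumes both variables are used the way the command above uses them.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;# Returns true when LD_LIBRARY_PATH lists at least one directory under JAVA_HOME.
# env is any Hash-like object, so the check can be tried without touching ENV.
def java_libs_on_path?(env)
  java_home = env['JAVA_HOME'].to_s
  return false if java_home.empty?

  env['LD_LIBRARY_PATH'].to_s.split(':').any? do |dir|
    dir.start_with?(java_home)
  end
end

# Warn early instead of failing later inside the bridge.
unless java_libs_on_path?(ENV)
  warn 'LD_LIBRARY_PATH has no directory under JAVA_HOME; set it in your shell before starting Ruby.'
end&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The guard only warns; it cannot repair the environment, for the reason given above.&lt;/p&gt;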

&lt;p&gt;Using the Depictor class is as simple as creating an instance and invoking &lt;tt&gt;depict_png&lt;/tt&gt; or &lt;tt&gt;depict_svg&lt;/tt&gt; on it:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;2-phenylcyclohexan-1-ol&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;output.png&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Executing the above code either through the Ruby interpreter (ruby) or via Interactive Ruby (irb) produces a PNG image of the chiral auxiliary shown below:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/phycy.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Other names correctly recognized by ChemNomParse include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;phenylhexyne&lt;/li&gt;
&lt;li&gt;2-chloro-3-phenyl-4,4-dimethylhexane&lt;/li&gt;
&lt;li&gt;3-phenyl-1-aminopropane&lt;/li&gt;
&lt;li&gt;1,2-difluoro-3-hydroxycyclohexene&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Limitations&lt;/h4&gt;

&lt;p&gt;Many chemical names, ranging from the simple to the complicated, were not recognized at all by ChemNomParse. Some examples are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;benzene&lt;/li&gt;
&lt;li&gt;piperidine&lt;/li&gt;
&lt;li&gt;1-methoxyhexane&lt;/li&gt;
&lt;li&gt;2-methyl-5-prop-1-en-2-yl-cyclohex-2-en-1-one (carvone)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some names were incorrectly interpreted due to misassigned locants. For example, 2-chloro-3-hydroxybutanoic acid produced the incorrectly assigned structure shown below:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/2_chloro_3_hydroxybutanoic_acid_error.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;ChemNomParse can accurately recognize chemical names representing simple substitutions on basic hydrocarbon scaffolds. More complicated structures, such as heterocycles, bicyclic systems, and systems involving nested substituents, do not appear to be handled at all. It is not clear to what extent these limitations reflect a small dictionary of morphemes (the basic nomenclature building blocks) versus deeper design issues.&lt;/p&gt;
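
&lt;p&gt;A survey like the one above is easy to automate. The sketch below sorts a list of names into those a translator accepts and those it rejects. It is illustrative only: &lt;tt&gt;triage_names&lt;/tt&gt; and the toy translator are hypothetical, and the harness assumes that a failed translation surfaces as a raised exception, which may not match how &lt;tt&gt;NomParser&lt;/tt&gt; reports every failure. Note that it cannot catch names that parse without error but yield the wrong structure.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;# Sorts names into those a translator accepts and those it rejects.
# translator is any object responding to call; a raised error counts as a rejection.
def triage_names(names, translator)
  names.partition do |name|
    begin
      translator.call(name)
      true
    rescue StandardError
      false
    end
  end
end

# Hypothetical stand-in translator: knows two names, raises on anything else.
KNOWN_NAMES = ['phenylhexyne', '2-phenylcyclohexan-1-ol']

toy_translator = lambda do |name|
  raise ArgumentError, "unrecognized name: #{name}" unless KNOWN_NAMES.include?(name)
  name
end

recognized, rejected = triage_names(['phenylhexyne', 'benzene', 'piperidine'], toy_translator)

puts "recognized: #{recognized.join(', ')}"
puts "rejected: #{rejected.join(', ')}"&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To try it against the real parser, the toy translator would be replaced with a lambda that wraps the &lt;tt&gt;NomParser&lt;/tt&gt; call used in &lt;tt&gt;nom_to_mol&lt;/tt&gt;.&lt;/p&gt;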

&lt;p&gt;Despite its limitations, ChemNomParse is an interesting piece of open source software for working with chemical nomenclature. From this simple tutorial, it can be seen that nomenclature translation, when combined with other capabilities such as 2-D rendering, offers many exciting possibilities.&lt;/p&gt;</description>
      <pubDate>Mon, 11 Sep 2006 14:29:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:80a0c8f0-4c09-4d5d-9e00-229a835bc94a</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/11/visualizing-iupac-names-with-chemnomparse</link>
      <category>Graphics</category>
      <category>iupac</category>
      <category>javacc</category>
      <category>2d</category>
      <category>nomenclature</category>
      <category>translation</category>
    </item>
    <item>
      <title>Chemical Nomenclature Translation</title>
      <description>&lt;blockquote&gt;
    &lt;p&gt;... We report here the development of a computer program for converting chemical names into connection tables, a process we call "nomenclature translation." ... this process provides an alternate method of structure registration by allowing a new substance to be input &lt;em&gt;via&lt;/em&gt; a structurally descriptive systematic name instead of only as a connection table taken from a structural diagram.&lt;/p&gt;

    &lt;p&gt;&lt;cite&gt;-G.G.V. Stouw et al. &lt;a href="http://dx.doi.org/10.1021/c160055a009"&gt;J. Chem. Doc. 1974, 14, 185-193&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Systematic nomenclature is one of the oldest forms of &lt;a href="http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;line notation&lt;/a&gt;.  As a result, it can be found widely in papers, patents, spreadsheets, and other documents. Any software that can convert systematic nomenclature, such as IUPAC names, into a computer-based representational system, such as a connection table, has the potential to unlock vast amounts of &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;legacy chemical information&lt;/a&gt; by making it structure-searchable.&lt;/p&gt;

&lt;p&gt;Stouw and his group at Chemical Abstracts Service (CAS) developed the first working system for name to structure conversion. Their interest in an automated process stemmed from the potential to greatly accelerate the rate at which the chemical literature could be indexed. Instead of a human creating a computer representation by manually parsing a systematic name from a paper, a computer could do it error-free at a fraction of the cost. These factors are still at work today, although the pool of raw chemical information material has increased exponentially since 1974.&lt;/p&gt;

&lt;p&gt;Nomenclature translation has been more widely investigated than the related problem of &lt;a href="http://depth-first.com/articles/2006/08/25/computational-perception-and-recognition-of-digitized-molecular-structures"&gt;2-D raster image interpretation&lt;/a&gt;, although the driving forces in both cases are the same. There are, of course, several proprietary packages for nomenclature translation. An important disadvantage of all of them is a distinct lack of customizability.&lt;/p&gt;

&lt;p&gt;Open source nomenclature translation options have been very limited. One of the first such packages was &lt;a href="http://chemnomparse.sourceforge.net/index.php"&gt;ChemNomParse&lt;/a&gt; by David Robinson, Bhupinder Sandhu, and Stephen Tomkinson at the University of Manchester. ChemNomParse has since been &lt;a href="http://cdk.sourceforge.net/api/org/openscience/cdk/iupac/parser/package-summary.html"&gt;made part of&lt;/a&gt; the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK). Although its capabilities are relatively limited, ChemNomParse is very useful for the design it embodies.&lt;/p&gt;

&lt;p&gt;More recently, &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett&lt;/a&gt; at Cambridge has developed a package called OPSIN. Egon Willighagen wrote about &lt;a href="http://chem-bla-ics.blogspot.com/2006/09/chemical-archeology-oscar3-to.html"&gt;integrating OPSIN&lt;/a&gt; into the desktop software package &lt;a href="http://bioclipse.net/"&gt;Bioclipse&lt;/a&gt;. OPSIN's source can be found in the &lt;a href="http://svn.sourceforge.net/viewvc/oscar3-chem/trunk/src/uk/ac/cam/ch/wwmm/opsin/"&gt;project's SVN repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The most exciting potential for chemical nomenclature translation is realized when this capability is blended with other chemical informatics technologies. Future articles in this series will show how ChemNomParse and OPSIN can be used with other open source tools to create rich chemical informatics systems.&lt;/p&gt;</description>
      <pubDate>Sun, 10 Sep 2006 15:15:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:f6197b28-32af-46b8-88cc-13a6941c167f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation</link>
      <category>Tools</category>
      <category>nomenclature</category>
      <category>longtail</category>
      <category>opsin</category>
      <category>chemnomparse</category>
      <category>iupac</category>
      <category>oldliterature</category>
    </item>
  </channel>
</rss>
