<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag opsin</title>
    <link>http://depth-first.com/articles/tag/opsin</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>JRuby for Cheminformatics: Parsing IUPAC Nomenclature with OPSIN</title>
      <description>&lt;p&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Recent articles have discussed the use of &lt;a href="http://depth-first.com/articles/tag/rubidium"&gt;JRuby for cheminformatics&lt;/a&gt;. We've seen how to &lt;a href="http://depth-first.com/articles/2007/10/09/jruby-for-cheminformatics-parsing-smiles-simply"&gt;parse SMILES strings&lt;/a&gt;, and &lt;a href="http://depth-first.com/articles/2007/10/10/jruby-for-cheminformatics-reading-and-writing-inchis-via-the-java-native-interface"&gt;read or write InChIs&lt;/a&gt;. In this article, we'll see how easy it is to parse IUPAC nomenclature from JRuby using Peter Corbett's &lt;a href="http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin"&gt;OPSIN library&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Installation&lt;/h4&gt;

&lt;p&gt;After &lt;a href="http://depth-first.com/articles/2007/10/09/jruby-for-cheminformatics-parsing-smiles-simply"&gt;installing JRuby&lt;/a&gt;, simply &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;download the OPSIN jarfile&lt;/a&gt; and copy it to your JRuby &lt;tt&gt;lib&lt;/tt&gt; directory. You're done.&lt;/p&gt;

&lt;h4&gt;A Simple Library&lt;/h4&gt;

&lt;p&gt;We can write a simple library to convert an IUPAC name into a CML document:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;jruby&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uk.ac.cam.ch.wwmm.opsin.NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;IUPAC&lt;/span&gt;
  &lt;span class="attribute"&gt;@@nts&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read_name&lt;/span&gt; &lt;span class="ident"&gt;name&lt;/span&gt;
    &lt;span class="ident"&gt;cml&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@@nts&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse_to_cml&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="keyword"&gt;raise&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Could not parse '&lt;span class="expr"&gt;#{name}&lt;/span&gt;'.&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="ident"&gt;cml&lt;/span&gt;

    &lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_xml&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;read_name&lt;/tt&gt; method accepts an iupac name as a string and returns a CML document as a string. If the input can't be parsed, an exception is raised.&lt;/p&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;We can test the library by saving it as a file called &lt;strong&gt;iupac.rb&lt;/strong&gt; and invoking &lt;tt&gt;jirb&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'iupac'
=&gt; true
irb(main):002:0&gt; include IUPAC
=&gt; Object
irb(main):003:0&gt; read_name('4-iodobenzoic acid')
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This returns the XML shown below, which has been re-formatted for clarity:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;cml&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;molecule&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;m1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atomArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a3&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;3&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a6&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;6&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a8&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;O&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a9&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;O&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a10&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;I&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;label&lt;/span&gt; &lt;span class="attribute"&gt;value&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atomArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bondArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a1 a2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a2 a3&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a3 a4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a4 a5&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a5 a6&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a6 a1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7 a1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7 a8&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a7 a9&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;a10 a4&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;bondArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This simple Ruby library has parsed the name '4-iodobenzoic acid' and has returned a string containing the CML representation for the molecule. If we had wanted the &lt;tt&gt;read_name&lt;/tt&gt; method to return a traversable XML object model, we could have enabled that as well.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;One of the objections raised whenever the issue of "new" programming languages comes up, regardless of their merit, is the age-old refrain "Yeah, but where's the software?" With JRuby, we bypass this question altogether. We can leverage the full scope of the massive Java development effort over the last ten years, which includes several excellent cheminformatics libraries. With virtually no effort, we have a working cheminformatics platform based on a widely-used, versatile and dynamic object-oriented scripting language. Future articles will discuss extensions to this platform and some applications.&lt;/p&gt;</description>
      <pubDate>Fri, 12 Oct 2007 10:37:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:873940fd-7b22-4013-a61b-ef928eee1c8e</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/12/jruby-for-cheminformatics-parsing-iupac-nomenclature-with-opsin</link>
      <category>Tools</category>
      <category>opsin</category>
      <category>iupac</category>
      <category>nametostruct</category>
      <category>rubidium</category>
      <category>jruby</category>
      <category>ruby</category>
      <category>java</category>
    </item>
    <item>
      <title>Agile Chemical Informatics Development with CDK and Ruby: RCDK-0.3.0</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/cdk_logo.png" align="right"&gt;&lt;/img&gt;Ruby Chemistry Development Kit (RCDK) version 0.3.0 is now available from RubyForge. &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-0"&gt;RCDK&lt;/a&gt; enables the complete CDK API to be accessed from Ruby. This release adds support for &lt;a href="http://depth-first.com/articles/2006/10/17/from-iupac-nomenclature-to-2-d-structures-with-opsin"&gt;IUPAC nomenclature translation&lt;/a&gt;  and &lt;a href="http://depth-first.com/articles/2006/10/24/metaprogramming-with-ruby-mapping-java-packages-onto-ruby-modules"&gt;tighter Java integration&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Dependencies&lt;/h4&gt;

&lt;p&gt;RCDK requires Ruby, the Ruby developer libraries, a working build toolchain, and &lt;a href="http://rjb.rubyforge.org"&gt;Ruby Java Bridge&lt;/a&gt; (RJB). This latter dependency can be satisfied during the RCDK installation process if the RubyGems method is used (see 'Installation').&lt;/p&gt;

&lt;h4&gt;Installation&lt;/h4&gt;

&lt;p&gt;RCDK can be conveniently installed using the &lt;a href="http://rubygems.org/"&gt;RubyGems&lt;/a&gt; packaging mechanism:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
# gem install rcdk
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Alternatively, the source package and RubyGem can be downloaded &lt;a href="http://rubyforge.org/frs/?group_id=2199"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Tighter Java Integration&lt;/h4&gt;

&lt;p&gt;RCDK-0.3.0 introduces a previously-described &lt;a href="http://depth-first.com/articles/2006/10/24/metaprogramming-with-ruby-mapping-java-packages-onto-ruby-modules"&gt;Java package to Ruby module mapping mechanism&lt;/a&gt;. For example, if you'd like to create a Java &lt;tt&gt;ArrayList&lt;/tt&gt;, it can be done through the new &lt;tt&gt;jrequire&lt;/tt&gt; command:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;jrequire&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.util.ArrayList&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;list&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Java&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;ArrayList&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

&lt;span class="ident"&gt;list&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;size&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; 0&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;IUPAC Nomenclature Translation&lt;/h4&gt;

&lt;p&gt;RCDK's most important new chemical informatics feature is made possible by &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett's&lt;/a&gt; excellent IUPAC nomenclature translation library &lt;a href="http://depth-first.com/articles/2006/10/17/from-iupac-nomenclature-to-2-d-structures-with-opsin"&gt;OPSIN&lt;/a&gt;. It can either be used directly with &lt;tt&gt;jrequire&lt;/tt&gt;, or indirectly through RCDK's convenience library &lt;tt&gt;RCDK::Util&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_iupac&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;quinoline&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getAtomCount&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; 10&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There are two things to notice here. First, no &lt;tt&gt;jrequire&lt;/tt&gt; statement is needed when using the &lt;tt&gt;RCDK::Util&lt;/tt&gt; library. Second, there is a multisecond delay after &lt;tt&gt;read_iupac&lt;/tt&gt; is invoked. OPSIN itself introduces this delay during the &lt;tt&gt;NameToStructure&lt;/tt&gt; constructor call, and RCDK inherits this behavior. However, after the first invocation of &lt;tt&gt;read_iupac&lt;/tt&gt;, subsequent calls to this method are very fast.&lt;/p&gt;

&lt;p&gt;Let's decorate the quinoline nucleus with some substituents and render a 2-D image of the result. Execute the following code, either through the Ruby interpreter (&lt;tt&gt;ruby&lt;/tt&gt;) or through Interactive Ruby (&lt;tt&gt;irb&lt;/tt&gt;):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Image&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;iupac_to_png&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;3-chloro-4-(2-aminopropyl)-6-mercapto-8-(2-hydroxyphenyl)-quinoline-2-carboxylic acid&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;test.png&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running this code produces the following image in your working directory:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061030/test.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Be Agile&lt;/h4&gt;

&lt;p&gt;RCDK marries the &lt;a href="http://www.martinfowler.com/articles/newMethodology.html"&gt;agility&lt;/a&gt; of the Ruby language with the functionality of three Open Source chemical informatics libraries: &lt;a href="http://cdk.sf.net"&gt;CDK&lt;/a&gt;; &lt;a href="http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin"&gt;OPSIN&lt;/a&gt;; and &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;Structure-CDK&lt;/a&gt;. Future articles will discuss some simple applications of this powerful combination.&lt;/p&gt;</description>
      <pubDate>Mon, 30 Oct 2006 14:03:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:a5e77e1e-7c24-47e4-90e9-ed1e068b19c2</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0</link>
      <category>Tools</category>
      <category>rcdk</category>
      <category>ruby</category>
      <category>cdk</category>
      <category>opsin</category>
      <category>agile</category>
      <category>java</category>
      <category>integration</category>
    </item>
    <item>
      <title>From IUPAC Nomenclature to 2-D Structures With OPSIN</title>
      <description>&lt;p&gt;A &lt;a href="http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin"&gt;previous article&lt;/a&gt; introduced OPSIN, an Open Source Java library for decoding IUPAC chemical nomenclature. In this tutorial, you'll see how OPSIN can, when interfaced with freely-available chemical informatics software, generate 2-D structure diagrams from IUPAC names.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;This tutorial requires &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-0"&gt;Ruby CDK&lt;/a&gt; (RCDK), which in turn requires Ruby, Java, and the &lt;a href="http://rjb.rubyforge.org"&gt;Ruby Java Bridge&lt;/a&gt;. Tutorials detailing the installation of RCDK on both &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Windows&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-"&gt;Linux&lt;/a&gt; platforms are available.&lt;/p&gt;

&lt;p&gt;In addition, you'll need a copy of the standalone jarfile &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;opsin-big-0.1.0.jar&lt;/a&gt;. Future versions of RCDK will integrate the OPSIN jarfile, making this step unnecessary.&lt;/p&gt;

&lt;h4&gt;Outlining the Problem and a Solution&lt;/h4&gt;

&lt;p&gt;We'd like to create a simple Ruby class with a method that accepts an IUPAC chemical name as input and produces a PNG image of the corresponding molecule as output. OPSIN accepts IUPAC names as input, but it only produces &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; (CML) as output. The CML output lacks 2-D coordinates, and OPSIN itself has no 2-D rendering capabilities.&lt;/p&gt;

&lt;p&gt;We'll use RCDK to augment OPSIN's capabilities. Thanks to CDK's built-in CML support, RCDK can read CML and generate an &lt;tt&gt;AtomContainer&lt;/tt&gt; representation. RCDK also supports the assignment of 2-D coordinates to an &lt;tt&gt;AtomContainer&lt;/tt&gt; via CDK's &lt;tt&gt;StructureDiagramGenerator&lt;/tt&gt;. To produce the PNG image, we'll use the 2-D rendering capability made possible through &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;Structure-CDK&lt;/a&gt;, which is a built-in component of RCDK.&lt;/p&gt;

&lt;h4&gt;A Simple Ruby Library&lt;/h4&gt;

&lt;p&gt;Create a working directory and copy &lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;opsin-big-0.1.0.jar&lt;/a&gt; into it. Next, create a file called &lt;strong&gt;depictor.rb&lt;/strong&gt; containing the following Ruby code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;Java&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Classpath&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;add&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;opsin-big-0.1.0.jar&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;

&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# A simple IUPAC-&amp;gt;2-D structure convertor.&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Depictor&lt;/span&gt;
  &lt;span class="attribute"&gt;@@StringReader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.io.StringReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@NameToStructure&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;uk.ac.cam.ch.wwmm.opsin.NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@CMLReader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.io.CMLReader&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
  &lt;span class="attribute"&gt;@@ChemFile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.ChemFile&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@nts&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@@NameToStructure&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@@CMLReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Writes a &amp;lt;tt&amp;gt;width&amp;lt;/tt&amp;gt; by &amp;lt;tt&amp;gt;height&amp;lt;/tt&amp;gt; PNG to&lt;/span&gt;
  &lt;span class="comment"&gt;# &amp;lt;tt&amp;gt;filename&amp;lt;/tt&amp;gt; for the molecule described by&lt;/span&gt;
  &lt;span class="comment"&gt;# &amp;lt;tt&amp;gt;iupac_name&amp;lt;/tt&amp;gt;.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iupac_name&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;cml&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@nts&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parseToCML&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iupac_name&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;throw&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Can't parse name: &lt;span class="expr"&gt;#{iupac_name}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="ident"&gt;cml&lt;/span&gt;

    &lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;cml_to_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Image&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;molfile_to_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molfile&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="ident"&gt;private&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;cml_to_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;string_reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;StringReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;toXML&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;setReader&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;string_reader&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;chem_file&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@cml_reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@@ChemFile&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;molecule&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;chem_file&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getChemSequence&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;getChemModel&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;getSetOfMolecules&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getMolecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;molecule&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;XY&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;coordinate_molecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get_molfile&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Testing, Testing&lt;/h4&gt;

&lt;p&gt;A short test will demonstrate the capabilities of the &lt;tt&gt;Depictor&lt;/tt&gt; library. Add the following to a file called &lt;strong&gt;test.rb&lt;/strong&gt; in your working directory (or enter it interactively with irb):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
&lt;span class="ident"&gt;name&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;3,3-dimethyl-7-oxo-6-[(2-phenylacetyl)amino]-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylic acid&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;#Penicillin G&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;out.png&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running this test produces a 300x300 PNG image of Penicillin G, named &lt;strong&gt;out.png&lt;/strong&gt;, in your working directory:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061017/out.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, this simple library and test code has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;correctly parsed the rather complex IUPAC name (3,3-dimethyl-7-oxo-6-[(2-phenylacetyl)amino]-4-thia-1-azabicyclo[3.2.0]heptane-2- carboxylic acid) to a valid CML representation&lt;/li&gt;
&lt;li&gt;converted this representation to a CDK &lt;tt&gt;AtomContainer&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;assigned 2-D coordinates&lt;/li&gt;
&lt;li&gt;rendered a PNG image in color&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice how the thiaazabicyclo[3.2.0] system, complete with properly-placed substitutents, was flawlessly identified and parsed.&lt;/p&gt;

&lt;p&gt;If you entered the above test code interactively via IRB, you may have noticed a multi-second delay in instantiating &lt;tt&gt;Depictor&lt;/tt&gt;. This latency results from a sluggish &lt;tt&gt;NameToStructure&lt;/tt&gt; constructor in OPSIN. A similar delay also occurs in OPSIN's pure-Java unit tests. Once &lt;tt&gt;Depictor&lt;/tt&gt; is instantiated, however, image generation occurs relatively quickly.&lt;/p&gt;

&lt;p&gt;The unususal orientation of the beta-lactam carbonyl group is determined by CDK's &lt;tt&gt;StructureDiagramGenerator&lt;/tt&gt;. The source of this behavior will be explored in a future article.&lt;/p&gt;

&lt;h4&gt;More Examples&lt;/h4&gt;

&lt;p&gt;To illustrate some of the capabilities of the OPSIN-RCDK combination, a few more examples are provided below.&lt;/p&gt;

&lt;p&gt;One of OPSIN's more surprising features is how well it handles heterocycles. For example, the IUPAC name for caffeine (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2519"&gt;1,3,7-trimethylpurine-2,6-dione&lt;/a&gt;) is translated to:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/caffeine.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;As another example, consider the tetrazole (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=180603"&gt;1-[2-hydroxy-3-propyl-4-[3-(2H-tetrazol-5-yl)propoxy]phenyl]ethanone&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/180603.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Highly substituted benzene rings and carboxylic acids are also translated accurately, as in &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2528"&gt;3-acetamido-5-(acetyl-methyl-amino)-2,4,6-triiodo-benzoic acid&lt;/a&gt; (Metrizoate):&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/metrizoate.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;How about a hairy-looking macrocycle name with multiple levels of morpheme nesting (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2547"&gt;3,6-diamino-N-[[15-amino-11-(2-amino-3,4,5,6-tetrahydropyrimidin-4-yl)-8- [(carbamoylamino)methylidene]-2-(hydroxymethyl)-3,6,9,12,16-pentaoxo- 1,4,7,10,13-pentazacyclohexadec-5-yl]methyl]hexanamide&lt;/a&gt;)? Not a problem:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/2547.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h4&gt;Limitations&lt;/h4&gt;

&lt;p&gt;In my tests of the OPSIN library, one structure appeared to be incorrectly parsed - &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=180591"&gt;N-(5-chloro-2-methyl-phenyl)-2-methoxy-N-(2-oxooxazolidin-3-yl)acetamide&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/180591.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;There are actually two problems with the output. First, an oxygen atom and a methyl group are overlapping near the top of the diargram. This cosmetic issue is related to CDK's &lt;tt&gt;StructureDiagramGenerator&lt;/tt&gt;. Second, the oxazolidine nitrogen atom is misplaced by OPSIN. The correct 2-D image of this molecule, obtained from PubChem, is shown below:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
&lt;img src="http://depth-first.com/demo/20061017/180591_pc.png"&gt;&lt;/img&gt;
&lt;/center&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;It's not common to find an early-development Open Source project with the sophistication of OPSIN. The smooth handling of nested morphemes, aromatic heterocycles, macrocycles, and a good fraction of what I threw at it leads me to belive that a well-designed and extensible nomenclature parsing engine lies at OPSIN's core. More on that later, though.&lt;/p&gt;

&lt;p&gt;What could you do with a powerful Open Source IUPAC nomenclature parser? The answer to that one question could fill a three-volume series. Suffice it to say that OPSIN, in combination with other Open Source software, offers virtually limitless potential for indexing, collecting, repackaging, reprocessing, and mashing up vast amounts of chemical information. Because of its Open Source license, OPSIN can be extended and otherwise modified to fit your particular needs. Future articles will highlight some of the possibilities.&lt;/p&gt;</description>
      <pubDate>Tue, 17 Oct 2006 13:57:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:fd6de2ae-23c8-4e50-9765-344e9a7a9545</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/10/17/from-iupac-nomenclature-to-2-d-structures-with-opsin</link>
      <category>Graphics</category>
      <category>opsin</category>
      <category>nametostruct</category>
      <category>iupac</category>
      <category>rcdk</category>
      <category>structure</category>
      <category>cdk</category>
      <category>integration</category>
      <category>mashup</category>
    </item>
    <item>
      <title>Decoding IUPAC Names With OPSIN</title>
      <description>&lt;p&gt;IUPAC chemical nomenclature is everywhere. It can be found in journal articles, both new and old, on the Web, in databases, on Material Safety Data Sheets (MSDS), in chemical catalogs, and just about anywhere chemical information is found. The rules of this nomenclature are one of the first things taught in Organic Chemistry classes, and entire books are devoted to the subject. Although software for IUPAC nomenclature translation has been researched &lt;a href="http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation"&gt;since the 1970s&lt;/a&gt;, it has only become widespread within the last ten years. As is typical, IUPAC nomenclature developer toolkits are closed, proprietary, very expensive, and not customizable - &lt;a href="http://depth-first.com/articles/2006/09/11/visualizing-iupac-names-with-chemnomparse"&gt;with one notable exception&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A little software package called OPSIN may be set to change this. Read on to see how you can use OPSIN to begin programatically decoding IUPAC chemical nomenclature today.&lt;/p&gt;

&lt;h4&gt;Meet OPSIN&lt;/h4&gt;

&lt;p&gt;OPSIN is an Open Source Java library for parsing IUPAC nomenclature. Despite its early development status, OPSIN can decode a variety of difficult features in basic IUPAC nomenclature, including bicyclo systems, nested substitution, saturated heterocycles, and a variety of arenes and heteroarenes. OPSIN currently doesn't handle stereochemistry, organometallics, or a variety of other advanced IUPAC nomenclature features.&lt;/p&gt;

&lt;h4&gt;Brief Background&lt;/h4&gt;

&lt;p&gt;OPSIN was written by &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett&lt;/a&gt; at the University of Cambridge. Until recently, OPSIN was an integral part of of the innovative chemical data checker &lt;a href="http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/index.asp"&gt;OSCAR&lt;/a&gt;. One of the exciting uses of OSCAR is in the &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=59"&gt;automated validation&lt;/a&gt; of experimental data.&lt;/p&gt;

&lt;h4&gt;Getting OPSIN&lt;/h4&gt;

&lt;p&gt;Recently, OPSIN was factored out of OSCAR. It can now be downloaded as two standalone packages from SourceForge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin_0.1.0.zip?download"&gt;Source Distribution&lt;/a&gt;: Contains the complete OPSIN source code, all library dependencies, all datasets, and an Ant build script.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;Jarfile&lt;/a&gt;: A standalone jarfile containing all library dependencies and data files.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;What OPSIN Does&lt;/h4&gt;

&lt;p&gt;OPSIN accepts an IUPAC name, encoded as a &lt;tt&gt;String&lt;/tt&gt; object, as input and provides a &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; (CML) document object model as output. The main point of entry into the library is the &lt;tt&gt;NameToStructure&lt;/tt&gt; class and its two overloaded &lt;tt&gt;parseToCML&lt;/tt&gt; methods.&lt;/p&gt;

&lt;p&gt;OPSIN's output is the root node in a &lt;a href="http://www.xom.nu/"&gt;XOM&lt;/a&gt; XML &lt;tt&gt;Element&lt;/tt&gt; hierarchy. XOM's &lt;tt&gt;Element&lt;/tt&gt; class provides a convenience method, &lt;tt&gt;toXML&lt;/tt&gt; that conveniently prints the text-based XML representation for itself and all &lt;tt&gt;Elements&lt;/tt&gt; below it.&lt;/p&gt;

&lt;p&gt;Because its output is pure XML, OPSIN does not depend on any chemical informatics toolkit to do its job. This makes OPSIN ideal for use within larger chemical informatics systems. Provided your software can interpret CML, you should be able to manipulate OPSIN's output in a variety of useful ways.&lt;/p&gt;

&lt;h4&gt;What's Next?&lt;/h4&gt;

&lt;p&gt;Future articles will discuss OPSIN's capabilities and limitations in more detail. As has become customary for Depth-First's tutorials, &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt; and the excellent &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Ruby Java Bridge&lt;/a&gt; will be used to illustrate the important points.&lt;/p&gt;</description>
      <pubDate>Sat, 14 Oct 2006 14:39:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:2550a5cb-baf7-419b-af18-338272f3bb59</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin</link>
      <category>Tools</category>
      <category>opsin</category>
      <category>nametostruct</category>
      <category>iupac</category>
      <category>oscar</category>
      <category>xom</category>
      <category>cml</category>
    </item>
    <item>
      <title>Chemical Nomenclature Translation</title>
      <description>&lt;blockquote&gt;
    &lt;p&gt;... We report here the development of a computer program for converting chemical names into connection tables, a process we call "nomenclature translation." ... this process provides an alternate method of structure registration by allowing a new substance to be input &lt;em&gt;via&lt;/em&gt; a structurally descriptive systematic name instead of only as a connection table taken from a structural diagram.&lt;/p&gt;

    &lt;p&gt;&lt;cite&gt;-G.G.V. Stouw et al. &lt;a href="http://dx.doi.org/10.1021/c160055a009"&gt;J. Chem. Doc. 1974, 14, 185-193&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Systematic nomenclature is one of the oldest forms of &lt;a href="http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;line notation&lt;/a&gt;.  As a result, it can be found widely in papers, patents, spreadsheets, and other documents. Any software that can convert systematic nomenclature, such as IUPAC names, into a computer-based representational system, such as a connection table, has the potential to unlock vast amounts of &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;legacy chemical information&lt;/a&gt; by making it structure-searchable.&lt;/p&gt;

&lt;p&gt;Stouw and his group at Chemical Abstracts Service (CAS) developed the first working system for name to structure conversion. Their interest in an automated process stemmed from the potential to greatly accelerate the rate at which the chemical literature could be indexed. Instead of a human creating a computer representation by manually parsing a systematic name from a paper, a computer could do it error-free at a fraction of the cost. These factors are still at work today, although the pool of raw chemical information material has increased exponentially since 1974.&lt;/p&gt;

&lt;p&gt;Nomenclature translation has been more widely investigated than the related problem of &lt;a href="http://depth-first.com/articles/2006/08/25/computational-perception-and-recognition-of-digitized-molecular-structures"&gt;2-D raster image interpretation&lt;/a&gt;, although the driving forces in both cases are the same. There are, of course, several proprietary packages for nomenclature translation. An important disadvantage of all of them is a distinct lack of customizability.&lt;/p&gt;

&lt;p&gt;Open source nomenclature translation options have been very limited. One of the first such packages was &lt;a href="http://chemnomparse.sourceforge.net/index.php"&gt;ChemNomParse&lt;/a&gt; by David Robinson, Bhupinder Sandhu, and Stephen Tomkinson at the University of Manchester. ChemNomParse has since been &lt;a href="http://cdk.sourceforge.net/api/org/openscience/cdk/iupac/parser/package-summary.html"&gt;made part of&lt;/a&gt; the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK). Although its capabilities are relatively limited, ChemNomParse is very useful for the design it embodies.&lt;/p&gt;

&lt;p&gt;More recently, &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbet&lt;/a&gt; at Cambridge has developed a package called OPSIN. Egon Willighagen wrote about &lt;a href="http://chem-bla-ics.blogspot.com/2006/09/chemical-archeology-oscar3-to.html"&gt;integrating OPSIN&lt;/a&gt; into the desktop software package &lt;a href="http://bioclipse.net/"&gt;Bioclipse&lt;/a&gt;. OPSIN's source can be found in the &lt;a href="http://svn.sourceforge.net/viewvc/oscar3-chem/trunk/src/uk/ac/cam/ch/wwmm/opsin/"&gt;project's SVN repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The most exciting potential for chemical nomenclature translation is realized when this capability is blended with other chemical informatics technologies. Future articles in this series will show how ChemNomParse and OPSIN can be used with other open source tools to create rich chemical informatics systems.&lt;/p&gt;</description>
      <pubDate>Sun, 10 Sep 2006 15:15:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:f6197b28-32af-46b8-88cc-13a6941c167f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation</link>
      <category>Tools</category>
      <category>nomenclature</category>
      <category>longtail</category>
      <category>opsin</category>
      <category>chemnomparse</category>
      <category>iupac</category>
      <category>oldliterature</category>
    </item>
  </channel>
</rss>
