<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag nomenclature</title>
    <link>http://depth-first.com/articles/tag/nomenclature</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Visualizing IUPAC Names with ChemNomParse</title>
      <description>&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation"&gt;Nomenclature translation&lt;/a&gt; is the process of converting a human-readable chemical name into a machine-readable notational scheme such as a connection table. It plays a key role in linking the &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;older chemical literature&lt;/a&gt; to modern information technologies, such as the Internet.&lt;/p&gt;

&lt;p&gt;Buried deep within the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK) is a library for nomenclature translation called &lt;a href="http://chemnomparse.sourceforge.net/"&gt;ChemNomParse&lt;/a&gt;. At the heart of ChemNomParse is a remarkable piece of software called the &lt;a href="https://javacc.dev.java.net/"&gt;Java Compiler Compiler&lt;/a&gt; (JavaCC), a parser generator and lexical analyzer generator for Java. A FAQ on JavaCC is available &lt;a href="http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This tutorial demonstrates how freely-available, open source tools can be used to parse an IUPAC chemical name and generate its corresponding 2-D structure rendering. A &lt;a href="http://depth-first.com/articles/2006/09/02/humanizing-line-notations"&gt;closely-related tutorial&lt;/a&gt; on generating 2-D structures from SMILES strings may be helpful as background.&lt;/p&gt;

&lt;h4&gt;Ingredients&lt;/h4&gt;

&lt;p&gt;This tutorial uses Arton's &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt;, the installation and use of which have been outlined &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;previously&lt;/a&gt;. In addition, you'll need to download &lt;a href="http://prdownloads.sourceforge.net/structure/structure-cdk-0.1.2.zip?download"&gt;Structure-CDK v0.1.2&lt;/a&gt;, also &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;previously discussed&lt;/a&gt;. Be sure to download v0.1.2, as two upgrades have been released since the package was originally described. This tutorial has been tested on Mandriva Linux 2006.&lt;/p&gt;

&lt;p&gt;Create a working directory called &lt;strong&gt;nom&lt;/strong&gt;. From the &lt;strong&gt;lib&lt;/strong&gt; directory of the Structure-CDK distribution, copy &lt;strong&gt;cdk-20060714.jar&lt;/strong&gt; and &lt;strong&gt;structure-cdk-0.1.2.jar&lt;/strong&gt; into your &lt;strong&gt;nom&lt;/strong&gt; working directory.&lt;/p&gt;

&lt;h4&gt;Code&lt;/h4&gt;

&lt;p&gt;Create a file called &lt;strong&gt;nom.rb&lt;/strong&gt; and copy the following code into it:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="constant"&gt;ENV&lt;/span&gt;&lt;span class="punct"&gt;['&lt;/span&gt;&lt;span class="string"&gt;CLASSPATH&lt;/span&gt;&lt;span class="punct"&gt;']&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;./cdk-20060714.jar:./structure-cdk-0.1.2.jar&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;NomParser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.iupac.parser.NomParser&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;StructureDiagramGenerator&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.layout.StructureDiagramGenerator&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;ImageKit&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net.sf.structure.cdk.util.ImageKit&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Depictor&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;StructureDiagramGenerator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_png&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;ImageKit&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;writePNG&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;),&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_png&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_svg&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_svg&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;ImageKit&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;writeSVG&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;),&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_svg&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="ident"&gt;private&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;nom_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;setMolecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;NomParser&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;generate&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;))&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;generateCoordinates&lt;/span&gt;

    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getMolecule&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After you save this file, you'll need to set your &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; on Unix (or the equivalent on another OS):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$LD_LIBRARY_PATH
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This tells RJB where to find Java's native libraries. Because of RJB's current design, &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; needs to be set from the command line, rather than from within a Ruby process.&lt;/p&gt;
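
&lt;p&gt;A script can also check this setting before loading RJB and stop with a readable message. The following is a minimal sketch rather than part of RJB: the helper names &lt;tt&gt;jre_library_dirs&lt;/tt&gt; and &lt;tt&gt;check_jvm_path!&lt;/tt&gt; are hypothetical, and matching on the substring &lt;tt&gt;jre/lib&lt;/tt&gt; is an assumption based on the path exported above.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;
```ruby
# Returns the entries of LD_LIBRARY_PATH that look like a JRE native-library
# directory. The 'jre/lib' substring test is a heuristic assumed for this
# sketch, not something RJB itself defines.
def jre_library_dirs(env = ENV)
  env.fetch('LD_LIBRARY_PATH', '').split(':').select { |dir| dir.include?('jre/lib') }
end

# Fails early with a readable message when no such entry is present.
def check_jvm_path!(env = ENV)
  return true unless jre_library_dirs(env).empty?
  raise 'LD_LIBRARY_PATH has no JRE entry; export it before starting Ruby'
end
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Calling &lt;tt&gt;check_jvm_path!&lt;/tt&gt; at the top of a script, before &lt;tt&gt;require 'rjb'&lt;/tt&gt;, would be one place to use it.&lt;/p&gt;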

&lt;p&gt;Using the Depictor class is as simple as creating an instance and invoking &lt;tt&gt;depict_png&lt;/tt&gt; or &lt;tt&gt;depict_svg&lt;/tt&gt; on it:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;nom&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;2-phenylcyclohexan-1-ol&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;output.png&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Executing the above code either through the Ruby interpreter (ruby) or via Interactive Ruby (irb) produces a PNG image of the chiral auxiliary shown below:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/phycy.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Other names correctly recognized by ChemNomParse include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;phenylhexyne&lt;/li&gt;
&lt;li&gt;2-chloro-3-phenyl-4,4-dimethylhexane&lt;/li&gt;
&lt;li&gt;3-phenyl-1-aminopropane&lt;/li&gt;
&lt;li&gt;1,2-difluoro-3-hydroxycyclohexene&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Limitations&lt;/h4&gt;

&lt;p&gt;Many chemical names, ranging from the simple to the complicated, were not recognized at all by ChemNomParse. Some examples are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;benzene&lt;/li&gt;
&lt;li&gt;piperidine&lt;/li&gt;
&lt;li&gt;1-methoxyhexane&lt;/li&gt;
&lt;li&gt;2-methyl-5-prop-1-en-2-yl-cyclohex-2-en-1-one (carvone)&lt;/li&gt;
&lt;/ul&gt;
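
&lt;p&gt;One way to survey which names parse is to run a list through the depictor and record the failures. The sketch below assumes that an unrecognized name surfaces in Ruby as an exception descending from &lt;tt&gt;StandardError&lt;/tt&gt;, which has not been verified here. The helpers &lt;tt&gt;png_name_for&lt;/tt&gt; and &lt;tt&gt;depict_all&lt;/tt&gt; are hypothetical; any object that responds to &lt;tt&gt;depict_png&lt;/tt&gt;, such as an instance of the Depictor class above, can be passed in.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;
```ruby
# Turns a chemical name into a filesystem-friendly PNG file name by replacing
# anything other than letters, digits, and hyphens with an underscore.
def png_name_for(name)
  name.gsub(/[^A-Za-z0-9\-]+/, '_') + '.png'
end

# Renders each name with the given depictor. Returns a hash mapping every name
# to its output path, or to nil when rendering raised an error.
# Assumption: a name the parser rejects raises a StandardError subclass.
def depict_all(depictor, names, width = 300, height = 300)
  results = {}
  names.each do |name|
    path = png_name_for(name)
    begin
      depictor.depict_png(name, width, height, path)
      results[name] = path
    rescue StandardError
      results[name] = nil
    end
  end
  results
end
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For example, &lt;tt&gt;depict_all(Depictor.new, ['phenylhexyne', 'benzene'])&lt;/tt&gt; would be expected to map the first name to &lt;tt&gt;phenylhexyne.png&lt;/tt&gt; and, under the assumption above, the second to &lt;tt&gt;nil&lt;/tt&gt;.&lt;/p&gt;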

&lt;p&gt;Some names were incorrectly interpreted due to misassigned locants. For example, 2-chloro-3-hydroxybutanoic acid produced the incorrectly assigned structure shown below:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/2_chloro_3_hydroxybutanoic_acid_error.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;ChemNomParse can accurately recognize chemical names representing simple substitutions on basic hydrocarbon scaffolds. More complicated structures, such as heterocycles, bicyclic systems, and systems involving nested substituents, do not appear to be handled at all. It is not clear to what extent these limitations reflect a small dictionary of morphemes (the basic nomenclature building blocks) versus deeper design issues.&lt;/p&gt;
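
&lt;p&gt;To make the dictionary-versus-design distinction concrete, here is a toy tokenizer written for this illustration. It is not how ChemNomParse works (its parser is generated by JavaCC), and the morpheme list is a small, hypothetical one. It splits a name by greedy longest match and gives up as soon as it meets text the dictionary does not cover.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;
```ruby
# A deliberately tiny, hypothetical morpheme dictionary.
MORPHEMES = %w[phenyl chloro methyl hydroxy cyclo hex but an en yn e ol di].freeze

# Greedy longest-match tokenizer. Returns an array of locants and morphemes,
# or nil when some part of the name is not covered by the dictionary.
# Hyphens are consumed as separators and not reported as tokens.
def tokenize(name, dictionary = MORPHEMES)
  sorted = dictionary.sort_by { |morph| -morph.length }
  tokens = []
  rest = name.downcase
  until rest.empty?
    if (m = rest[/\A[\d,]+/])        # locants such as "2" or "4,4"
      tokens.push(m)
    elsif rest.start_with?('-')      # separator between name parts
      m = '-'
    elsif (m = sorted.find { |morph| rest.start_with?(morph) })
      tokens.push(m)
    else
      return nil                     # unknown text: the dictionary is too small
    end
    rest = rest[m.length..-1]
  end
  tokens
end
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With this dictionary, &lt;tt&gt;tokenize('2-phenylcyclohexan-1-ol')&lt;/tt&gt; returns &lt;tt&gt;["2", "phenyl", "cyclo", "hex", "an", "1", "ol"]&lt;/tt&gt;, while &lt;tt&gt;tokenize('benzene')&lt;/tt&gt; and &lt;tt&gt;tokenize('piperidine')&lt;/tt&gt; return &lt;tt&gt;nil&lt;/tt&gt;. Adding morphemes would fix the last two, but nothing in a flat token list says which locant belongs to which substituent; that part is a matter of grammar design rather than dictionary size.&lt;/p&gt;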

&lt;p&gt;Despite its limitations, ChemNomParse is an interesting piece of open source software for working with chemical nomenclature. From this simple tutorial, it can be seen that nomenclature translation, when combined with other capabilities such as 2-D rendering, offers many exciting possibilities.&lt;/p&gt;</description>
      <pubDate>Mon, 11 Sep 2006 14:29:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:80a0c8f0-4c09-4d5d-9e00-229a835bc94a</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/11/visualizing-iupac-names-with-chemnomparse</link>
      <category>Graphics</category>
      <category>iupac</category>
      <category>javacc</category>
      <category>2d</category>
      <category>nomenclature</category>
      <category>translation</category>
    </item>
    <item>
      <title>Chemical Nomenclature Translation</title>
      <description>&lt;blockquote&gt;
    &lt;p&gt;... We report here the development of a computer program for converting chemical names into connection tables, a process we call "nomenclature translation." ... this process provides an alternate method of structure registration by allowing a new substance to be input &lt;em&gt;via&lt;/em&gt; a structurally descriptive systematic name instead of only as a connection table taken from a structural diagram.&lt;/p&gt;

    &lt;p&gt;&lt;cite&gt;-G.G.V. Stouw et al. &lt;a href="http://dx.doi.org/10.1021/c160055a009"&gt;J. Chem. Doc. 1974, 14, 185-193&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Systematic nomenclature is one of the oldest forms of &lt;a href="http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;line notation&lt;/a&gt;.  As a result, it can be found widely in papers, patents, spreadsheets, and other documents. Any software that can convert systematic nomenclature, such as IUPAC names, into a computer-based representational system, such as a connection table, has the potential to unlock vast amounts of &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;legacy chemical information&lt;/a&gt; by making it structure-searchable.&lt;/p&gt;

&lt;p&gt;Stouw and his group at Chemical Abstracts Service (CAS) developed the first working system for name-to-structure conversion. Their interest in an automated process stemmed from the potential to greatly accelerate the rate at which the chemical literature could be indexed. Instead of a human creating a computer representation by manually parsing a systematic name from a paper, a computer could do it error-free at a fraction of the cost. These factors are still at work today, although the pool of raw chemical information material has increased exponentially since 1974.&lt;/p&gt;

&lt;p&gt;Nomenclature translation has been more widely investigated than the related problem of &lt;a href="http://depth-first.com/articles/2006/08/25/computational-perception-and-recognition-of-digitized-molecular-structures"&gt;2-D raster image interpretation&lt;/a&gt;, although the driving forces in both cases are the same. There are, of course, several proprietary packages for nomenclature translation. An important disadvantage of all of them is a distinct lack of customizability.&lt;/p&gt;

&lt;p&gt;Open source nomenclature translation options have been very limited. One of the first such packages was &lt;a href="http://chemnomparse.sourceforge.net/index.php"&gt;ChemNomParse&lt;/a&gt; by David Robinson, Bhupinder Sandhu, and Stephen Tomkinson at the University of Manchester. ChemNomParse has since been &lt;a href="http://cdk.sourceforge.net/api/org/openscience/cdk/iupac/parser/package-summary.html"&gt;made part of&lt;/a&gt; the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK). Although its capabilities are relatively limited, ChemNomParse is very useful for the design it embodies.&lt;/p&gt;

&lt;p&gt;More recently, &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett&lt;/a&gt; at Cambridge has developed a package called OPSIN. Egon Willighagen wrote about &lt;a href="http://chem-bla-ics.blogspot.com/2006/09/chemical-archeology-oscar3-to.html"&gt;integrating OPSIN&lt;/a&gt; into the desktop software package &lt;a href="http://bioclipse.net/"&gt;Bioclipse&lt;/a&gt;. OPSIN's source can be found in the &lt;a href="http://svn.sourceforge.net/viewvc/oscar3-chem/trunk/src/uk/ac/cam/ch/wwmm/opsin/"&gt;project's SVN repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The most exciting potential for chemical nomenclature translation is realized when this capability is blended with other chemical informatics technologies. Future articles in this series will show how ChemNomParse and OPSIN can be used with other open source tools to create rich chemical informatics systems.&lt;/p&gt;</description>
      <pubDate>Sun, 10 Sep 2006 15:15:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:f6197b28-32af-46b8-88cc-13a6941c167f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation</link>
      <category>Tools</category>
      <category>nomenclature</category>
      <category>longtail</category>
      <category>opsin</category>
      <category>chemnomparse</category>
      <category>iupac</category>
      <category>oldliterature</category>
    </item>
  </channel>
</rss>
