<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Decoding IUPAC Names With OPSIN</title>
    <link>http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Decoding IUPAC Names With OPSIN</title>
      <description>&lt;p&gt;IUPAC chemical nomenclature is everywhere. It can be found in journal articles, both new and old, on the Web, in databases, on Material Safety Data Sheets (MSDS), in chemical catalogs, and just about anywhere chemical information is found. The rules of this nomenclature are one of the first things taught in Organic Chemistry classes, and entire books are devoted to the subject. Although software for IUPAC nomenclature translation has been researched &lt;a href="http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation"&gt;since the 1970s&lt;/a&gt;, it has only become widespread within the last ten years. As is typical, IUPAC nomenclature developer toolkits are closed, proprietary, very expensive, and not customizable - &lt;a href="http://depth-first.com/articles/2006/09/11/visualizing-iupac-names-with-chemnomparse"&gt;with one notable exception&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A little software package called OPSIN may be set to change this. Read on to see how you can use OPSIN to begin programmatically decoding IUPAC chemical nomenclature today.&lt;/p&gt;

&lt;h4&gt;Meet OPSIN&lt;/h4&gt;

&lt;p&gt;OPSIN is an Open Source Java library for parsing IUPAC nomenclature. Despite its early development status, OPSIN can decode a number of difficult features in basic IUPAC nomenclature, including bicyclo systems, nested substitution, saturated heterocycles, and many arenes and heteroarenes. OPSIN does not currently handle stereochemistry, organometallics, or various other advanced IUPAC nomenclature features.&lt;/p&gt;

&lt;h4&gt;Brief Background&lt;/h4&gt;

&lt;p&gt;OPSIN was written by &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett&lt;/a&gt; at the University of Cambridge. Until recently, OPSIN was an integral part of the innovative chemical data checker &lt;a href="http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/index.asp"&gt;OSCAR&lt;/a&gt;. One of the exciting uses of OSCAR is in the &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=59"&gt;automated validation&lt;/a&gt; of experimental data.&lt;/p&gt;

&lt;h4&gt;Getting OPSIN&lt;/h4&gt;

&lt;p&gt;Recently, OPSIN was factored out of OSCAR. It can now be downloaded as two standalone packages from SourceForge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin_0.1.0.zip?download"&gt;Source Distribution&lt;/a&gt;: Contains the complete OPSIN source code, all library dependencies, all datasets, and an Ant build script.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;Jarfile&lt;/a&gt;: A standalone jarfile containing all library dependencies and data files.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;What OPSIN Does&lt;/h4&gt;

&lt;p&gt;OPSIN accepts an IUPAC name, encoded as a &lt;tt&gt;String&lt;/tt&gt; object, as input and provides a &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; (CML) document object model as output. The main point of entry into the library is the &lt;tt&gt;NameToStructure&lt;/tt&gt; class and its two overloaded &lt;tt&gt;parseToCML&lt;/tt&gt; methods.&lt;/p&gt;

&lt;p&gt;OPSIN's output is the root node in a &lt;a href="http://www.xom.nu/"&gt;XOM&lt;/a&gt; XML &lt;tt&gt;Element&lt;/tt&gt; hierarchy. XOM's &lt;tt&gt;Element&lt;/tt&gt; class provides a convenience method, &lt;tt&gt;toXML&lt;/tt&gt;, that returns the text-based XML representation of itself and all &lt;tt&gt;Elements&lt;/tt&gt; below it.&lt;/p&gt;

&lt;p&gt;Because its output is pure XML, OPSIN does not depend on any chemical informatics toolkit to do its job. This makes OPSIN ideal for use within larger chemical informatics systems. Provided your software can interpret CML, you should be able to manipulate OPSIN's output in a variety of useful ways.&lt;/p&gt;
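
&lt;p&gt;As a rough illustration of the kind of downstream processing this allows, the Java sketch below walks an element hierarchy and counts atoms by element type. It makes several assumptions: the sample molecule is built by hand rather than taken from OPSIN, it uses the JDK's built-in W3C DOM classes rather than XOM so that it runs without extra libraries, and the class name &lt;tt&gt;CmlAtomCounter&lt;/tt&gt; and its methods are hypothetical. The &lt;tt&gt;molecule&lt;/tt&gt;, &lt;tt&gt;atomArray&lt;/tt&gt;, and &lt;tt&gt;atom&lt;/tt&gt; element names and the &lt;tt&gt;elementType&lt;/tt&gt; attribute follow common CML usage, but OPSIN's actual output may be organized differently.&lt;/p&gt;

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CmlAtomCounter {

    // Builds a small CML-like fragment holding the heavy atoms of ethanol.
    // Hand-written illustrative input only; this is not captured OPSIN output.
    static Element buildSampleMolecule() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element molecule = doc.createElement("molecule");
        Element atomArray = doc.createElement("atomArray");
        molecule.appendChild(atomArray);

        String[] elementTypes = {"C", "C", "O"};
        int id = 1;
        for (String elementType : elementTypes) {
            Element atom = doc.createElement("atom");
            atom.setAttribute("id", "a" + id);
            atom.setAttribute("elementType", elementType);
            atomArray.appendChild(atom);
            id++;
        }

        doc.appendChild(molecule);
        return molecule;
    }

    // Recursively counts atom elements whose elementType attribute matches.
    static int countAtoms(Node node, String elementType) {
        int count = 0;
        if (node instanceof Element) {
            Element element = (Element) node;
            if (element.getTagName().equals("atom")) {
                if (elementType.equals(element.getAttribute("elementType"))) {
                    count++;
                }
            }
        }
        // Visit every child node, whatever its depth in the hierarchy.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countAtoms(child, elementType);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Element molecule = buildSampleMolecule();
        System.out.println("C: " + countAtoms(molecule, "C"));
        System.out.println("O: " + countAtoms(molecule, "O"));
    }
}
```

&lt;p&gt;Running &lt;tt&gt;main&lt;/tt&gt; on the hand-built sample prints a carbon count of 2 and an oxygen count of 1. The same recursive walk could be adapted to a XOM &lt;tt&gt;Element&lt;/tt&gt; by swapping in XOM's own child-access methods.&lt;/p&gt;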

&lt;h4&gt;What's Next?&lt;/h4&gt;

&lt;p&gt;Future articles will discuss OPSIN's capabilities and limitations in more detail. As has become customary for Depth-First's tutorials, &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt; and the excellent &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Ruby Java Bridge&lt;/a&gt; will be used to illustrate the important points.&lt;/p&gt;</description>
      <pubDate>Sat, 14 Oct 2006 14:39:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:2550a5cb-baf7-419b-af18-338272f3bb59</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin</link>
      <category>Tools</category>
      <category>opsin</category>
      <category>nametostruct</category>
      <category>iupac</category>
      <category>oscar</category>
      <category>xom</category>
      <category>cml</category>
    </item>
  </channel>
</rss>
