<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag cml</title>
    <link>http://depth-first.com/articles/tag/cml</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Web 2.0 and Chemistry</title>
      <description>&lt;blockquote&gt;
    &lt;p&gt;Chemistry on the World Wide Web is picking up speed like a runaway train. A remarkable number of groups are devoting considerable time, effort, and money to a wide variety of chemical web applications.&lt;/p&gt;
    
    &lt;p&gt;-&lt;cite&gt;Stu Borman, &lt;a href="http://pubs.acs.org/hotartcl/cenear/960916/explode.html"&gt;Chemical &amp;amp; Engineering News, September 16, 1996&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Whatever happened to chemistry and the Web? Stu Borman's article is a wonderful read, if for no other reason than to illustrate what makes technology predictions so tricky. Borman cites these developments, among others, as evidence of the rise of chemistry on the Web circa 1996:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An &lt;a href="http://altavista.com"&gt;AltaVista&lt;/a&gt; search for the word "chemistry" returned 400,000 documents. (Remember AltaVista? Google now lists over &lt;em&gt;115 million&lt;/em&gt; documents containing the word "chemistry.")&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Popular Web browsers such as Navigator 3.0 and Explorer 3.0 support Greek characters, an important characteristic of chemical information. (Support for chemistry in Web browsers has barely improved in the meantime.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Java browser for &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; (CML) was soon to be released. (CML is still in use and supported by many software packages, although it has not been widely adopted. For example, the builders of &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; opted for a custom XML format over CML.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.ch.ic.ac.uk/rzepa/vrml/"&gt;Virtual Reality Modeling Language&lt;/a&gt; (VRML) could be used to display molecules in three dimensions. (&lt;a href="http://en.wikipedia.org/wiki/VRML"&gt;This article&lt;/a&gt; gives a brief overview of the rise and fall of VRML.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.mdl.com/products/framework/chime/"&gt;Chemscape Chime&lt;/a&gt;, a Navigator plugin that interprets chemical information, could be freely downloaded. (&lt;del&gt;MDL has apparently since discontinued free distribution of Chime.&lt;/del&gt; The &lt;a href="http://depth-first.com/articles/2006/09/27/hacking-pubchem-free-speech-or-free-beer"&gt;free as in beer&lt;/a&gt; plugin is &lt;a href="http://www.orgsyn.org/"&gt;rarely&lt;/a&gt; seen in use on the public Web.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chemically oriented Java applets such as "WebSketch" are proliferating. (Java has had a difficult time making it as a browser technology. Security has had little to do with it, though.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stu Borman's article serves as a clear reminder that chemistry on the Web is almost as awkward today as it was at the dawn of the Internet age. The problem isn't lack of content; it's the lack of robust, widely adopted, open standards that take the &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;peculiarities of chemical information&lt;/a&gt; into account, and the free software to support them. Coming to terms with past failures in this area is one way to increase the chances of future success.&lt;/p&gt;</description>
      <pubDate>Mon, 12 Mar 2007 10:23:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:8c8362ba-993e-438a-bb75-8d2627aeefe6</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/12/web-2-0-and-chemistry</link>
      <category>Meta</category>
      <category>web20</category>
      <category>opensource</category>
      <category>openstandards</category>
      <category>cml</category>
      <category>java</category>
      <category>web</category>
    </item>
    <item>
      <title>An Object-Oriented Framework for Molecular Representation: Getting Started with Octet</title>
      <description>&lt;p&gt;&lt;a href="http://www.amazon.com/gp/product/0201633612?ie=UTF8&amp;amp;tag=depthfirst-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0201633612"&gt;&lt;img border="0" src="http://depth-first.com/files/design_patterns.jpg" align="right"&gt;&lt;/a&gt;&lt;img src="http://www.assoc-amazon.com/e/ir?t=depthfirst-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0201633612" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /&gt;&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;If applications are hard to design and toolkits are harder, then frameworks are hardest of all. A framework designer gambles that one architecture will work for all applications in the domain. Any substantive change to the framework's design would reduce its benefits considerably, since the framework's main contribution to an application is the architecture it defines. Therefore it's imperative to design the framework to be as flexible and extensible as possible.&lt;/p&gt;

    &lt;p&gt;-&lt;cite&gt;Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides- &lt;em&gt;&lt;a href="http://www.amazon.com/gp/product/0201633612?ie=UTF8&amp;amp;tag=depthfirst-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0201633612"&gt;Design Patterns&lt;/a&gt;&lt;img src="http://www.assoc-amazon.com/e/ir?t=depthfirst-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0201633612" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /&gt;&lt;/em&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the most important considerations when building an application is the choice of framework. As the quote from the &lt;a href="http://en.wikipedia.org/wiki/Design_patterns"&gt;Gang of Four&lt;/a&gt; implies, there's much more to frameworks than just a collection of re-usable code. At their best, frameworks provide a foundation for thinking about a problem domain and a language for communicating with other developers about it. In this article, I'll introduce &lt;a href="http://sf.net/projects/octet"&gt;Octet&lt;/a&gt;, an object-oriented framework for molecular representation.&lt;/p&gt;

&lt;h4&gt;The Molecular Representation Problem&lt;/h4&gt;

&lt;p&gt;Isn't molecular representation a solved problem? After all, don't SMILES, Molfile, InChI, and CML adequately represent any molecule the average software developer is likely to see?&lt;/p&gt;

&lt;p&gt;As &lt;a href="http://depth-first.com/articles/2006/12/19/ferrocene-and-beyond-a-solution-to-the-molecular-representation-problem"&gt;previously discussed&lt;/a&gt;, molecular representation technologies have stagnated while the molecules chemists routinely make and use have become more and more "exotic." Developers now face the thorny problem that a variety of common structural motifs in chemistry can't be adequately represented with industry-standard cheminformatics tools.&lt;/p&gt;

&lt;p&gt;This point is so important, I'll repeat it: cheminformatics has fallen behind chemistry in the kinds of molecules it can work with. Quick fixes only allow the problem to fester; what's needed is a comprehensive solution. This is Octet's problem domain.&lt;/p&gt;

&lt;p&gt;Every framework is bounded by a specific problem domain. Although the size of the domain can vary, a framework provides a comprehensive solution within it. For complex and poorly standardized domains (such as molecular representation), a good framework can greatly accelerate application development.&lt;/p&gt;

&lt;p&gt;A good framework stays within its problem domain. One of the most important reasons is to prevent &lt;a href="http://headrush.typepad.com/creating_passionate_users/2005/06/featuritis_vs_t.html"&gt;featuritis&lt;/a&gt;, the root of much software evil. Keeping a framework focused on its core mission makes it much more likely that it can remain documented, tested, extensible, and efficient.&lt;/p&gt;

&lt;p&gt;By intention, a variety of features fall outside Octet's problem domain and so will never be directly supported. For example, rendering 2-D structure diagrams is a common problem in cheminformatics that has nothing to do with solving the molecular representation problem. Similarly, reading and writing SMILES strings and Molfiles are supported by many toolkits, but not by Octet directly. After all, it's the inherent limitations of these languages that Octet is trying to overcome.&lt;/p&gt;

&lt;p&gt;Higher-level functionality such as legacy language support and 2-D rendering, although not part of Octet itself, can be developed with Octet as a foundation. For example, two Octet add-on frameworks specifically address these problems. They are called &lt;a href="http://sf.net/projects/rxf/"&gt;Rosetta&lt;/a&gt; and &lt;a href="http://sf.net/proejects/structure/"&gt;Structure&lt;/a&gt;, respectively.&lt;/p&gt;

&lt;h4&gt;About This Series&lt;/h4&gt;

&lt;p&gt;This article is the first in a series discussing Octet. Future articles will describe in detail Octet's design, implementation, and use. Although Octet has come a long way, it's far from finished. My motivation for writing these articles is to hear what you have to say about Octet, so please feel free to &lt;a href="http://sourceforge.net/users/r_apodaca/"&gt;contact me&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although Octet is written in Java, code examples discussed here will be written in Ruby. I've taken the same approach in discussing the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK) and &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;Structure-CDK&lt;/a&gt;. Ruby's brevity and comfortable syntax make it ideal for both writing and discussing code.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubyforge.org/projects/rjb/"&gt;Ruby Java Bridge&lt;/a&gt; (RJB) is the magic technology that makes this possible. Previous articles have discussed the installation and use of RJB on &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Windows&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;Linux&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;A Simple Test&lt;/h4&gt;

&lt;p&gt;Assuming you've installed Ruby, RubyGems and Ruby Java Bridge, you can perform a simple demonstration of Octet in Ruby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;BasicMoleculeBuilder&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;net.sf.octet.builder.BasicMoleculeBuilder&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;span class="constant"&gt;RepresentationKit&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;net.sf.octet.util.RepresentationKit&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;span class="constant"&gt;MoleculeKit&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;net.sf.octet.util.MoleculeKit&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;span class="constant"&gt;System&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;java.lang.System&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;

&lt;span class="ident"&gt;builder&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;BasicMoleculeBuilder&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

&lt;span class="constant"&gt;RepresentationKit&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;buildHexane&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;builder&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="ident"&gt;molecule&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;builder&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;releaseMolecule&lt;/span&gt;

&lt;span class="constant"&gt;MoleculeKit&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;printMolecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;System&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;out&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above code generates an Octet representation of n-hexane and prints it to the console. To run this example, save the code to a file called &lt;strong&gt;test.rb&lt;/strong&gt; in your working directory. Then add &lt;strong&gt;octet-0.8.2.jar&lt;/strong&gt;, which can be found in the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=96108&amp;amp;package_id=102647&amp;amp;release_id=378955"&gt;Octet-0.8.2 source distribution&lt;/a&gt;, to the same directory. The test can then be run with the following sequence of commands:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ export CLASSPATH=./octet-0.8.2.jar
$ ruby test.rb
**Molecule Properties**

Atom Count: 6, Bonding System Count: 5

Atoms:
atom: C[0] (2nu 0e, 0or, 0.0fc, 1bs, 1n, 4.0val, 3ih )
atom: C[1] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[2] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[3] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[4] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[5] (2nu 0e, 0or, 0.0fc, 1bs, 1n, 4.0val, 3ih )

No non-natural isotopic distributions specified.

No Orbitals specified.

Bonding Systems:
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (0, 1) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (1, 2) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (2, 3) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (3, 4) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (4, 5) ]

Atom Pairs:
atom pair: (0, 1) (1.0 bo)
atom pair: (1, 2) (1.0 bo)
atom pair: (2, 3) (1.0 bo)
atom pair: (3, 4) (1.0 bo)
atom pair: (4, 5) (1.0 bo)

No Atomic Configurations specified.
No Conformation specified.
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see, Octet shares the same concepts and vocabulary as &lt;a href="http://depth-first.com/articles/tag/flexmol"&gt;FlexMol&lt;/a&gt;. We'll drill down into the meaning of the output in later articles. The important thing to remember is that we can print out a report like the one above for any &lt;tt&gt;Molecule&lt;/tt&gt;, no matter how complex.&lt;/p&gt;
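
&lt;p&gt;Because the report is plain text, it can be post-processed with a few lines of Ruby. The following is a minimal sketch, not part of Octet: it assumes the "atom pair" line format shown above stays as printed, and the names &lt;tt&gt;parse_atom_pairs&lt;/tt&gt; and &lt;tt&gt;adjacency&lt;/tt&gt; are hypothetical helpers introduced here for illustration. It turns the Atom Pairs section of the report into a neighbor list for each atom.&lt;/p&gt;

```ruby
# Sketch only: parse the "Atom Pairs" section of the printed report.
# The text below is copied from the hexane output above; the helper
# names are hypothetical and are not part of Octet.
REPORT = %q{
Atom Pairs:
atom pair: (0, 1) (1.0 bo)
atom pair: (1, 2) (1.0 bo)
atom pair: (2, 3) (1.0 bo)
atom pair: (3, 4) (1.0 bo)
atom pair: (4, 5) (1.0 bo)
}

# Matches one "atom pair" line and captures both atom indices and the bond order.
PAIR_PATTERN = /^atom pair: \((\d+), (\d+)\) \(([\d.]+) bo\)$/

# Returns an array of hashes, one per "atom pair" line found in text.
def parse_atom_pairs(text)
  text.scan(PAIR_PATTERN).map do |from, to, order|
    { from: from.to_i, to: to.to_i, order: order.to_f }
  end
end

# Builds a hash mapping each atom index to the sorted indices of its neighbors.
def adjacency(pairs)
  neighbors = Hash.new { |hash, key| hash[key] = [] }
  pairs.each do |pair|
    neighbors[pair[:from]].push(pair[:to])
    neighbors[pair[:to]].push(pair[:from])
  end
  neighbors.each_value { |list| list.sort! }
  neighbors
end

pairs = parse_atom_pairs(REPORT)
adjacency(pairs).sort.each do |atom, list|
  puts "C[#{atom}] is paired with #{list.join(', ')}"
end
```

&lt;p&gt;For hexane, the two terminal carbons end up with one neighbor each and the four interior carbons with two. This matches the 1n and 2n values in the Atoms section of the report, assuming that n counts neighbors.&lt;/p&gt;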

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Octet is an object-oriented framework designed to solve the molecular representation problem and serve as a solid foundation for a variety of cheminformatics applications. Of course, there's much more to Octet than the simple example shown here. Future articles will describe in greater detail the design and use of Octet through illustrative examples.&lt;/p&gt;</description>
      <pubDate>Tue, 30 Jan 2007 14:45:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:a53f05f2-aafb-4d26-a345-fdbfc6e9d724</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/01/30/an-object-oriented-framework-for-molecular-representation-getting-started-with-octet</link>
      <category>Tools</category>
      <category>octet</category>
      <category>flexmol</category>
      <category>representation</category>
      <category>java</category>
      <category>ruby</category>
      <category>inchi</category>
      <category>cml</category>
      <category>molfile</category>
      <category>smiles</category>
      <category>framework</category>
    </item>
    <item>
      <title>Debabelization</title>
      <description>&lt;blockquote&gt;
    &lt;p&gt;Today, we find &lt;em&gt;Chemical Abstracts&lt;/em&gt; with over two million compounds coded in a connectivity table system and ISI with close to a million compounds coded in WLN. The U.S. Patent Office has large files coded in the Hayward notation; the IDC has large numbers of compounds in its CT and GREMAS Code. Derwent has a sizable patent file coded in one fragment code, and many journal literature compounds coded in the Ring Code fragment code. There are a number of individual companies and government agencies with over 100,000 compounds coded in "a" system. And almost all companies synthesizing new compounds have some internal system for their compounds. Finally, there are many universities with a wide variety of coded structure files.&lt;/p&gt;

    &lt;p&gt;-&lt;cite&gt;Charles E. Granito &lt;a href="http://dx.doi.org/10.1021/c160049a009"&gt;J. Chem. Doc. 1973, 13, 72-74&lt;/a&gt;&lt;/cite&gt; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The situation described by Granito in 1973 seems eerily familiar today. The names of the players, the technologies, and encoding systems have changed, but the problem of multiple incompatible molecular languages has persisted for over 30 years.&lt;/p&gt;

&lt;p&gt;This problem will become even more pronounced in the near future as &lt;a href="http://depth-first.com/articles/2006/11/07/twelve-free-chemistry-databases"&gt;free chemistry databases on the Web&lt;/a&gt; continue their rapid proliferation. In Granito's world of closed, proprietary databases and unevenly distributed computer power, interoperability was an afterthought; in the coming world of free, open databases, and ubiquitous computer networks that connect to them, interoperability will be taken for granted.&lt;/p&gt;

&lt;p&gt;Granito goes on to observe that "there is no one 'best' system" for molecular representation. And he's right. Molecular languages evolve within a particular problem domain, just as human languages evolve within a specific cultural context. This isn't to say that a molecular language can't be creatively &lt;a href="http://dx.doi.org/10.1021/ci0496797"&gt;adapted to serve purposes for which it was never designed&lt;/a&gt;. Trying to do so is, after all, how new languages are conceived.&lt;/p&gt;

&lt;p&gt;Consider the case of InChI, which is both a molecular identification system and a &lt;a href="http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;line notation&lt;/a&gt;, or &lt;a href="http://cml.sourceforge.net/"&gt;Chemical Markup Language&lt;/a&gt; (CML), an XML language. There are vast areas of chemistry in which using either InChI or CML will be problematic - particularly polymers, organometallics, and inorganic chemistry. And let's not ignore new molecular representation problems brewing on the horizon like &lt;a href="http://dx.doi.org/10.1002/anie.200602173"&gt;small molecule tertiary structure&lt;/a&gt;. Yet for pure organic chemistry as most of us know it today, InChI and CML may well be optimal.&lt;/p&gt;

&lt;p&gt;The problem is that both InChI and CML compete with simpler, entrenched alternatives - SMILES and molfile, respectively. Even MDL, the author of the original molfile specification, is having difficulty gaining acceptance for its new molfile format, despite significant technical advantages.&lt;/p&gt;

&lt;p&gt;If history is any guide, we can look forward to at least as many molecular languages in the next thirty years as we've seen in the last thirty. It wasn't long ago that WLN was viewed as &lt;a href="http://dx.doi.org/10.1021/ci00034a005"&gt;the language of the future&lt;/a&gt;. Now it just looks cryptic. For this we can thank a combination of technology advances and the emergence of a far simpler alternative, SMILES. A similar fate, more likely than not, awaits all molecular languages currently in use.&lt;/p&gt;

&lt;p&gt;Will there ever be a universal molecular language and is there any point in trying to invent one? Every area of chemistry introduces its own peculiarities not shared by any of the others. Yet all users want the simplest language possible. These two contradictory forces ensure that a universal language is unlikely to ever appear. In other words, the most successful new molecular languages are likely to be &lt;em&gt;agile&lt;/em&gt; - simple, easy to learn, cheap to implement, and quickly adaptable in the face of new chemical concepts and advances in computer technology.&lt;/p&gt;</description>
      <pubDate>Wed, 08 Nov 2006 14:32:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:f484e38a-6ec5-41d1-81fd-5eeb35465ead</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/11/08/debabelization</link>
      <category>Meta</category>
      <category>molecularlanguage</category>
      <category>inchi</category>
      <category>cml</category>
      <category>integration</category>
      <category>databases</category>
    </item>
    <item>
      <title>Decoding IUPAC Names With OPSIN</title>
      <description>&lt;p&gt;IUPAC chemical nomenclature is everywhere. It can be found in journal articles, both new and old, on the Web, in databases, on Material Safety Data Sheets (MSDS), in chemical catalogs, and just about anywhere chemical information is found. The rules of this nomenclature are one of the first things taught in Organic Chemistry classes, and entire books are devoted to the subject. Although software for IUPAC nomenclature translation has been researched &lt;a href="http://depth-first.com/articles/2006/09/10/chemical-nomenclature-translation"&gt;since the 1970s&lt;/a&gt;, it has only become widespread within the last ten years. As is typical, IUPAC nomenclature developer toolkits are closed, proprietary, very expensive, and not customizable - &lt;a href="http://depth-first.com/articles/2006/09/11/visualizing-iupac-names-with-chemnomparse"&gt;with one notable exception&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A little software package called OPSIN may be set to change this. Read on to see how you can use OPSIN to begin programmatically decoding IUPAC chemical nomenclature today.&lt;/p&gt;

&lt;h4&gt;Meet OPSIN&lt;/h4&gt;

&lt;p&gt;OPSIN is an Open Source Java library for parsing IUPAC nomenclature. Despite its early development status, OPSIN can decode a variety of difficult features in basic IUPAC nomenclature, including bicyclo systems, nested substitution, saturated heterocycles, and a variety of arenes and heteroarenes. OPSIN currently doesn't handle stereochemistry, organometallics, or a variety of other advanced IUPAC nomenclature features.&lt;/p&gt;

&lt;h4&gt;Brief Background&lt;/h4&gt;

&lt;p&gt;OPSIN was written by &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/corbett/"&gt;Peter Corbett&lt;/a&gt; at the University of Cambridge. Until recently, OPSIN was an integral part of the innovative chemical data checker &lt;a href="http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/index.asp"&gt;OSCAR&lt;/a&gt;. One of the exciting uses of OSCAR is in the &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=59"&gt;automated validation&lt;/a&gt; of experimental data.&lt;/p&gt;

&lt;h4&gt;Getting OPSIN&lt;/h4&gt;

&lt;p&gt;Recently, OPSIN was factored out of OSCAR. It can now be downloaded as two standalone packages from SourceForge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin_0.1.0.zip?download"&gt;Source Distribution&lt;/a&gt;: Contains the complete OPSIN source code, all library dependencies, all datasets, and an Ant build script.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://prdownloads.sourceforge.net/oscar3-chem/opsin-big-0.1.0.jar?download"&gt;Jarfile&lt;/a&gt;: A standalone jarfile containing all library dependencies and data files.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;What OPSIN Does&lt;/h4&gt;

&lt;p&gt;OPSIN accepts an IUPAC name, encoded as a &lt;tt&gt;String&lt;/tt&gt; object, as input and provides a &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; (CML) document object model as output. The main point of entry into the library is the &lt;tt&gt;NameToStructure&lt;/tt&gt; class and its two overloaded &lt;tt&gt;parseToCML&lt;/tt&gt; methods.&lt;/p&gt;

&lt;p&gt;OPSIN's output is the root node in a &lt;a href="http://www.xom.nu/"&gt;XOM&lt;/a&gt; XML &lt;tt&gt;Element&lt;/tt&gt; hierarchy. XOM's &lt;tt&gt;Element&lt;/tt&gt; class provides a convenience method, &lt;tt&gt;toXML&lt;/tt&gt;, that returns the text-based XML representation of itself and all &lt;tt&gt;Elements&lt;/tt&gt; below it.&lt;/p&gt;

&lt;p&gt;Because its output is pure XML, OPSIN does not depend on any chemical informatics toolkit to do its job. This makes OPSIN ideal for use within larger chemical informatics systems. Provided your software can interpret CML, you should be able to manipulate OPSIN's output in a variety of useful ways.&lt;/p&gt;

&lt;h4&gt;What's Next?&lt;/h4&gt;

&lt;p&gt;Future articles will discuss OPSIN's capabilities and limitations in more detail. As has become customary for Depth-First's tutorials, &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt; and the excellent &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Ruby Java Bridge&lt;/a&gt; will be used to illustrate the important points.&lt;/p&gt;</description>
      <pubDate>Sat, 14 Oct 2006 14:39:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:2550a5cb-baf7-419b-af18-338272f3bb59</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/10/14/decoding-iupac-names-with-opsin</link>
      <category>Tools</category>
      <category>opsin</category>
      <category>nametostruct</category>
      <category>iupac</category>
      <category>oscar</category>
      <category>xom</category>
      <category>cml</category>
    </item>
    <item>
      <title>Hacking NMRShiftDB</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/nmrshift-logo.gif" align="right"&gt;&lt;/img&gt;&lt;a href="http://nmrshiftdb.org"&gt;NMRShiftDB&lt;/a&gt; is an open web database of peer-reviewed NMR chemical shifts compiled by volunteers. As of this writing, it contains 22,429 measured spectra from 18,986 structures, and reports 927 registered users. The &lt;a href="http://sourceforge.net/projects/nmrshiftdb/"&gt;database code&lt;/a&gt; itself is open source.&lt;/p&gt;

&lt;p&gt;Although NMRShiftDB has a web interface, its architecture is designed to simplify writing programs that use it. A &lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;previous article&lt;/a&gt; showed how a working &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; API could be written with just a few lines of Ruby. This time, I'll show how the same thing can be done for NMRShiftDB.&lt;/p&gt;

&lt;h4&gt;Ingredients&lt;/h4&gt;

&lt;p&gt;This tutorial uses Arton's excellent &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt;, the installation and use of which has been &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;previously discussed&lt;/a&gt;. Also used is Ruby's InChI interface, &lt;a href="http://rubyforge.org/projects/rino"&gt;Rino&lt;/a&gt;, for which installation instructions are &lt;a href="http://depth-first.com/articles/2006/08/17/ruby-and-inchi-the-rino-library"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Create a working directory called &lt;strong&gt;nmr&lt;/strong&gt;. Into this directory, copy &lt;strong&gt;cdk-20060714.jar&lt;/strong&gt;, which can be &lt;a href="http://prdownloads.sourceforge.net/cdk/cdk-20060714.jar?download"&gt;downloaded here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Code&lt;/h4&gt;

&lt;p&gt;Create a file called &lt;strong&gt;nmr.rb&lt;/strong&gt; containing the following Ruby code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net/http&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;smi2inchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# A very simple NMRShiftDB Web API.&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;NMRFetcher&lt;/span&gt;

  &lt;span class="comment"&gt;# Creates a &amp;lt;tt&amp;gt;Translator&amp;lt;/tt&amp;gt; instance.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@translator&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Translator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns an XML record, as a string, for the molecule&lt;/span&gt;
  &lt;span class="comment"&gt;# with SMILES matching &amp;lt;tt&amp;gt;smiles&amp;lt;/tt&amp;gt; and spectrum type&lt;/span&gt;
  &lt;span class="comment"&gt;# matching &amp;lt;tt&amp;gt;spectrumtype&amp;lt;/tt&amp;gt; (13C, 1H, 15N and 31P).&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_record&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;spectrumtype&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;body&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;
    &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smi2inchi&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)).&lt;/span&gt;&lt;span class="ident"&gt;gsub&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;InChI=&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;inchi=&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
    &lt;span class="ident"&gt;path&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;/NmrshiftdbServlet?nmrshiftdbaction=exportcmlbyinchi&amp;amp;&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&amp;amp;spectrumtype=&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;spectrumtype&lt;/span&gt;

    &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;HTTP&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;start&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;nmrshiftdb.ice.mpg.de&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;response&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;http&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;path&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="ident"&gt;body&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;response&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="punct"&gt;!&lt;/span&gt;&lt;span class="ident"&gt;valid_record?&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;nil&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;body&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="ident"&gt;private&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;valid_record?&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="punct"&gt;!&lt;/span&gt;&lt;span class="ident"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;eql?&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;No such molecule or spectrum&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;smi2inchi&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@translator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;translate&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The magic in the above code is nothing more than a simple HTTP request sent to &lt;tt&gt;nmrshiftdb.ice.mpg.de&lt;/tt&gt;, contained in the &lt;tt&gt;get_record&lt;/tt&gt; method. This request encodes an InChI identifier, which is generated from the SMILES string passed as an argument. We also specify a spectrum type.&lt;/p&gt;
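&lt;p&gt;To make the shape of that request concrete, here is a minimal sketch of the path construction on its own, with no network access. The helper name &lt;tt&gt;nmrshiftdb_path&lt;/tt&gt; is hypothetical and not part of &lt;tt&gt;NMRFetcher&lt;/tt&gt;. Unlike the listing above, which concatenates the raw InChI into the path, this version percent-encodes the parameter values, on the assumption that the servlet decodes them in the usual way.&lt;/p&gt;

```ruby
require 'uri'

# Hypothetical helper, not part of the original NMRFetcher class.
# Builds the servlet path for an InChI identifier and a spectrum type.
def nmrshiftdb_path(inchi, spectrumtype)
  # Drop the "InChI=" prefix; the remainder becomes the value of the
  # lowercase "inchi" query parameter used in get_record.
  value = inchi.sub(/\AInChI=/, '')

  # encode_www_form joins the pairs and percent-encodes each value.
  # Assumption: the servlet decodes percent-escapes in the usual way.
  query = URI.encode_www_form(
    'nmrshiftdbaction' => 'exportcmlbyinchi',
    'inchi' => value,
    'spectrumtype' => spectrumtype
  )

  '/NmrshiftdbServlet?' + query
end

# Benzene, written as a version 1 InChI identifier.
puts nmrshiftdb_path('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H', '13C')
```

&lt;p&gt;Printing the path before sending it is a cheap way to check what a given SMILES string and spectrum type turn into. If the servlet turns out to need the raw, unescaped InChI, the plain string concatenation used in &lt;tt&gt;get_record&lt;/tt&gt; is the version to keep.&lt;/p&gt;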

&lt;p&gt;Now create a file called &lt;strong&gt;smi2inchi.rb&lt;/strong&gt;, containing the following Ruby code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="constant"&gt;ENV&lt;/span&gt;&lt;span class="punct"&gt;['&lt;/span&gt;&lt;span class="string"&gt;CLASSPATH&lt;/span&gt;&lt;span class="punct"&gt;']&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;./cdk-20060714.jar&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rino&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;StringWriter&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java.io.StringWriter&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;SmilesParser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.smiles.SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;MDLWriter&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.io.MDLWriter&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# Converts a SMILES string into an InChI identifier using&lt;/span&gt;
&lt;span class="comment"&gt;# the CDK Library (Java) and the Rino Library (Ruby/C).&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Translator&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@smiles_parser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@mdl_writer&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;MDLWriter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@mol2inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rino&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;MolfileReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Returns an InChI identifier from the specified SMILES string.&lt;/span&gt;
  &lt;span class="comment"&gt;# Uses the CDK classes SmilesParser and MDLWriter to generate&lt;/span&gt;
  &lt;span class="comment"&gt;# a molfile from a SMILES string. Then this molfile is&lt;/span&gt;
  &lt;span class="comment"&gt;# parsed by Rino::MolfileReader.&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;translate&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@smiles_parser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parseSmiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="ident"&gt;sw&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;StringWriter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

    &lt;span class="attribute"&gt;@mdl_writer&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;setWriter&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;sw&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@mdl_writer&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

    &lt;span class="attribute"&gt;@mol2inchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;sw&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;toString&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This code and its use are described in &lt;a href="http://depth-first.com/articles/2006/08/26/from-smiles-to-inchi-rino-cdk-and-java-ruby-bridge"&gt;a recent article&lt;/a&gt; on generating InChI identifiers from SMILES strings.&lt;/p&gt;

&lt;p&gt;Before using the code we've just created, you'll need to set the &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; environment variable (or its equivalent on your platform) to point to the native Java libraries. On Linux with Sun's JDK, this is done from the command line with:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$LD_LIBRARY_PATH
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Using the &lt;tt&gt;NMRFetcher&lt;/tt&gt; class is just a matter of creating an instance and invoking &lt;tt&gt;get_record&lt;/tt&gt; with the desired SMILES string and spectrum type (for example, 1H or 13C). Doing so returns a CML document containing the structure of the compound and its spectrum. If no record matches, the method returns &lt;tt&gt;nil&lt;/tt&gt;. The code below gives an example in which the CML output is pretty-printed using the wonderful Ruby API for XML, &lt;a href="http://www.germane-software.com/software/rexml/"&gt;REXML&lt;/a&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;rexml/document&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;nmr&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;nmr&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;NMRFetcher&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
&lt;span class="ident"&gt;smiles&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;c1ccccc1&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;#benzene, to keep things simple&lt;/span&gt;
&lt;span class="ident"&gt;type&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;13C&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;record&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;nmr&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get_record&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;type&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="ident"&gt;record&lt;/span&gt; &lt;span class="comment"&gt;#pretty-print the CML record using REXML&lt;/span&gt;
  &lt;span class="ident"&gt;file&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;result.xml&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;w&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;

  &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;REXML&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Document&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;record&lt;/span&gt;&lt;span class="punct"&gt;)).&lt;/span&gt;&lt;span class="ident"&gt;write&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

  &lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;close&lt;/span&gt;
&lt;span class="keyword"&gt;else&lt;/span&gt; &lt;span class="comment"&gt;#write an error&lt;/span&gt;
  &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;result.error&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;w&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
    &lt;span class="ident"&gt;file&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;No record of SMILES: &lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above code can be put into a file (&lt;strong&gt;test.rb&lt;/strong&gt;) and run:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ruby test.rb
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Alternatively, it can be entered interactively and played with using irb:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt;
&lt;/pre&gt;
&lt;/div&gt;
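&lt;p&gt;The &lt;tt&gt;nil&lt;/tt&gt; return for a missing record can also be explored without contacting NMRShiftDB. The sketch below isolates that rule using the sentinel text from &lt;tt&gt;valid_record?&lt;/tt&gt;. The names &lt;tt&gt;record_or_nil&lt;/tt&gt; and &lt;tt&gt;output_name&lt;/tt&gt; are hypothetical, and stripping surrounding whitespace before the comparison is an assumption that the original listing does not make.&lt;/p&gt;

```ruby
# Sentinel text taken from valid_record? in the NMRFetcher listing.
NO_RECORD = 'No such molecule or spectrum'

# Hypothetical helper: maps a raw response body to a record or nil.
# Stripping whitespace first is an assumption, not something the
# original listing does.
def record_or_nil(body)
  return nil if body.nil?
  return nil if body.strip.eql?(NO_RECORD)
  body
end

# Hypothetical helper: picks the output file name the way test.rb does,
# result.xml for a record and result.error otherwise.
def output_name(record)
  record ? 'result.xml' : 'result.error'
end

puts output_name(record_or_nil('No such molecule or spectrum'))
puts output_name(record_or_nil('a CML document would go here'))
```

&lt;p&gt;Keeping the sentinel check in one small method means the calling code only ever has to test for &lt;tt&gt;nil&lt;/tt&gt;, which is what &lt;strong&gt;test.rb&lt;/strong&gt; relies on.&lt;/p&gt;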

&lt;h4&gt;Output&lt;/h4&gt;

&lt;p&gt;The program produces the following &lt;a href="http://www.xml-cml.org/"&gt;Chemical Markup Language&lt;/a&gt; output in a file called &lt;strong&gt;result.xml&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;molecule&lt;/span&gt; &lt;span class="attribute"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;Benzene&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;nmrshiftdb7901&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema/cml2/core&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atomArray&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;y2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0.7625&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;x2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-1.4063&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a1&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;formalCharge&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;hydrogenCount&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;y2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0.35&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;x2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-2.1207&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a2&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;formalCharge&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;hydrogenCount&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;y2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-0.475&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;x2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-2.1207&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a3&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;formalCharge&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;hydrogenCount&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;y2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-0.8875&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;x2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-1.4063&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a4&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;formalCharge&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;hydrogenCount&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;y2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-0.475&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;x2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-0.6918&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a5&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;formalCharge&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;hydrogenCount&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;atom&lt;/span&gt; &lt;span class="attribute"&gt;elementType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;C&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;y2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0.35&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;x2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;-0.6918&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a6&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;formalCharge&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;hydrogenCount&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;atomArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bondArray&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a1 a2&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;S&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;b1&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a2 a3&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;D&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;b2&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a3 a4&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;S&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;b3&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a4 a5&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;D&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;b4&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a5 a6&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;S&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;b5&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;bond&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs2&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a1 a6&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;order&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;D&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;b6&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;bondArray&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;spectrum&lt;/span&gt; &lt;span class="attribute"&gt;moleculeRef&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;nmrshiftdb7901&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="namespace"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;:&lt;/span&gt;&lt;span class="attribute"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/dict/cml&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="namespace"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;:&lt;/span&gt;&lt;span class="attribute"&gt;cmlDict&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/dict/cmlDict&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="namespace"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;:&lt;/span&gt;&lt;span class="attribute"&gt;siUnits&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/units/siUnits&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;type&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;NMR&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="namespace"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;:&lt;/span&gt;&lt;span class="attribute"&gt;macie&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/dict/macie&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="namespace"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;:&lt;/span&gt;&lt;span class="attribute"&gt;units&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/units/units&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span 
class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;nmrshiftdb15502&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="namespace"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;:&lt;/span&gt;&lt;span class="attribute"&gt;subst&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/dict/substDict&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="namespace"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;:&lt;/span&gt;&lt;span class="attribute"&gt;nmr&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.nmrshiftdb.org/dict&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema/cml2/spect&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;conditionList&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;scalar&lt;/span&gt; &lt;span class="attribute"&gt;dataType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;xsd:string&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;units&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;siUnits:k&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;dictRef&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;cml:temp&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;298&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;scalar&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;scalar&lt;/span&gt; &lt;span class="attribute"&gt;dataType&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;xsd:string&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;units&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;siUnits:hertz&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;dictRef&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;cml:field&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;Unreported&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;scalar&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;conditionList&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;metadataList&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;metadata&lt;/span&gt; &lt;span class="attribute"&gt;name&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;nmr:OBSERVENUCLEUS&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;content&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;13C&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;metadataList&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;peakList&lt;/span&gt; &lt;span class="attribute"&gt;xmlns&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;http://www.xml-cml.org/schema&lt;/span&gt;&lt;span class="punct"&gt;'&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;peak&lt;/span&gt; &lt;span class="attribute"&gt;xUnits&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;units:ppm&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;peakShape&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;sharp&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;xValue&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;128.5&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;p0&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="attribute"&gt;atomRefs&lt;/span&gt;&lt;span class="punct"&gt;='&lt;/span&gt;&lt;span class="string"&gt;a1 a2 a3 a4 a5 a6&lt;/span&gt;&lt;span class="punct"&gt;'/&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;peakList&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;spectrum&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;cml&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The kind of output produced by NMRFetcher and NMRShiftDB could be used in a variety of ways. Notice, near the bottom of the document, how peak assignments are made relative to the atom labels in the &lt;tt&gt;molecule&lt;/tt&gt; declaration: each &lt;tt&gt;peak&lt;/tt&gt; lists its assigned atoms in the &lt;tt&gt;atomRefs&lt;/tt&gt; attribute. It should be possible, for example, to create interactive 2-D structure diagrams from this document in which a user mouses over an atom and gets a C-13 chemical shift.&lt;/p&gt;

&lt;p&gt;NMRShiftDB is a valuable and free online resource for NMR spectroscopy. Programmatically mixing its capabilities with free software and other online services offers numerous opportunities to build innovative chemical informatics systems.&lt;/p&gt;</description>
      <pubDate>Mon, 04 Sep 2006 13:28:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:4db46312-f5b9-4369-b9bc-d949f28b61c5</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/04/hacking-nmrshiftdb</link>
      <category>Databases</category>
      <category>nmrshiftdb</category>
      <category>cml</category>
      <category>rjb</category>
      <category>inchi</category>
      <category>smiles</category>
    </item>
  </channel>
</rss>
