<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Customize InChI Output with Rino</title>
    <link>http://depth-first.com/articles/2007/03/19/customize-inchi-output-with-rino</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Customize InChI Output with Rino</title>
      <description>&lt;p&gt;&lt;a href="http://rubyforge.org/projects/rino"&gt;Rino&lt;/a&gt; is a toolkit for working with the &lt;a href="http://en.wikipedia.org/wiki/International_Chemical_Identifier"&gt;IUPAC International Chemical Identifier&lt;/a&gt; (InChI) in Ruby. Because it's based on the IUPAC/NIST InChI toolkit, Rino can be configured using a variety of useful options. This article summarizes those options and provides an illustrative example.&lt;/p&gt;

&lt;h4&gt;Complete List of InChI Command Line Options&lt;/h4&gt;

&lt;p&gt;The following is a complete summary of the IUPAC/NIST InChI toolkit command line options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SNon&lt;/strong&gt; Exclude stereo (Default: Include Absolute stereo)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SRel&lt;/strong&gt; Relative stereo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SRac&lt;/strong&gt; Racemic stereo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SUCF&lt;/strong&gt; Use Chiral Flag: On means Absolute stereo, Off - Relative&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SUU&lt;/strong&gt; Include omitted unknown/undefined stereo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NEWPS&lt;/strong&gt; Narrow end of wedge points to stereocenter (default: both)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SPXYZ&lt;/strong&gt; Include Phosphines Stereochemistry&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SAsXYZ&lt;/strong&gt; Include Arsines Stereochemistry&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RecMet&lt;/strong&gt; Include reconnected metals results&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FixedH&lt;/strong&gt; Mobile H Perception Off (Default: On)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AuxNone&lt;/strong&gt; Omit auxiliary information (default: Include)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NoADP&lt;/strong&gt; Disable Aggressive Deprotonation (for testing only)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compress&lt;/strong&gt; Compressed output&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DoNotAddH&lt;/strong&gt; Don't add H according to usual valences: all H are explicit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wnumber&lt;/strong&gt; Set time-out per structure in seconds; W0 means unlimited&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SDF:DataHeader&lt;/strong&gt; Read from the input SDfile the ID under this DataHeader&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NoLabels&lt;/strong&gt; Omit structure number, DataHeader and ID from InChI output&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tabbed&lt;/strong&gt; Separate structure number, InChI, and AuxInfo with tabs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OutputSDF&lt;/strong&gt; Convert InChI created with default aux. info to SDfile&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;InChI2InChI&lt;/strong&gt; Convert InChI string into InChI string for validation purposes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SdfAtomsDT&lt;/strong&gt; Output Hydrogen Isotopes to SDfile as Atoms D and T&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;STDIO&lt;/strong&gt; Use standard input/output streams&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FB&lt;/strong&gt; (or FixSp3Bug) Fix bug leading to missing or undefined sp3 parity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;WarnOnEmptyStructure&lt;/strong&gt; Warn and produce empty InChI for empty structure&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
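
&lt;p&gt;As a minimal sketch, assuming options are passed in the dash-prefixed form that this article uses for &lt;tt&gt;-FixedH&lt;/tt&gt;, the names above can be turned into flags with a small helper that catches a misspelled name early. Both &lt;tt&gt;KNOWN_OPTIONS&lt;/tt&gt; (a subset of the list) and &lt;tt&gt;inchi_flags&lt;/tt&gt; are hypothetical names for illustration and are not part of Rino.&lt;/p&gt;

```ruby
# Option names copied from the list above (a subset, for brevity).
KNOWN_OPTIONS = %w[SNon SRel SRac SUCF SUU NEWPS SPXYZ SAsXYZ RecMet FixedH
                   AuxNone NoADP Compress DoNotAddH NoLabels Tabbed].freeze

# Hypothetical helper, not part of Rino: turn option names into
# dash-prefixed flags, rejecting names that are not in KNOWN_OPTIONS.
def inchi_flags(*names)
  names.map do |name|
    name = name.to_s
    raise ArgumentError, "unknown InChI option: #{name}" unless KNOWN_OPTIONS.include?(name)
    "-#{name}"
  end
end

p inchi_flags(:FixedH, :SNon)  # prints ["-FixedH", "-SNon"]
```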

&lt;h4&gt;A Test&lt;/h4&gt;

&lt;p&gt;The following code displays the InChI for benzoic acid with and without mobile hydrogen atom perception. It requires both &lt;a href="http://depth-first.com/articles/tag/rino"&gt;Rino&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/tag/rcdk"&gt;Ruby CDK&lt;/a&gt;. The latter library is used to convert a SMILES string into a molfile for use by Rino.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rino&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;smiles_to_molfile&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;c1ccccc1C(=O)O&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="comment"&gt;# benzoic acid&lt;/span&gt;
&lt;span class="ident"&gt;reader&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rino&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;MolfileReader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
&lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molfile&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;With mobile hydrogen perception (default):&lt;span class="escape"&gt;\n&lt;/span&gt;&lt;span class="expr"&gt;#{inchi}&lt;/span&gt;&lt;span class="escape"&gt;\n\n&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

&lt;span class="ident"&gt;reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;options&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;-FixedH&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;inchi&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;reader&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molfile&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Without mobile hydrogen perception (-FixedH):&lt;span class="escape"&gt;\n&lt;/span&gt;&lt;span class="expr"&gt;#{inchi}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;-FixedH&lt;/tt&gt; flag used by the reader the second time turns off mobile hydrogen perception, as the option list above states. The resulting InChI gains a fixed-hydrogen layer (&lt;tt&gt;/f&lt;/tt&gt;) that records where the otherwise mobile hydrogens sit. Some InChI authors use this form of InChI and others don't. PubChem is an example of a large InChI author that does include the fixed-hydrogen layer, as their entry for &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=243"&gt;benzoic acid&lt;/a&gt; demonstrates. To perform an exact match of your InChIs with theirs, the &lt;tt&gt;-FixedH&lt;/tt&gt; flag must be set.&lt;/p&gt;

&lt;h4&gt;Running the Test&lt;/h4&gt;

&lt;p&gt;Running the test code produces the following output:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
With mobile hydrogen perception (default):
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)

Without mobile hydrogen perception (-FixedH):
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H
&lt;/pre&gt;
&lt;/div&gt;
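
&lt;p&gt;The two identifiers above differ only in the part that begins at &lt;tt&gt;/f&lt;/tt&gt;. The following is a rough sketch in plain Ruby (Rino is not required) that separates that part so the remaining layers can be compared on their own. The &lt;tt&gt;split_fixed_h&lt;/tt&gt; helper is hypothetical, not part of Rino, and it makes no attempt to validate the InChI it is given.&lt;/p&gt;

```ruby
# Hypothetical helper: separate the main part of an InChI from its
# fixed-hydrogen part, which starts at the "/f" layer when present.
def split_fixed_h(inchi)
  main, fixed = inchi.split('/f', 2)
  { main: main, fixed_h: fixed }
end

# The two InChI strings printed by the test above.
default_inchi = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)'
fixed_h_inchi = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H'

a = split_fixed_h(default_inchi)
b = split_fixed_h(fixed_h_inchi)

puts a[:main] == b[:main]  # true: the main layers are identical
puts a[:fixed_h].inspect   # nil: the default output has no /f layer
puts b[:fixed_h]           # /h8H
```

&lt;p&gt;Comparing only the main part is a looser test than an exact string match, so it may be best treated as a fallback when the other author's options are not known.&lt;/p&gt;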

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;When matching InChIs generated by other authors, it's best to adopt their processing conventions. Rino makes it convenient to do so through its full support for the standard IUPAC/NIST command line options.&lt;/p&gt;</description>
      <pubDate>Mon, 19 Mar 2007 10:30:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1975ab63-5e0e-4ef7-9227-46b1fb0f0939</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/19/customize-inchi-output-with-rino</link>
      <category>Tools</category>
      <category>rino</category>
      <category>inchi</category>
      <category>pubchem</category>
      <category>commandline</category>
    </item>
  </channel>
</rss>
