<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: A Simple and Portable Ruby Interface to InChI</title>
    <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>A Simple and Portable Ruby Interface to InChI</title>
      <description>&lt;p&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Although the &lt;a href="http://depth-first.com/articles/2007/09/27/inchi-for-newbies"&gt;InChI&lt;/a&gt; software itself is written in C, it can still be used via Ruby. &lt;a href="http://depth-first.com/articles/2007/03/19/customize-inchi-output-with-rino"&gt;Rino&lt;/a&gt; offers one implementation of a Ruby InChI interface that makes use of a C extension. This article describes a more concise and portable solution.&lt;/p&gt;

&lt;h4&gt;The Code&lt;/h4&gt;

&lt;p&gt;The following code will accept a String encoding a molfile and return either its InChI, or an empty String if no InChI could be found:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;InChI&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;inchi_for&lt;/span&gt; &lt;span class="ident"&gt;molfile&lt;/span&gt;
    &lt;span class="ident"&gt;output&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;%x[&lt;/span&gt;&lt;span class="string"&gt;echo &amp;quot;&lt;span class="expr"&gt;#{molfile}&lt;/span&gt;&amp;quot; | cInChI-1 -STDIO&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt;

    &lt;span class="ident"&gt;output&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;eql?&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="ident"&gt;output&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;split&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;&lt;span class="escape"&gt;\n&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;/)[&lt;/span&gt;&lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This code takes advantage of Ruby's built-in support for &lt;a href="http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_expressions.html#UA"&gt;Command Expansion&lt;/a&gt;.&lt;/p&gt;
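&lt;p&gt;As a minimal sketch of how Command Expansion behaves, the example below uses echo rather than cInChI-1 so that it runs on any UNIX-like system. The helper name run_command is hypothetical. The command's standard output comes back as a String, and the exit status is available through $?:&lt;/p&gt;

&lt;pre&gt;
```ruby
# Hypothetical helper: run a shell command through Command Expansion.
# Returns the command's standard output and whether it exited successfully.
def run_command(command)
  output = %x[#{command}]
  [output, $?.success?]
end

output, ok = run_command("echo benzene")
puts "output=#{output.inspect} ok=#{ok}"
```
&lt;/pre&gt;

&lt;p&gt;Checking the exit status is one way to tell a failed command apart from a command that simply produced no output.&lt;/p&gt;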

&lt;h4&gt;Testing the Code&lt;/h4&gt;

&lt;p&gt;The code below tests the library:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;inchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include&lt;/span&gt; &lt;span class="constant"&gt;InChI&lt;/span&gt;

&lt;span class="ident"&gt;molfile&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt;
&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://chempedia.com/compounds/106.mol
  -OEChem-03010811072D

 12 12  0     0  0  0  0  0  0999 V2000
    2.8660    1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660    1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  7  1  0  0  0  0
  2  4  1  0  0  0  0
  2  8  1  0  0  0  0
  3  5  2  0  0  0  0
  3  9  1  0  0  0  0
  4  6  2  0  0  0  0
  4 10  1  0  0  0  0
  5  6  1  0  0  0  0
  5 11  1  0  0  0  0
  6 12  1  0  0  0  0
M  END&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Found InChI: &lt;span class="expr"&gt;#{inchi_for(molfile)}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can run the test by saving it in a file called &lt;strong&gt;test.rb&lt;/strong&gt; and executing it:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ruby test.rb
InChI version 1, Software version 1.02-beta August 2007
Log file not specified. Using standard error output.
Input file not specified. Using standard input.
Output file not specified. Using standard output.
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Found InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;The above approach requires only a UNIX-like system and a copy of the InChI command-line executable (cInChI-1) on your path.&lt;/p&gt;
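&lt;p&gt;A minimal sketch of checking that the executable is present before calling it, assuming it is named cInChI-1 as in the code above. The helper name find_executable is hypothetical:&lt;/p&gt;

&lt;pre&gt;
```ruby
# Hypothetical helper: search each PATH directory for an executable file.
# Returns the full path of the first match, or nil if there is none.
def find_executable(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    next unless File.file?(candidate)
    return candidate if File.executable?(candidate)
  end
  nil
end

path = find_executable("cInChI-1")
puts(path ? "Found cInChI-1 at #{path}" : "cInChI-1 is not on the PATH")
```
&lt;/pre&gt;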

&lt;h4&gt;Advantages&lt;/h4&gt;

&lt;p&gt;The approach described here offers some important advantages over Rino:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It works without modification on both the &lt;a href="http://en.wikipedia.org/wiki/Ruby_MRI"&gt;Matz Ruby Interpreter&lt;/a&gt; (C-Ruby) and &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It neither creates nor uses files.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Disadvantages&lt;/h4&gt;

&lt;p&gt;This approach creates a lot of noisy log output to the console. There must be a way to suppress it, but so far I haven't found out how.&lt;/p&gt;
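&lt;p&gt;One possible way to keep the log output off the console is sketched below, under the assumption that Ruby's standard open3 library works on your interpreter (it may not on every Ruby implementation). The helper name quiet_run is hypothetical, and cat stands in for cInChI-1:&lt;/p&gt;

&lt;pre&gt;
```ruby
require "open3"

# Hypothetical helper: run a command, send it input on standard input,
# and return only its standard output. Standard error is captured and
# discarded instead of reaching the console.
def quiet_run(input, *command)
  stdout, _stderr, _status = Open3.capture3(*command, stdin_data: input)
  stdout
end

# cat simply copies standard input to standard output.
puts quiet_run("benzene\n", "cat")
```
&lt;/pre&gt;

&lt;p&gt;With cInChI-1 on the path, the call might look like quiet_run(molfile, "cInChI-1", "-STDIO"). Because the command and its arguments are passed as separate strings, no shell is involved and the molfile travels over a pipe rather than on the command line.&lt;/p&gt;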

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Using Ruby's support for Command Expansion has enabled the creation of a concise and portable Ruby interface to the InChI toolkit. Similar principles would apply to any Unix command-line binary, including, for example, &lt;a href="http://openbabel.org/wiki/Babel"&gt;Open Babel&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Thu, 29 May 2008 12:12:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:a79f3191-e044-4db5-8676-38f97fcaeedf</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi</link>
      <category>Tools</category>
      <category>ruby</category>
      <category>inchi</category>
      <category>rino</category>
      <category>commandexpansion</category>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by Andrew Dalke</title>
      <description>&lt;p&gt;Yes, whitelisting like that should work for this case, in the manner you have there.  There's still a worry in my head if someone submits a multi-structure SD file, since cInChI-1 will convert all structures you give it.  Your filter removes the "&gt;  &amp;lt;" key/value data fields from the SD file, and I don't know if that will confuse the inchi reader.&lt;/p&gt;

&lt;p&gt;I don't like being worried like this, so I do my best to avoid passing user input in on the command line.&lt;/p&gt;</description>
      <pubDate>Sat, 31 May 2008 16:03:40 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b1e39b02-a457-4bb1-8e2d-92029eaf3a81</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-589</link>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by Rich Apodaca</title>
      <description>&lt;p&gt;Andrew, you raise some very important security issues. The risk is that Ruby code using Command Expansion could be coaxed into executing arbitrary system commands, rather than just generating an error.&lt;/p&gt;

&lt;p&gt;A regex cleanup like this might work:&lt;/p&gt;

&lt;pre&gt;
inchi_for m.gsub(/[^0-9a-z\.\-\n ]/i, '')
&lt;/pre&gt;

&lt;p&gt;Single-quoting alone would ensure that the only substitutions that could occur are \\ -&gt; \ and \' -&gt; ', the latter of which causes an error.&lt;/p&gt;
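&lt;p&gt;As a hedged sketch of the quoting approach, Ruby's standard shellwords library provides Shellwords.escape, which escapes a string for use as a single Bourne shell argument. The helper name shell_echo is hypothetical, and printf stands in for cInChI-1:&lt;/p&gt;

&lt;pre&gt;
```ruby
require "shellwords"

# Hypothetical helper: pass text to a shell command as one escaped
# argument, so quotes, backslashes, $ and backticks stay literal.
def shell_echo(text)
  %x[printf %s #{Shellwords.escape(text)}]
end

# The shell prints the text back unchanged instead of expanding it.
puts shell_echo('Bad\\ $$ `ls` "quoted"')
```
&lt;/pre&gt;

&lt;p&gt;This only addresses shell quoting. The escaped text still travels on the command line, so any limit on argument size still applies.&lt;/p&gt;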

&lt;p&gt;Your point is well taken - any use of this approach on foreign data should carefully consider all of the security implications.&lt;/p&gt;</description>
      <pubDate>Sat, 31 May 2008 15:04:56 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:38840480-84c9-4f21-93ce-3edd0392fea6</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-588</link>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by Andrew Dalke</title>
      <description>&lt;p&gt;
I've not been silent with my own complaints about InChI. :)
&lt;/p&gt;&lt;p&gt;
"fork" is deemed unsafe?  Strange.  Are they worried about fork-bombs or something else?  I'm more surprised that %x[] works because it has a lot of security problems.  I pointed out the one with the quotes.  You fixed it via removal of any double quotes, which should be fine for InChI.  But you also need to remove "\" characters, which are interpreted as escape characters in the context you're using.  And because you are using double quotes, other characters, like $ and `, have special meaning.

&lt;/p&gt;&lt;p&gt;
Consider:
&lt;pre&gt;
irb(main):011:0&gt; text = "Bad\\"
=&gt; "Bad\\"
irb(main):013:0&gt; %x[echo "#{text}" | wc -c]
sh: -c: line 1: unexpected EOF while looking for matching `"'
sh: -c: line 2: syntax error: unexpected end of file
=&gt; ""
irb(main):014:0&gt; text = "$$"
=&gt; "$$"
irb(main):015:0&gt; %x[echo "#{text}"]        
=&gt; "3749\n"
irb(main):016:0&gt; text = "`ls`"
=&gt; "`ls`"
irb(main):017:0&gt; %x[echo #{text} | wc -c]
=&gt; "     132\n"

&lt;/pre&gt;
At the very least, use single quotes.  And hope this code isn't run under Windows because the failure mode is going to be quite unexpected.

&lt;/p&gt;&lt;p&gt;

It's really hard to be safe when using system(3C), which appears to be what Ruby is using here.  I don't know Ruby well so don't know if a correct (and platform independent) function exists there.  I did find the &lt;a href="http://www.a-k-r.org/escape/" rel="nofollow"&gt;Escape&lt;/a&gt; module which has the right function.

&lt;/p&gt;&lt;p&gt;

Doing some research just now: under Java/JRuby your best choice is to use Runtime.exec() (pre 1.5) or java.lang.ProcessBuilder.

&lt;/p&gt;&lt;p&gt;

I usually use the "redirect stderr to /dev/null" trick when I do this sort of work, but sometimes I need the stderr output so I can get the program version number, or error message about why something failed.

&lt;/p&gt;&lt;p&gt;
The moral of my story is, don't trust user input, especially when passed to the command line.
&lt;/p&gt;</description>
      <pubDate>Sat, 31 May 2008 12:16:26 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:0cb05592-d6cc-4ab5-816c-5cd5b6414bcf</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-587</link>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by Rich Apodaca</title>
      <description>&lt;p&gt;Andrew, thanks for the feedback. The &lt;a href="http://sourceforge.net/mailarchive/forum.php?thread_name=051101c853c6%24c18afac0%240100a8c0%40chemref838.nist.gov&amp;amp;forum_name=inchi-discuss" rel="nofollow"&gt;email thread&lt;/a&gt; you refer to has some useful information for those who want to share their InChIs with others. I've never thought that exchanging InChI keys between organizations would work well precisely for this reason.&lt;/p&gt;

&lt;p&gt;Great suggestion about Open3. Unfortunately, it has issues on JRuby (1.1.1):&lt;/p&gt;


&lt;pre&gt;
$ jirb
irb(main):001:0&amp;gt; require 'open3'
=&amp;gt; true
irb(main):002:0&amp;gt; stdin, stdout, stderr = Open3.popen3("wc -c")
NotImplementedError: fork is unsafe and disabled by default on JRuby
        from (irb):3:in `popen3'
        from (irb):3:in `load_history'
irb(main):003:0&amp;gt;
&lt;/pre&gt;


&lt;p&gt;I did a test to check if an unclosed quote placed on the comments line would cause problems, and it does. An unclosed single quote was fine, though. A workaround would be:&lt;/p&gt;

&lt;pre&gt;
inchi_for m.gsub(/["]/, "")
&lt;/pre&gt;

&lt;p&gt;AFAIK, doing this doesn't change the resulting InChI in any way.&lt;/p&gt;

&lt;p&gt;A solution to the noisy output problem was also &lt;a href="http://depth-first.com/articles/2008/05/30/a-simple-and-portable-ruby-interface-to-inchi-part-2-silencing-console-output" rel="nofollow"&gt;described here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So it looks like, for now, the method described above using Command Expansion is the most broadly usable. For extremely large molfiles, this could be a problem, but for everything else it seems to work.&lt;/p&gt;</description>
      <pubDate>Sat, 31 May 2008 10:31:29 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:190d6a08-824e-4756-842d-2a21a4cb38a7</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-585</link>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by Andrew Dalke</title>
      <description>&lt;p&gt;You are missing some required command-line parameters  (notice that I don't say "options", because they aren't optional).&lt;/p&gt;

&lt;p&gt;Igor Pletnev on the InChI mailing list (10 January 2008, titled "InChI standard generation options (request for comments)") suggested&lt;/p&gt;

&lt;p&gt;/FixedH /RecMet /SPXYZ /SAsXYZ /Newps /Fb /Fnud&lt;/p&gt;

&lt;p&gt;but ChemSpider doesn't use /FixedH, making it hard to do an InChIKey search on their site.&lt;/p&gt;

&lt;p&gt;At the very least, you should enable the /Fb ("Fix bug leading to missing or undefined sp3 parity") option.&lt;/p&gt;

&lt;p&gt;Also, I see you are passing the entire contents of the input file into the string to be exec'ed by Ruby.  I don't know how Ruby works, but I would be worried about embedded quotes in the molfile string.  What if some property contained a " character, causing the quoted string to be unquoted?  If you used this as part of a web service to convert structures to InChI strings then this is a possible security hole.&lt;/p&gt;

&lt;p&gt;I'm also slightly concerned about the potential size of the string you create.  On my Mac, the maximum string is set by the kernel variable "kern.argmax" = 262144, so that's the upper limit on the structure you could pass in.  Doing a quick search I found that inulin (CID:24763) has 801 heavy atoms and is 96,669 bytes long.  This means it's very unlikely that the input structure will exceed that limit.&lt;/p&gt;

&lt;p&gt;But other machines have different limits.  See for example &lt;a href="http://www.in-ulm.de/~mascheck/various/argmax/" rel="nofollow"&gt;http://www.in-ulm.de/~mascheck/various/argmax/&lt;/a&gt; .  For those people using IRIX (which is where I first ran into this limit), the max size is only 20,480 bytes.  "Linux -2.6.7" is listed at 131,072 bytes, so it's possible that some very large files found in the wild will break that limit.&lt;/p&gt;
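&lt;p&gt;To see the limit on a given machine, a rough sketch (assuming the POSIX getconf utility is installed; the helper names are hypothetical) can query ARG_MAX and compare it with the size of the input. The environment and the rest of the command line also count against the limit, and some systems cap individual arguments separately, so this is only an estimate:&lt;/p&gt;

&lt;pre&gt;
```ruby
# Hypothetical helper: ask the system for its argument-size limit in bytes.
def arg_max
  %x[getconf ARG_MAX].to_i
end

# Hypothetical helper: true when the text is smaller than the given limit.
def fits_on_command_line?(text, limit = arg_max)
  (limit - text.bytesize).positive?
end

puts "ARG_MAX is #{arg_max} bytes"
# A string the same size as the 96,669-byte inulin file mentioned above.
puts fits_on_command_line?("x" * 96_669)
```
&lt;/pre&gt;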

&lt;p&gt;My usual solution for this case is using something like Open3.popen3, which uses pipes to communicate with the co-process's stdin, stdout, and stderr.  This also solves the problem of keeping inchi's stderr from reaching the console.&lt;/p&gt;

&lt;pre&gt;
irb(main):001:0&gt; require "open3"
=&gt; true
irb(main):002:0&gt; stdin, stdout, stderr = Open3.popen3("wc -c")
=&gt; [#, #, #]
irb(main):003:0&gt; stdin.write("Hello!")
=&gt; 6
irb(main):004:0&gt; stdin.close()
=&gt; nil
irb(main):005:0&gt; stdout.read()
=&gt; "       6\n"
irb(main):006:0&gt;
&lt;/pre&gt;</description>
      <pubDate>Sat, 31 May 2008 05:42:55 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:ec22fdcd-b7aa-4479-91c9-645c6711f4d4</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-584</link>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by Rich Apodaca</title>
      <description>&lt;p&gt;Noel, last time I checked, CDK support for InChI was limited to Windows and 32-bit Intel.&lt;/p&gt;

&lt;p&gt;Rino actually uses neither CDK nor Open Babel, but rather the InChI toolkit directly. But my limited time and knowledge of C led to the use of temporary files in the C-Extension. And I doubt Rino would compile on OS X.&lt;/p&gt;</description>
      <pubDate>Fri, 30 May 2008 08:45:17 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:62ad269b-1741-49e9-925e-e0f628055a2b</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-582</link>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by baoilleach</title>
      <description>&lt;p&gt;Actually, I realise I don't currently support InChI with the CDK, so I don't know if what I said is actually correct.&lt;/p&gt;</description>
      <pubDate>Fri, 30 May 2008 05:26:39 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:2c93e25f-c4d9-44fd-abfc-968371e83cba</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-581</link>
    </item>
    <item>
      <title>"A Simple and Portable Ruby Interface to InChI" by baoilleach</title>
      <description>&lt;p&gt;There's no need for files either if using the CDK [as Rino appears to] or indeed if using OpenBabel. If you're interested in an example, see the code behind cinfony.cdkjython.readstring() and cinfony.cdkjython.Molecule.write().&lt;/p&gt;</description>
      <pubDate>Fri, 30 May 2008 05:24:28 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b95c917a-505d-44da-842f-a64eee9ec39a</guid>
      <link>http://depth-first.com/articles/2008/05/29/a-simple-and-portable-ruby-interface-to-inchi#comment-580</link>
    </item>
  </channel>
</rss>
