A Simple and Portable Ruby Interface to InChI
Although the InChI software itself is written in C, it can still be used via Ruby. Rino offers one implementation of a Ruby InChI interface that makes use of a C extension. This article describes a more concise and portable solution.
The Code
The following code will accept a String encoding a molfile and return either its InChI, or an empty String if no InChI could be found:
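module InChI
  # Returns the InChI for a molfile String, or "" if none could be found.
  def inchi_for molfile
    # Pipe the molfile to cInChI-1 on standard input and capture its standard output.
    output = %x[echo "#{molfile}" | cInChI-1 -STDIO]
    # The InChI is taken from the second line of the output.
    output.eql?("") ? "" : output.split(/\n/)[1]
  end
end
This code takes advantage of Ruby's built-in support for Command Expansion.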
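As a small illustration of Command Expansion on its own (not part of the library), %x[...] and backticks both hand a command to the shell and return its standard output as a String. The commands below are arbitrary examples and assume a UNIX-like system:

```ruby
# Command Expansion: the command's standard output comes back as a String.
greeting = %x[echo hello]          # equivalent to `echo hello`
count    = %x[echo "abc" | wc -c]  # pipes work because a shell runs the command

puts greeting.inspect  # the trailing newline from echo is preserved
puts count.strip
```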
Testing the Code
The code below tests the library:
require 'inchi'
include InChI
molfile =
"http://chempedia.com/compounds/106.mol
-OEChem-03010811072D
12 12 0 0 0 0 0 0 0999 V2000
2.8660 1.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 -0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -1.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 1.6200 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4631 0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4631 -0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 -0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -1.6200 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 7 1 0 0 0 0
2 4 1 0 0 0 0
2 8 1 0 0 0 0
3 5 2 0 0 0 0
3 9 1 0 0 0 0
4 6 2 0 0 0 0
4 10 1 0 0 0 0
5 6 1 0 0 0 0
5 11 1 0 0 0 0
6 12 1 0 0 0 0
M END"
puts "Found InChI: #{inchi_for(molfile)}"
We can run the test by saving it in a file called test.rb and executing it:
$ ruby test.rb
InChI version 1, Software version 1.02-beta August 2007
Log file not specified. Using standard error output.
Input file not specified. Using standard input.
Output file not specified. Using standard output.
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Found InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
Prerequisites
The above approach has only two requirements: it must run on a UNIX-like system, and the cInChI-1 executable from the InChI software must be on your path.
Advantages
The approach described here offers some important advantages over Rino:
It works without modification on both the Matz Ruby Interpreter (C-Ruby) and JRuby.
It neither creates nor uses files.
Disadvantages
This approach creates a lot of noisy log output to the console. There must be a way to suppress it, but so far I haven't found out how.
Conclusions
Using Ruby's support for Command Expansion has enabled the creation of a concise and portable Ruby interface to the InChI toolkit. Similar principles would apply to any Unix command-line binary, including, for example, Open Babel.


There's no need for files either if using the CDK [as Rino appears to] or indeed if using OpenBabel. If interested in an example, see the code behind cinfony.cdkjython.readstring() and cinfony.cdkjython.Molecule.write().
Actually, I realise I don't currently support InChI with the CDK, so I don't know if what I said is actually correct.
Noel, last time I checked CDK support for InChI was limited to Windows and 32-bit Intel.
Rino actually uses neither CDK nor Open Babel, but rather the InChI toolkit directly. But my limited time and knowledge of C led to the use of temporary files in the C-Extension. And I doubt Rino would compile on OS X.
You are missing some required command-line parameters (notice that I don't say "options", because they aren't optional).
Igor Pletnev on the InChI mailing list (10 January 2008, titled "InChI standard generation options (request for comments)") suggested
/FixedH /RecMet /SPXYZ /SAsXYZ /Newps /Fb /Fnud
but ChemSpider doesn't use /FixedH, making it hard to do an InChIKey search on their site.
At the very least, you should enable the /Fb ("Fix bug leading to missing or undefined sp3 parity") option.
Also, I see you are passing the entire contents of the input file into the string to be exec'ed by Ruby. I don't know how Ruby works, but I would be worried about embedded quotes in the molfile string. What if some property contained a " character, causing the quoted string to be unquoted? If you used this as part of a web service to convert structures to InChI strings then this is a possible security hole.
I'm also slightly concerned about the potential size of the string you create. On my Mac, the maximum string is set by the kernel variable "kern.argmax" = 262144, so that's the upper limit on the structure you could pass in. Doing a quick search I found that inulin (CID:24763) has 801 heavy atoms and is 96,669 bytes long. This means it's very unlikely that the input structure will exceed that limit.
But other machines have different limits. See for example http://www.in-ulm.de/~mascheck/various/argmax/ . For those people using IRIX (which is where I first ran into this limit), the max size is only 20,480 bytes. "Linux -2.6.7" is listed at 131,072 bytes, so it's possible that some very large files found in the wild will break that limit.
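As a rough guard against these limits, the size of the text can be compared with the limit the system reports. This is only a sketch: the helper name and the safety margin are assumptions, and Etc.sysconf is available only on Ruby versions and platforms that support it.

```ruby
require 'etc'

# Hypothetical helper: true if `text` is comfortably below the system's
# argument-size limit. The margin is an arbitrary assumption that leaves
# room for the environment and the rest of the command line.
def fits_on_command_line?(text, margin = 4096)
  arg_max = Etc.sysconf(Etc::SC_ARG_MAX)  # the value `getconf ARG_MAX` reports
  return true if arg_max.nil?             # nil means no definite limit
  text.bytesize + margin < arg_max
end

puts fits_on_command_line?("a short molfile")
```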
My usual solution for this case is using something like Open3.popen3, which uses pipes to communicate with the co-process's stdin, stdout, and stderr. This also solves the problem of keeping inchi's stderr from reaching the console.
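A minimal sketch of that idea, using Open3.capture3 (a convenience wrapper built on popen3 in Ruby's standard library). The helper name is hypothetical, and `wc -c` stands in for the InChI executable so the example runs anywhere with a UNIX userland:

```ruby
require 'open3'

# Hypothetical helper: run a command given as an argv array (so no shell
# parses the input), feed `input` to its stdin, and return
# [stdout, stderr, status].
def run_with_stdin(argv, input)
  Open3.capture3(*argv, stdin_data: input)
end

# Assumed usage with the executable from the article (not run here):
#   out, log, status = run_with_stdin(["cInChI-1", "-STDIO"], molfile)
out, err, status = run_with_stdin(["wc", "-c"], "benzene")
puts out.strip
```

Because the molfile travels over a pipe rather than the command line, neither the argument-size limit nor shell quoting comes into play, and stderr is captured in a separate string instead of reaching the console.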
Andrew, thanks for the feedback. The email thread you refer to has some useful information for those who want to share their InChIs with others. I've never thought that exchanging InChI keys between organizations would work well precisely for this reason.
Great suggestion about Open3. Unfortunately, it has issues on JRuby (1.1.1):
$ jirb
irb(main):001:0> require 'open3'
=> true
irb(main):002:0> stdin, stdout, stderr = Open3.popen3("wc -c")
NotImplementedError: fork is unsafe and disabled by default on JRuby
from (irb):3:in `popen3'
from (irb):3:in `load_history'
irb(main):003:0>
I did a test to check if an unclosed quote placed on the comments line would cause problems, and it does. An unclosed single quote was fine, though. A workaround would be:
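A minimal sketch of that kind of workaround, assuming it amounts to stripping double quotes from the molfile before it is interpolated into the command; the helper name is hypothetical:

```ruby
# Hypothetical helper: drop every double quote so the molfile cannot close
# the double-quoted shell string it is interpolated into.
def strip_double_quotes(molfile)
  molfile.delete('"')
end

puts strip_double_quotes(%q{benzene "test" comment})
```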
AFAIK, doing this doesn't change the resulting InChI in any way.
A solution to the noisy output problem was also described here.
So it looks like for now the method described above using Command Expansion is the most broadly-usable. For extremely large molfiles, this could be a problem, but for everything else it seems to work.
I've not been silent with my own complaints about InChI. :)
"fork" is deemed unsafe? Strange. Are they worried about fork-bombs or something else? I'm more surprised that %x[] works because it has a lot of security problems. I pointed out the one with the quotes. You fixed it via removal of any double quotes, which should be fine for InChI. But you also need to remove "\" characters, which are interpreted as an escape character in the context you're using. And because you are using double quotes, other characters, like $ and `, have meaning.
Consider:
irb(main):011:0> text = "Bad\\"
=> "Bad\\"
irb(main):013:0> %x[echo "#{text}" | wc -c]
sh: -c: line 1: unexpected EOF while looking for matching `"'
sh: -c: line 2: syntax error: unexpected end of file
=> ""
irb(main):014:0> text = "$$"
=> "$$"
irb(main):015:0> %x[echo "#{text}"]
=> "3749\n"
irb(main):016:0> text = "`ls`"
=> "`ls`"
irb(main):017:0> %x[echo #{text} | wc -c]
=> " 132\n"
At the very least, use single quotes. And hope this code isn't run under Windows because the failure mode is going to be quite unexpected.
It's really hard to be safe when using system(3C), which appears to be what Ruby is using here. I don't know Ruby well so don't know if a correct (and platform independent) function exists there. I did find the Escape module which has the right function.
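Ruby's standard library also ships Shellwords.escape, which escapes a string for a Bourne-style shell. The sketch below is only an illustration under that assumption (it does nothing for Windows); the helper name is hypothetical, and the inputs are the problem strings from the session above plus one with double quotes:

```ruby
require 'shellwords'

# Hypothetical helper: pass `text` through the shell and return what comes
# back. Shellwords.escape quotes every character the shell would otherwise
# interpret. printf is used instead of echo because some echo
# implementations process backslashes themselves.
def shell_round_trip(text)
  %x[printf %s #{Shellwords.escape(text)}]
end

["Bad\\", "$$", "`ls`", 'say "hi"'].each do |text|
  puts shell_round_trip(text) == text
end
```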
Doing research now, under Java/JRuby your best choice is to use Runtime.exec() (pre 1.5) or java.lang.ProcessBuilder .
I usually use the "redirect stderr to /dev/null" trick when I do this sort of work, but sometimes I need the stderr output so I can get the program version number, or error message about why something failed.
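A minimal sketch of the redirect trick, using a stand-in command that writes one line to each stream so it runs without the InChI executable installed:

```ruby
# The child writes one line to stdout and one to stderr; the trailing
# 2>/dev/null discards the stderr line before it reaches the console.
output = %x[sh -c 'echo result; echo noisy-log 1>&2' 2>/dev/null]
puts output.inspect

# Assumed equivalent for the article's command (not run here):
#   %x[echo "#{molfile}" | cInChI-1 -STDIO 2>/dev/null]
```

The trade-off is the one noted above: anything the program reports on stderr, including version information and error messages, is lost.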
The moral of my story is, don't trust user input, especially when passed to the command line.
Andrew, you raise some very important security issues. The risk is that Ruby code using Command Expansion could be coaxed into executing arbitrary system commands, rather than just generating an error.
A regex cleanup like this might work:
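A minimal sketch of a whitelist-style cleanup. The permitted character set and the helper name are assumptions chosen for illustration; a stricter or looser set may suit real molfiles better, and free-text header lines will lose characters such as ":" and "/".

```ruby
# Hypothetical whitelist: keep letters, digits, spaces, newlines, '.' and
# '-'; delete everything else, including quotes, backslashes, $ and backticks.
def sanitize_molfile(molfile)
  molfile.gsub(/[^A-Za-z0-9 .\n-]/, '')
end

puts sanitize_molfile(%q{C 0 0 "$(ls)" `id` \ -1.5})
```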
Single-quoting alone would ensure that the only substitutions that could occur are \\ -> \ and \' -> ', the latter of which causes an error.
Your point is well taken - any use of this approach on foreign data should carefully consider all of the security implications.
Yes, whitelisting like that should work for this case, in the manner you have there. There's still a worry in my head if someone submits a multi-structure SD file, since cInChI-1 will convert all structures you give it. Your filter removes the "> <" key/value data fields from the SD file, and I don't know if that will confuse the inchi reader.
I don't like being worried like this, so I do my best to avoid passing user input in on the command line.