A Simple and Portable Ruby Interface to InChI
Although the InChI software itself is written in C, it can still be used via Ruby. Rino offers one implementation of a Ruby InChI interface that makes use of a C extension. This article describes a more concise and portable solution.
The Code
The following code will accept a String encoding a molfile and return either its InChI, or an empty String if no InChI could be found:
module InChI
  def inchi_for molfile
    output = %x[echo "#{molfile}" | cInChI-1 -STDIO]
    output.eql?("") ? "" : output.split(/\n/)[1]
  end
end
This code takes advantage of Ruby's built-in support for Command Expansion.
Testing the Code
The code below tests the library:
require 'inchi'
include InChI
molfile =
"http://chempedia.com/compounds/106.mol
-OEChem-03010811072D
12 12 0 0 0 0 0 0 0999 V2000
2.8660 1.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 -0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -1.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 1.6200 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4631 0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4631 -0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 -0.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -1.6200 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 7 1 0 0 0 0
2 4 1 0 0 0 0
2 8 1 0 0 0 0
3 5 2 0 0 0 0
3 9 1 0 0 0 0
4 6 2 0 0 0 0
4 10 1 0 0 0 0
5 6 1 0 0 0 0
5 11 1 0 0 0 0
6 12 1 0 0 0 0
M END"
puts "Found InChI: #{inchi_for(molfile)}"
We can run the test by saving it in a file called test.rb and executing it:
$ ruby test.rb
InChI version 1, Software version 1.02-beta August 2007
Log file not specified. Using standard error output.
Input file not specified. Using standard input.
Output file not specified. Using standard output.
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Found InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
Prerequisites
The above approach requires only a UNIX-like system and a copy of the InChI command-line executable (cInChI-1 in the code above) on your path.
Advantages
The approach described here offers some important advantages over Rino:
- It works without modification on both the Matz Ruby Interpreter (C-Ruby) and JRuby.
- It neither creates nor uses files.
Disadvantages
This approach creates a lot of noisy log output to the console. There must be a way to suppress it, but so far I haven't found out how.
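The sample run above includes the line "Log file not specified. Using standard error output", which suggests that the log is written to standard error rather than standard output. If that holds, one possible way to keep it off the console is to capture the two streams separately with Ruby's standard Open3 library. The following is a minimal sketch, not a tested replacement: the module name QuietInChI and the command parameter are hypothetical, and it has not been checked on JRuby.

```ruby
require 'open3'

# Hypothetical variant of the InChI module above that keeps log output quiet.
module QuietInChI
  # molfile: String encoding a molfile.
  # command: program and arguments as an Array; the default matches the
  #          cInChI-1 invocation used earlier (assumed to be on the path).
  def self.inchi_for(molfile, command = ['cInChI-1', '-STDIO'])
    # capture3 feeds molfile to standard input and returns standard output
    # and standard error separately, so the log never reaches the console.
    stdout, _log, _status = Open3.capture3(*command, stdin_data: molfile)
    # As in the original code, the InChI is taken from the second output line.
    stdout.split(/\n/)[1].to_s
  end
end
```

Because the molfile is passed through standard input instead of being interpolated into an echo command, quotes or dollar signs inside the molfile are not interpreted by the shell.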
Conclusions
Using Ruby's support for Command Expansion has enabled the creation of a concise and portable Ruby interface to the InChI toolkit. Similar principles would apply to any Unix command-line binary, including, for example, Open Babel.
Customize InChI Output with Rino
Rino is a toolkit for working with the IUPAC International Chemical Identifier (InChI) in Ruby. Because it's based on the IUPAC/NIST InChI toolkit, Rino can be configured using a variety of useful options. This article summarizes those options and provides an illustrative example.
Complete List of InChI Command Line Options
The following is a complete summary of the IUPAC/NIST InChI toolkit command line options:
SNon                  Exclude stereo (Default: Include Absolute stereo)
SRel                  Relative stereo
SRac                  Racemic stereo
SUCF                  Use Chiral Flag: On means Absolute stereo, Off - Relative
SUU                   Include omitted unknown/undefined stereo
NEWPS                 Narrow end of wedge points to stereocenter (default: both)
SPXYZ                 Include Phosphines Stereochemistry
SAsXYZ                Include Arsines Stereochemistry
RecMet                Include reconnected metals results
FixedH                Mobile H Perception Off (Default: On)
AuxNone               Omit auxiliary information (default: Include)
NoADP                 Disable Aggressive Deprotonation (for testing only)
Compress              Compressed output
DoNotAddH             Don't add H according to usual valences: all H are explicit
Wnumber               Set time-out per structure in seconds; W0 means unlimited
SDF:DataHeader        Read from the input SDfile the ID under this DataHeader
NoLabels              Omit structure number, DataHeader and ID from InChI output
Tabbed                Separate structure number, InChI, and AuxInfo with tabs
OutputSDF             Convert InChI created with default aux. info to SDfile
InChI2InChI           Convert InChI string into InChI string for validation purposes
SdfAtomsDT            Output Hydrogen Isotopes to SDfile as Atoms D and T
STDIO                 Use standard input/output streams
FB (or FixSp3Bug)     Fix bug leading to missing or undefined sp3 parity
WarnOnEmptyStructure  Warn and produce empty InChI for empty structure
A Test
The following code displays the InChI for benzoic acid with and without mobile hydrogen atom perception. It requires both Rino and Ruby CDK. The latter library is used to convert a SMILES string into a molfile for use by Rino.
require 'rubygems'
require_gem 'rcdk'
require_gem 'rino'
require 'rcdk/util'
molfile=RCDK::Util::Lang.smiles_to_molfile 'c1ccccc1C(=O)O' # benzoic acid
reader = Rino::MolfileReader.new
inchi = reader.read(molfile)
puts "Without mobile hydrogen perception:\n#{inchi}\n\n"
reader.options << '-FixedH'
inchi = reader.read(molfile)
puts "With mobile hydrogen perception:\n#{inchi}"
The -FixedH flag used by the reader the second time tells Rino to identify mobile hydrogens in the InChI output. Some InChI authors use this form of InChI and others don't. PubChem is an example of a large InChI author that does use mobile hydrogen perception, as their entry for benzoic acid demonstrates. To perform an exact match of your InChIs with theirs, the -FixedH flag must be set.
Running the Test
Running the test code produces the following output:
Without mobile hydrogen perception:
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)

With mobile hydrogen perception:
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H
Conclusions
When matching InChIs generated by other authors, it's best to adopt their processing conventions. Rino makes it convenient to do so through its full support for the standard IUPAC/NIST command line options.
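The two benzoic acid InChIs in the output differ only in the trailing /f/h8H segment, so an exact string comparison between them fails. Regenerating the InChI with the other author's options remains the more reliable route. As a rough illustration of why the mismatch occurs, the sketch below compares the two strings after dropping everything from /f onward. The helper name without_fixed_h is hypothetical, and the sketch assumes that nothing else the comparison needs follows the /f segment.

```ruby
# Returns the part of an InChI that precedes the /f segment, or the whole
# string when no such segment is present. Hypothetical helper for illustration.
def without_fixed_h(inchi)
  inchi.split('/f', 2).first.to_s
end

default_inchi = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)'
fixed_h_inchi = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H'

puts default_inchi == fixed_h_inchi                   # prints false
puts default_inchi == without_fixed_h(fixed_h_inchi)  # prints true
```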
Anatomy of a Cheminformatics Web Application: InChIMatic
InChI is an open molecular identifier system. Although InChIs obviate the need for a central registration authority, they are complex enough that they must be generated by computer. Currently, a few desktop molecular editors can generate InChI identifiers. But wouldn't it be more convenient if this capability existed in a simple Web application that could be used from any computer - anywhere? This article describes a Web application called "InChIMatic", which does just that.
In this article, I'll show how Java Molecular Editor (JME), a lightweight 2-D structure editor, can be extended to produce InChI identifiers through server-side software written in Ruby, rather than by extending the applet with Java code.
Downloads and Prerequisites
InChIMatic requires Ruby on Rails and the Rino InChI toolkit. Both of these libraries can be installed using the RubyGems packaging system.
The complete InChIMatic source package can be downloaded from RubyForge. For convenience, a copy of JME is included with the distribution. The author, Peter Ertl, has kindly given permission for the bundled JME applet to be used with InChIMatic. For other uses, consult the JME homepage.
Running InChIMatic
$ cd inchimatic-0.0.2
$ ruby script/server
Pointing your browser to http://localhost:3000/inchi/input, drawing a structure in the JME window, and pressing the "InChI!" button will produce the corresponding InChI in the area below.

Behind the Scenes
The JME applet itself provides no capabilities for generating InChI identifiers. This functionality is instead provided by the Rails server via the Rino InChI library.
Let's say Susan wants to get the InChI for 3,4-dichlorophenol. After entering the structure into the JME window, she presses the "InChI!" button. This sets in motion the following sequence of events:
1. The JavaScript function writeMolfile() is called. This retrieves a molfile representation of 3,4-dichlorophenol from JME, which is then written to the hidden field molfile.
2. A Rails listener notices that the hidden text field has been updated and so invokes the InChIMatic ajax_inchi action. This is a Rails Ajax action that will update only a portion of the InChIMatic window. For more detail on this Rails Ajax technique, see the previous Anatomy of a Cheminformatics Web Application article.
3. The ajax_inchi action retrieves the contents of the hidden molfile field. This molfile is then used to generate an InChI using Rino. This InChI is then saved to the instance variable inchi.
4. The contents of the InChIMatic area partitioned by the results div are then updated with the InChI obtained in Step 3. The JME applet itself is unaffected by this operation, allowing Susan to further elaborate her molecule, if she'd like.
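The server-side part of this sequence (reading the hidden field, generating the InChI, and storing it in an instance variable) can be sketched without Rails as follows. This is not the InChIMatic source: the class name InchiAction is hypothetical, the params hash only mimics what a Rails action would receive, and the reader is injected so that any object with a read(molfile) method can stand in for Rino's MolfileReader.

```ruby
# Framework-free sketch of the ajax_inchi step. In the real application this
# logic would live in a Rails controller; the names here are assumptions.
class InchiAction
  # The generated InChI, corresponding to the instance variable "inchi".
  attr_reader :inchi

  # reader: any object responding to read(molfile), such as an instance of
  # Rino::MolfileReader as used elsewhere on this page.
  def initialize(reader)
    @reader = reader
  end

  # params: Hash mimicking the request parameters; :molfile is assumed to
  # carry the contents of the hidden molfile field.
  def ajax_inchi(params)
    molfile = params[:molfile].to_s
    # An empty field yields an empty InChI instead of calling the reader.
    @inchi = molfile.strip.empty? ? '' : @reader.read(molfile)
  end
end
```

Injecting the reader keeps the sketch runnable without Rino installed and makes the action easy to exercise with a stand-in object.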
So What? Re-Thinking the Role of Applets
JME is, by itself, incapable of generating InChIs. Yet InChIMatic provides this capability as if it existed natively. In other words, a lightweight, fast-loading, and responsive 2-D editor can be extended on the server side, rather than on the client side. The difference is imperceptible to the user, but ripe with potential for the developer.
One of the most common, and completely valid, complaints about Java applets is that they take too long to load. Offloading some of the functionality currently being bundled in applets onto a Web server offers one way to combat the problem. Furthermore, combining Java applets with Ajax and powerful Web application frameworks like Ruby on Rails offers virtually limitless opportunities to re-think the role of Java applets in Web application development.
Conclusions
JME's strength comes, perhaps ironically, from its limited functionality. By using some simple Web programming techniques, JME can be extended with server-side programming. The advantages, compared to extending the JME applet itself with Java on the client side, are significant. Future articles in this series will explore some of the possibilities.
Looking at InChIs
InChI identifiers can be viewed both as unique molecular keys and as a language encoding molecular structure. With the right software, it is possible to decode any InChI to arrive at a human-readable molecular structure. This tutorial will show how to convert InChI identifiers into 2-D molecular renderings using open source tools.
Prerequisites
The InChI to 2-D image conversion process requires two pieces of software:
- Rino decodes InChI identifiers into molfiles. The resulting atomic coordinates are set to zero.
- RCDK assigns coordinates to the molfile produced by Rino, and renders the result.
Bring on the Code
The following Ruby code illustrates how the InChI for the pesticide fipronil (Regent) can be translated into a PNG image:
require 'rubygems'
require_gem 'rino'
require_gem 'rcdk'
require 'util'
inchi = 'InChI=1/C12H4Cl2F6N4OS/c13-5-1-4(11(15,16)17)2-6(14)8(5)24-10(22)9(7(3-21)23-24)26(25)12(18,19)20/h1-2H,22H2' #fipronil
reader = Rino::InChIReader.new
molfile1 = reader.read(inchi) # lacks 2-D atomic coordinates
molfile2 = RCDK::Util::XY.coordinate_molfile(molfile1) # has 2-D atomic coordinates
RCDK::Util::Image.molfile_to_png(molfile2, 'fipronil.png', 350, 300)
Running this code produces the image fipronil.png in your working directory:

Limitations
The technique illustrated here is subject to the same limitations as the underlying software. For Rino, this means that stereochemistry is ignored. For RCDK, this means that implicit hydrogen atoms, isotopes, and charges are omitted, and that layout of macrocycles and other complex ring systems may not subjectively appear very refined.
Other Software that Does This
To my knowledge, only one other Open Source package, BKChem, is capable of rendering InChIs as described here. BKChem's underlying InChI translation and depiction software, OASA, can also be accessed online. For comparison, OASA produces the following image for the fipronil InChI:

The PubChem editor can also translate and render InChIs, but no source code appears to be available. PubChem's InChI translation and rendering output for fipronil is:

The Chemistry Development Kit, on which RCDK is based, was recently upgraded to support reading InChI identifiers. For some time, CDK has been able to generate 2-D atomic coordinates.
More information on InChI software can be found at Beda Kosata's InChI.info site.
The Final Word
Within certain limitations, it is quite feasible to programmatically obtain a 2-D molecular image for any InChI identifier. Combining this capability with other chemical informatics software and services offers numerous possibilities to use InChI in innovative ways.
Decoding InChIs with Rino
InChI identifiers are unique, ASCII-based molecular identifiers well-suited for chemical informatics on the Web. But they are also much more than that. Encoded in every InChI is all of the information needed to reconstruct a valid, machine-readable molecular representation. This tutorial shows how Open Source tools can be used to construct a molfile representation from an InChI identifier with the help of new features in the Rino toolkit for Ruby. The ability of Rino to produce InChI identifiers from molfile input has already been discussed.
Credits
What follows was in part inspired by helpful comments posted by Sam Adams, author of the JNI InChI Wrapper, and Dmitrii Tchekhovskoi, co-author of the InChI software.
A Demo with cInChI
The newest release of the IUPAC InChI-API toolkit can now translate an InChI identifier into a molfile. This consists of a two-step process:
- Convert a simple InChI into a full InChI with Auxiliary Information (AuxInfo).
- Convert the full InChI into a molfile.
You can get a feel for how this process works by using the cInChI command-line program. Create a file called test.txt containing the following InChI (for benzene):
InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
Then run the following commands:
$ touch temp.txt
$ ./cInChI-1 test.txt temp.txt -InChI2Struct
The first line creates an empty temporary file, temp.txt. Into this file is written the full InChI as output. The -InChI2Struct parameter tells InChI to generate an InChI with Auxiliary Information.
Now, create an empty file, benzene.mol and run cInChI with the -OutputSDF option:
$ touch benzene.mol
$ ./cInChI-1 temp.txt benzene.mol -OutputSDF
If everything worked, you should now have a molfile called benzene.mol, describing benzene, in your working directory. All atom coordinates will be zero, because coordinate generation is outside the scope of the InChI project. This has important implications for stereochemistry (see below). Of course, other free libraries can generate aesthetically-pleasing 2-D molecular coordinates.
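The two command-line steps above can also be driven from a Ruby script. The sketch below only uses the -InChI2Struct and -OutputSDF options shown in this section. The function names, the temporary file names, and the default executable path ./cInChI-1 are assumptions made for illustration, and the script has not been run against a real cInChI binary.

```ruby
require 'fileutils'
require 'tmpdir'

# Builds the two cInChI invocations described above as argument arrays.
# exe is the path to the cInChI executable (assumed to be ./cInChI-1).
def inchi_to_molfile_commands(inchi_path, aux_path, mol_path, exe = './cInChI-1')
  [
    [exe, inchi_path, aux_path, '-InChI2Struct'], # InChI -> InChI with AuxInfo
    [exe, aux_path, mol_path, '-OutputSDF']       # InChI with AuxInfo -> molfile
  ]
end

# Runs both steps in a scratch directory and returns the molfile as a String.
def inchi_to_molfile(inchi, exe = './cInChI-1')
  Dir.mktmpdir do |dir|
    inchi_path = File.join(dir, 'test.txt')
    aux_path   = File.join(dir, 'temp.txt')
    mol_path   = File.join(dir, 'out.mol')
    File.write(inchi_path, "#{inchi}\n")
    # Mirror the "touch" commands above by creating empty output files first.
    FileUtils.touch([aux_path, mol_path])
    inchi_to_molfile_commands(inchi_path, aux_path, mol_path, exe).each do |cmd|
      system(*cmd) or raise "command failed: #{cmd.join(' ')}"
    end
    File.read(mol_path)
  end
end
```

With the executable in the working directory, a call such as inchi_to_molfile('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H') would be expected to return the benzene molfile with zeroed coordinates.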
Hello, Rino
Rino is a thin Ruby wrapper around the InChI-API toolkit, which is written in C. An earlier article described the use of the automatic wrapper generator SWIG to write the C glue code that Rino interfaces with. The current version of Rino (v0.2.0) uses this approach to Ruby interface generation.
The current version of Rino can conveniently be installed by executing the following (as root):
# gem install rino
Earlier today, this command gave me "404 Not Found" errors, although it has since worked. The cause is not clear, but the errors seem to occur within 24 hours of a new Gem being uploaded. If you run into problems, the Rino RubyGem can also be downloaded and installed locally.
If you've already installed Rino-0.1.0, the new version can happily cohabitate with it. RubyGems by default installs the most recent version of Rino unless you specify otherwise. If you'd like to uninstall Rino-0.1.0 do the following (as root):
# gem uninstall rino
You should get a menu of Rino versions to uninstall.
A Ruby Demo
The following Ruby code demonstrates the use of Rino to translate an InChI identifier into a molfile:
require 'rubygems'
require_gem 'rino'
inchi = 'InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H' # benzene
reader = Rino::InChIReader.new
molfile = reader.read(inchi)
p molfile # => prints the molfile for benzene
If you'd like even more control, you can directly access the InChI run method, which provides all of the capabilities of running cInChI from the command line:
require 'rubygems'
require_gem 'rino'
input = 'input.txt' # a valid file in your working dir
output = 'output.txt' # also a valid file
Rino::InChI.run(['', input, output])
Limitations
The InChI->molfile implementation in the InChI-API toolkit does not reproduce stereochemical information. For example, passing an InChI of a molecule containing a single tetrahedral stereocenter results in a molfile lacking stereo parities. Further, an explicit hydrogen atom is added to the stereogenic atom in the molfile output. Being based entirely on the InChI-API, Rino inherits these behaviors.
Rino is based on a very simple interface into InChI's main method. This has the advantage that anything that can be done with the cInChI command line application can also be done with Rino. It carries the disadvantage that the convenience classes InChIReader and MolfileReader use a less than elegant system of temporary disk files for input-output. Future versions of Rino should address this issue, a task that may be simplified by SWIG.
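To make the temporary-file pattern concrete, here is a minimal sketch built on Ruby's standard Tempfile class. It is not Rino's implementation: the helper name with_temp_files is hypothetical, and the block stands in for whatever file-based call does the real work.

```ruby
require 'tempfile'

# Writes input to a temporary file, yields the input and output paths to the
# block (which is expected to fill the output file), and returns the output
# file's contents. Both temporary files are removed afterwards.
def with_temp_files(input)
  in_file  = Tempfile.new('inchi-in')
  out_file = Tempfile.new('inchi-out')
  begin
    in_file.write(input)
    in_file.close   # flush to disk so another reader sees the data
    out_file.close
    yield in_file.path, out_file.path
    File.read(out_file.path)
  ensure
    in_file.close!  # close (if still open) and delete the file
    out_file.close!
  end
end
```

A caller could then wrap a file-based call, for instance with_temp_files(inchi) { |input, output| Rino::InChI.run(['', input, output]) }, although that particular combination is untested here.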
Other InChI Parsers
To my knowledge, three Open Source InChI parsers, besides the InChI-API and Rino, exist. They are:
- Ninja. A Java library that performs low-level InChI parsing, and is designed as a platform for more sophisticated parsers. While it does not create molfiles from InChIs, it can be used as a foundation for software that does. Ninja is used in the molecular language framework, Rosetta, although this work is far from complete.
- BKChem. Beda Kosata's 2-D structure editor, which is written in Python. The similarities between Ruby and Python make this codebase a potentially useful starting point for a pure Ruby InChI parser.
- JNI InChI Wrapper. Also a wrapper for the InChI-API. When used in combination with the Chemistry Development Kit, this package has been reported to produce molfiles from InChI identifiers.
More information on InChI software capabilities can be found at Beda Kosata's InChI info site.
Wrapping Up
The translation of InChI identifiers into other molecular representation systems will become more important as InChI gains traction. Mashups involving InChI translation offer many tantalizing opportunities for innovative chemical informatics applications. Future articles will discuss some of them.

