A Simple and Portable Ruby Interface to InChI

Posted by Rich Apodaca Thu, 29 May 2008 16:12:00 GMT

Although the InChI software itself is written in C, it can still be used via Ruby. Rino offers one implementation of a Ruby InChI interface that makes use of a C extension. This article describes a more concise and portable solution.

The Code

The following code will accept a String encoding a molfile and return either its InChI, or an empty String if no InChI could be found:

module InChI
  def inchi_for molfile
    output = %x[echo "#{molfile}" | cInChI-1 -STDIO]

    output.eql?("") ? "" : output.split(/\n/)[1]
  end
end

This code takes advantage of Ruby's built-in support for Command Expansion.

Testing the Code

The code below tests the library:

require 'inchi'
include InChI

molfile =
"http://chempedia.com/compounds/106.mol
  -OEChem-03010811072D

 12 12  0     0  0  0  0  0  0999 V2000
    2.8660    1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660    1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690    0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -0.8100    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -1.6200    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  7  1  0  0  0  0
  2  4  1  0  0  0  0
  2  8  1  0  0  0  0
  3  5  2  0  0  0  0
  3  9  1  0  0  0  0
  4  6  2  0  0  0  0
  4 10  1  0  0  0  0
  5  6  1  0  0  0  0
  5 11  1  0  0  0  0
  6 12  1  0  0  0  0
M  END"

puts "Found InChI: #{inchi_for(molfile)}"

We can run the test by saving it in a file called test.rb and executing it:

$ ruby test.rb
InChI version 1, Software version 1.02-beta August 2007
Log file not specified. Using standard error output.
Input file not specified. Using standard input.
Output file not specified. Using standard output.
Options: Mobile H Perception ON
Isotopic ON, Absolute Stereo ON
Omit undefined/unknown stereogenic centers and bonds
Full Aux. info
Input format: MOLfile
Output format: Plain text
Timeout per structure: 60.000 sec; Up to 1024 atoms per structure
End of file detected after structure #1.
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Found InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H

Prerequisites

This approach requires only a UNIX-like system and a copy of the cInChI-1 executable on your path.

Advantages

The approach described here offers some important advantages over Rino:

Disadvantages

This approach creates a lot of noisy log output to the console. There must be a way to suppress it, but so far I haven't found out how.
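One possible way around the noise, offered as a sketch rather than a tested fix: the run above reports "Log file not specified. Using standard error output.", which suggests the log is written to standard error. If that holds, Ruby's Open3 library can capture standard error separately and simply drop it. The module name QuietInChI and the command keyword are hypothetical, and the code assumes the InChI appears on standard output as a line beginning with "InChI=".

```ruby
require 'open3'

module QuietInChI
  # Hypothetical variant of inchi_for. `command` defaults to the cInChI-1
  # invocation used in this article; it is a parameter only so that another
  # program can be substituted.
  def self.inchi_for(molfile, command: ['cInChI-1', '-STDIO'])
    # capture3 feeds the molfile on stdin and returns stdout and stderr
    # separately, so the log (assumed to be on stderr) never hits the console.
    stdout, _log, _status = Open3.capture3(*command, stdin_data: molfile)

    # Assumption: the InChI is the stdout line that starts with "InChI=".
    stdout.each_line.map(&:chomp).find { |line| line.start_with?('InChI=') } || ''
  rescue Errno::ENOENT
    '' # the executable was not found on the path
  end
end
```

Passing the molfile through stdin_data also avoids interpolating it into a shell command, so quotes or dollar signs in the input are not interpreted by the shell.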

Conclusions

Using Ruby's support for Command Expansion has enabled the creation of a concise and portable Ruby interface to the InChI toolkit. Similar principles would apply to any Unix command-line binary, including, for example, Open Babel.

Customize InChI Output with Rino

Posted by Rich Apodaca Mon, 19 Mar 2007 14:30:00 GMT

Rino is a toolkit for working with the IUPAC International Chemical Identifier (InChI) in Ruby. Because it's based on the IUPAC/NIST InChI toolkit, Rino can be configured using a variety of useful options. This article summarizes those options and provides an illustrative example.

Complete List of InChI Command Line Options

The following is a complete summary of the IUPAC/NIST InChI toolkit command line options:

  • SNon Exclude stereo (Default: Include Absolute stereo)

  • SRel Relative stereo

  • SRac Racemic stereo

  • SUCF Use Chiral Flag: On means Absolute stereo, Off - Relative

  • SUU Include omitted unknown/undefined stereo

  • NEWPS Narrow end of wedge points to stereocenter (default: both)

  • SPXYZ Include Phosphines Stereochemistry

  • SAsXYZ Include Arsines Stereochemistry

  • RecMet Include reconnected metals results

  • FixedH Mobile H Perception Off (Default: On)

  • AuxNone Omit auxiliary information (default: Include)

  • NoADP Disable Aggressive Deprotonation (for testing only)

  • Compress Compressed output

  • DoNotAddH Don't add H according to usual valences: all H are explicit

  • Wnumber Set time-out per structure in seconds; W0 means unlimited

  • SDF:DataHeader Read from the input SDfile the ID under this DataHeader

  • NoLabels Omit structure number, DataHeader and ID from InChI output

  • Tabbed Separate structure number, InChI, and AuxInfo with tabs

  • OutputSDF Convert InChI created with default aux. info to SDfile

  • InChI2InChI Convert InChI string into InChI string for validation purposes

  • SdfAtomsDT Output Hydrogen Isotopes to SDfile as Atoms D and T

  • STDIO Use standard input/output streams

  • FB (or FixSp3Bug) Fix bug leading to missing or undefined sp3 parity

  • WarnOnEmptyStructure Warn and produce empty InChI for empty structure

A Test

The following code displays the InChI for benzoic acid with and without mobile hydrogen atom perception. It requires both Rino and Ruby CDK. The latter library is used to convert a SMILES string into a molfile for use by Rino.

require 'rubygems'
require_gem 'rcdk'
require_gem 'rino'
require 'rcdk/util'

molfile=RCDK::Util::Lang.smiles_to_molfile 'c1ccccc1C(=O)O' # benzoic acid
reader = Rino::MolfileReader.new
inchi = reader.read(molfile)

puts "With mobile hydrogen perception (default):\n#{inchi}\n\n"

reader.options << '-FixedH'
inchi = reader.read(molfile)

puts "Without mobile hydrogen perception (-FixedH):\n#{inchi}"

The -FixedH flag used by the reader the second time switches off mobile hydrogen perception, as the option list above indicates, so that the InChI output gains a fixed-hydrogen layer (the trailing /f portion). Some InChI authors use this form of InChI and others don't. PubChem is an example of a large InChI author that does use the fixed-hydrogen form, as their entry for benzoic acid demonstrates. To perform an exact match of your InChIs with theirs, the -FixedH flag must be set.

Running the Test

Running the test code produces the following output:

With mobile hydrogen perception (default):
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)

Without mobile hydrogen perception (-FixedH):
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H
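The two strings differ only in the trailing /f/h8H portion that appears when -FixedH is set. A small sketch in plain Ruby makes the difference visible by splitting each InChI into its slash-separated layers. The helper names inchi_layers and fixed_h_layer? are hypothetical and not part of Rino.

```ruby
# Hypothetical helpers, not part of Rino.

# Split an InChI into its slash-separated layers.
def inchi_layers(inchi)
  inchi.split('/')
end

# True when a fixed-hydrogen layer (one starting with "f") is present.
# The first two fields (version and formula) are skipped.
def fixed_h_layer?(inchi)
  inchi_layers(inchi).drop(2).any? { |layer| layer.start_with?('f') }
end

plain = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)'
fixed = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H'

puts fixed_h_layer?(plain)                   # prints false
puts fixed_h_layer?(fixed)                   # prints true
p inchi_layers(fixed) - inchi_layers(plain)  # prints ["f", "h8H"]
```

A check like this can flag which convention a given InChI follows before an exact string match against another author's identifiers is attempted.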

Conclusions

When matching InChIs generated by other authors, it's best to adopt their processing conventions. Rino makes it convenient to do so through its full support for the standard IUPAC/NIST command line options.

Anatomy of a Cheminformatics Web Application: InChIMatic

Posted by Rich Apodaca Fri, 15 Dec 2006 20:49:00 GMT

InChI is an open molecular identifier system. Although InChIs obviate the need for a central registration authority, they are complex enough that they must be generated by computer. Currently, a few desktop molecular editors can generate InChI identifiers. But wouldn't it be more convenient if this capability existed in a simple Web application that could be used from any computer - anywhere? This article describes a Web application called "InChIMatic", which does just that.

In this article, I'll show how Java Molecular Editor (JME), a lightweight 2-D structure editor, can be extended to produce InChI identifiers through server-side software written in Ruby, rather than by extending the applet with Java code.

Downloads and Prerequisites

InChIMatic requires Ruby on Rails and the Rino InChI toolkit. Both of these libraries can be installed using the RubyGems packaging system.

The complete InChIMatic source package can be downloaded from RubyForge. For convenience, a copy of JME is included with the distribution. The author, Peter Ertl, has kindly given permission for the bundled JME applet to be used with InChIMatic. For other uses, consult the JME homepage.

Running InChIMatic

$ cd inchimatic-0.0.2
$ ruby script/server

Pointing your browser to http://localhost:3000/inchi/input, drawing a structure in the JME window, and pressing the "InChI!" button will produce the corresponding InChI in the area below.

Behind the Scenes

The JME applet itself provides no capabilities for generating InChI identifiers. This functionality is instead provided by the Rails server via the Rino InChI library.

Let's say Susan wants to get the InChI for 3,4-dichlorophenol. After entering the structure into the JME window, she presses the "InChI!" button. This sets in motion the following sequence of events:

  1. The JavaScript function writeMolfile() is called. This retrieves a molfile representation of 3,4-dichlorophenol from JME, which is then written to the hidden field molfile.

  2. A Rails listener notices that the hidden text field has been updated and so invokes the InChIMatic ajax_inchi action. This is a Rails Ajax action that will update only a portion of the InChIMatic window. For more detail on this Rails Ajax technique, see the previous Anatomy of a Cheminformatics Web Application article.

  3. The ajax_inchi action retrieves the contents of the hidden molfile field. This molfile is then used to generate an InChI using Rino. This InChI is then saved to the instance variable inchi.

  4. The contents of the InChIMatic area partitioned by the results div are then updated with the InChI obtained in Step 3. The JME applet itself is unaffected by this operation, allowing Susan to further elaborate her molecule, if she'd like.
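
Steps 3 and 4 can be sketched in plain Ruby, outside of Rails. This is not the InChIMatic source: the class name AjaxInchiSketch is hypothetical, the converter is a stand-in for the Rino call, and the methane InChI is only a placeholder return value. The field name molfile and the results div come from the description above.

```ruby
require 'erb'

# Hypothetical stand-in for the ajax_inchi action, written as plain Ruby so it
# runs without Rails. `converter` is any object responding to call(molfile);
# in the real application this role is played by Rino.
class AjaxInchiSketch
  def initialize(converter)
    @converter = converter
  end

  # `params` mimics the Rails params hash; 'molfile' is the hidden field
  # filled in by writeMolfile() in Step 1.
  def call(params)
    molfile = params['molfile'].to_s

    # Step 3: generate the InChI and keep it in an instance variable.
    @inchi = molfile.strip.empty? ? '' : @converter.call(molfile)

    # Step 4: return only the fragment destined for the results div.
    %(<div id="results">#{ERB::Util.html_escape(@inchi)}</div>)
  end
end

# Placeholder converter that ignores its input and returns a fixed InChI.
fake_converter = ->(_molfile) { 'InChI=1/CH4/h1H4' }
action = AjaxInchiSketch.new(fake_converter)
puts action.call('molfile' => '...molfile text from JME...')
```

Because only the fragment is regenerated, nothing in this path touches the applet, which is why the drawing in JME survives each request.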

So What? Re-Thinking the Role of Applets

JME is, by itself, incapable of generating InChIs. Yet InChIMatic provides this capability as if it existed natively. In other words, a lightweight, fast-loading, and responsive 2-D editor can be extended on the server side, rather than on the client side. The difference is imperceptible to the user, but ripe with potential for the developer.

One of the most common, and completely valid, complaints about Java applets is that they take too long to load. Offloading some of the functionality currently being bundled in applets onto a Web server offers one way to combat the problem. Furthermore, combining Java applets with Ajax and powerful Web application frameworks like Ruby on Rails offers virtually limitless opportunities to re-think the role of Java applets in Web application development.

Conclusions

JME's strength comes, perhaps ironically, from its limited functionality. By using some simple Web programming techniques, JME can be extended with server-side programming. The advantages, compared to extending the JME applet itself with Java on the client side, are significant. Future articles in this series will explore some of the possibilities.

Looking at InChIs

Posted by Rich Apodaca Tue, 26 Sep 2006 18:35:00 GMT

InChI identifiers can be viewed both as unique molecular keys and as a language encoding molecular structure. With the right software, it is possible to decode any InChI to arrive at a human-readable molecular structure. This tutorial will show how to convert InChI identifiers into 2-D molecular renderings using open source tools.

Prerequisites

The InChI to 2-D image conversion process requires two pieces of software:

  • Rino decodes InChI identifiers into molfiles. The resulting atomic coordinates are set to zero.

  • RCDK assigns coordinates to the molfile produced by Rino, and renders the result.

Bring on the Code

The following Ruby code illustrates how the InChI for the pesticide fipronil (Regent) can be translated into a PNG image:

require 'rubygems'
require_gem 'rino'
require_gem 'rcdk'
require 'util'

inchi = 'InChI=1/C12H4Cl2F6N4OS/c13-5-1-4(11(15,16)17)2-6(14)8(5)24-10(22)9(7(3-21)23-24)26(25)12(18,19)20/h1-2H,22H2' #fipronil
reader = Rino::InChIReader.new
molfile1 = reader.read(inchi) # lacks 2-D atomic coordinates
molfile2 = RCDK::Util::XY.coordinate_molfile(molfile1) # has 2-D atomic coordinates

RCDK::Util::Image.molfile_to_png(molfile2, 'fipronil.png', 350, 300)

Running this code produces the image fipronil.png in your working directory:

Limitations

The technique illustrated here is subject to the same limitations as the underlying software. For Rino, this means that stereochemistry is ignored. For RCDK, this means that implicit hydrogen atoms, isotopes, and charges are omitted, and that the layout of macrocycles and other complex ring systems may not look very refined.

Other Software that Does This

To my knowledge, only one other Open Source package, BKChem, is capable of rendering InChIs as described here. BKChem's underlying InChI translation and depiction software, OASA, can also be accessed online. For comparison, OASA produces the following image for the fipronil InChI:

The PubChem editor can also translate and render InChIs, but no source code appears to be available. PubChem's InChI translation and rendering output for fipronil is:

The Chemistry Development Kit, on which RCDK is based, was recently upgraded to support reading InChI identifiers. For some time, CDK has been able to generate 2-D atomic coordinates.

More information on InChI software can be found at Beda Kosata's InChI.info site.

The Final Word

Within certain limitations, it is quite feasible to programmatically obtain a 2-D molecular image for any InChI identifier. Combining this capability with other chemical informatics software and services offers numerous possibilities to use InChI in innovative ways.

Decoding InChIs with Rino

Posted by Rich Apodaca Tue, 19 Sep 2006 18:07:00 GMT

InChI identifiers are unique, ASCII-based molecular identifiers well-suited for chemical informatics on the Web. But they are also much more than that. Encoded in every InChI is all of the information needed to reconstruct a valid, machine-readable molecular representation. This tutorial shows how Open Source tools can be used to construct a molfile representation from an InChI identifier with the help of new features in the Rino toolkit for Ruby. The ability of Rino to produce InChI identifiers from molfile input has already been discussed.

Credits

What follows was in part inspired by helpful comments posted by Sam Adams, author of the JNI InChI Wrapper, and Dmitrii Tchekhovskoi, co-author of the InChI software.

A Demo with cInChI

The newest release of the IUPAC InChI-API toolkit can now translate an InChI identifier into a molfile. This consists of a two-step process:

  1. Convert a simple InChI into a full InChI with Auxiliary Information (AuxInfo).
  2. Convert the full InChI into a molfile.

You can get a feel for how this process works by using the cInChI command-line program. Create a file called test.txt containing the following InChI (for benzene):

InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H

Now, run cInChI:

$ touch temp.txt
$ ./cInChI-1 test.txt temp.txt -InChI2Struct

The first line creates an empty temporary file, temp.txt. cInChI writes the full InChI to this file as output. The -InChI2Struct parameter tells cInChI to generate an InChI with Auxiliary Information.

Now, create an empty file, benzene.mol and run cInChI with the -OutputSDF option:

$ touch benzene.mol
$ ./cInChI-1 temp.txt benzene.mol -OutputSDF

If everything worked, you should now have a molfile called benzene.mol, describing benzene, in your working directory. All atom coordinates will be zero, because coordinate generation is outside the scope of the InChI project. This has important implications for stereochemistry (see below). Of course, other free libraries can generate aesthetically-pleasing 2-D molecular coordinates.
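
The two command-line steps above can also be driven from Ruby with temporary files. The following is only a sketch of that idea, not how Rino is implemented: the method name inchi_to_molfile and its cinchi and runner keywords are hypothetical, while the -InChI2Struct and -OutputSDF flags and the ./cInChI-1 path are taken from the commands shown above.

```ruby
require 'tempfile'

# Hypothetical helper: runs the two cInChI steps with temporary files and
# returns the molfile text, or nil if either step fails.
# `runner` receives an argument list and returns a truthy value on success;
# by default it hands the list to Kernel#system.
def inchi_to_molfile(inchi, cinchi: './cInChI-1', runner: ->(argv) { system(*argv) })
  Tempfile.create('inchi') do |input|
    Tempfile.create('full') do |full|
      Tempfile.create(['out', '.mol']) do |mol|
        input.write("#{inchi}\n")
        input.flush

        # Step 1: simple InChI -> full InChI with AuxInfo
        runner.call([cinchi, input.path, full.path, '-InChI2Struct']) or return nil
        # Step 2: full InChI -> molfile
        runner.call([cinchi, full.path, mol.path, '-OutputSDF']) or return nil

        File.read(mol.path)
      end
    end
  end
end
```

Calling inchi_to_molfile('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H') with cInChI-1 in the working directory would be expected to return the same text as benzene.mol. The temporary files are removed when the blocks exit, whether or not the conversion succeeded.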

Hello, Rino

Rino is a thin Ruby wrapper around the InChI-API toolkit, which is written in C. An earlier article described the use of the automatic wrapper generator SWIG to write the C glue code that Rino interfaces with. The current version of Rino (v0.2.0) uses this approach to Ruby interface generation.

The current version of Rino can conveniently be installed by executing the following (as root):

# gem install rino

Earlier today, I got "404 Not Found" errors for this command, but not recently. The cause is not clear, but the error seems to occur within the 24 hours after the Gem is uploaded. If you run into problems, the Rino RubyGem can also be downloaded and installed locally.

If you've already installed Rino-0.1.0, the new version can happily cohabitate with it. RubyGems by default installs the most recent version of Rino unless you specify otherwise. If you'd like to uninstall Rino-0.1.0 do the following (as root):

# gem uninstall rino

You should get a menu of Rino versions to uninstall.

A Ruby Demo

The following Ruby code demonstrates the use of Rino to translate an InChI identifier into a molfile:

require 'rubygems'
require_gem 'rino'

inchi = 'InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H' # benzene
reader = Rino::InChIReader.new
molfile = reader.read(inchi)

p molfile # => prints the molfile for benzene

If you'd like even more control, you can directly access the InChI run method, which provides all of the capabilities of running cInChI from the command line:

require 'rubygems'
require_gem 'rino'

input = 'input.txt'   # a valid file in your working dir
output = 'output.txt' # also a valid file

Rino::InChI.run(['', input, output])

Limitations

The InChI->molfile implementation in the InChI-API toolkit does not reproduce stereochemical information. For example, passing an InChI of a molecule containing a single tetrahedral stereocenter results in a molfile lacking stereo parities. Further, an explicit hydrogen atom is added to the stereogenic atom in the molfile output. Being based entirely on the InChI-API, Rino inherits these behaviors.

Rino is based on a very simple interface into InChI's main method. This has the advantage that anything that can be done with the cInChI command line application can also be done with Rino. It carries the disadvantage that the convenience classes InChIReader and MolfileReader use a less than elegant system of temporary disk files for input-output. Future versions of Rino should address this issue, a task that may be simplified by SWIG.

Other InChI Parsers

To my knowledge, three Open Source InChI parsers, besides the InChI-API and Rino, exist. They are:

  • Ninja. A Java library that performs low-level InChI parsing, and is designed as a platform for more sophisticated parsers. While it does not create molfiles from InChIs, it can be used as a foundation for software that does. Ninja is used in the molecular language framework, Rosetta, although this work is far from complete.

  • BKChem. Beda Kosata's 2-D structure editor, which is written in Python. The similarities between Ruby and Python make this codebase a potentially useful starting point for a pure Ruby InChI parser.

  • JNI InChI Wrapper. Also a wrapper for the InChI-API. When used in combination with the Chemistry Development Kit, this package has been reported to produce molfiles from InChI identifiers.

More information on InChI software capabilities can be found at Beda Kosata's InChI info site.

Wrapping Up

The translation of InChI identifiers into other molecular representation systems will become more important as InChI gains traction. Mashups involving InChI translation offer many tantalizing opportunities for innovative chemical informatics applications. Future articles will discuss some of them.
