From SMILES to InChI: Rino, CDK, and Ruby Java Bridge
Integrating Ruby and Java is fast and easy with Ruby Java Bridge (RJB), which was discussed previously. In this article, I'll show how RJB can be used to solve a practical chemical informatics problem: converting SMILES strings into InChI identifiers.
Prerequisites
This tutorial is aimed at Linux users. The same steps should also work on Windows and Mac OS X, although I haven't tested those systems. You'll need to install a few software packages if you haven't done so already: Ruby, RubyGems, RJB, CDK, and Rino. Once RubyGems is installed, RJB and Rino can both be installed from the command line (as root):
    # gem install rjb
    # gem install rino
Next, create a working directory, smi2inchi, and move a copy of the full CDK jarfile (cdk-20060714.jar) into it. That's it for libraries, so let's move on to the translator itself.
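RJB finds the CDK classes through the CLASSPATH environment variable, which the translator sets to this single jar. If you later add more jars to smi2inchi, a small helper can assemble the variable. This is only a sketch: `build_classpath` is a hypothetical name, not part of RJB. It expands each path to an absolute one (so entries stay valid if the script later changes directory), drops duplicates, and joins the entries with `File::PATH_SEPARATOR` (':' on Linux and Mac OS X, ';' on Windows), the same separator Java uses for its class path.

```ruby
# Hypothetical helper (not part of RJB): turn a list of jar paths into a
# CLASSPATH string using the platform's path separator.
def build_classpath(jars)
  jars.map { |jar| File.expand_path(jar) } # absolute paths
      .uniq                                # drop duplicate entries
      .sort                                # deterministic order
      .join(File::PATH_SEPARATOR)          # ':' on Unix, ';' on Windows
end

# Usage, before the first Rjb::import call:
#   ENV['CLASSPATH'] = build_classpath(Dir.glob('*.jar'))
```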
The Translator
The Translator class consists of a small piece of Ruby code gluing CDK's SmilesParser and MDLWriter with the Ruby InChI library Rino. Rino is a thin Ruby wrapper around the IUPAC InChI library, which is in turn written in C.
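The glue is a three-stage pipeline: parse the SMILES string into a molecule, write the molecule out as a molfile, then read the molfile into an InChI. As a sketch of that structure (not the author's code), the same flow can be written with each stage passed in as an object. `PipelineTranslator` and the stage methods `parse` and `to_molfile` are hypothetical names chosen for illustration; only `read` mirrors the `Rino::MolfileReader#read` call used in the real translator.

```ruby
# Sketch of the SMILES -> molfile -> InChI pipeline with injected stages.
# PipelineTranslator and the stage method names are hypothetical.
class PipelineTranslator
  # parser: responds to parse(smiles)   -> molecule object
  # writer: responds to to_molfile(mol) -> molfile text (String)
  # reader: responds to read(molfile)   -> InChI string
  def initialize(parser:, writer:, reader:)
    @parser = parser
    @writer = writer
    @reader = reader
  end

  # Runs the three stages in order and returns the reader's result.
  def translate(smiles)
    mol = @parser.parse(smiles)
    molfile = @writer.to_molfile(mol)
    @reader.read(molfile)
  end
end
```

In the real translator the three stages are CDK's SmilesParser, CDK's MDLWriter writing into a java.io.StringWriter, and Rino::MolfileReader. Passing them in like this mainly makes the Ruby glue easy to exercise with stand-in objects, without starting a JVM.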
Save the following code as smi2inchi.rb in the working directory:
    # RJB reads CLASSPATH when it starts the JVM, so set it before any import.
    ENV['CLASSPATH'] = './cdk-20060714.jar'

    require 'rubygems'
    require 'rjb'
    require 'rino'

    StringWriter = Rjb::import('java.io.StringWriter')
    SmilesParser = Rjb::import('org.openscience.cdk.smiles.SmilesParser')
    MDLWriter    = Rjb::import('org.openscience.cdk.io.MDLWriter')

    # Converts a SMILES string into an InChI identifier using
    # the CDK Library (Java) and the Rino Library (Ruby/C).
    class Translator
      def initialize
        @smiles_parser = SmilesParser.new
        @mdl_writer    = MDLWriter.new
        @mol2inchi     = Rino::MolfileReader.new
      end

      # Returns an InChI identifier from the specified SMILES string.
      # Uses the CDK classes SmilesParser and MDLWriter to generate
      # a molfile from a SMILES string. Then this molfile is
      # parsed by Rino::MolfileReader.
      def translate(smiles)
        mol = @smiles_parser.parseSmiles(smiles)
        sw = StringWriter.new          # collect the molfile in memory
        @mdl_writer.setWriter(sw)
        @mdl_writer.write(mol)
        @mol2inchi.read(sw.toString)   # molfile text -> InChI string
      end
    end
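To try it out, save the following script as test.rb in the same directory:
    require './smi2inchi'   # Ruby 1.9.2+ no longer searches the current directory
    translator = Translator.new
    inchi = translator.translate 'c1ccccc1'
    p inchi # => "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
Then run it from the command line:
    $ ruby test.rb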
Alternatively, the same statements can be entered one at a time in the Interactive Ruby shell (irb):
    $ irb
    irb(main):001:0>
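Translating one SMILES string at a time extends naturally to a list. The helper below is a hypothetical sketch, not part of RJB, CDK, or Rino: it works with any object that responds to `translate(smiles)`, such as the Translator class above, and collects failures instead of stopping at the first bad input. It assumes a failed parse reaches Ruby as an exception that `rescue StandardError` catches; RJB does translate Java exceptions into Ruby ones, but check how your RJB version reports them.

```ruby
# Hypothetical batch helper: translates each SMILES string with any object
# that responds to translate(smiles). Returns [inchis, failures], two hashes
# keyed by the input SMILES string.
def translate_all(translator, smiles_list)
  inchis = {}
  failures = {}
  smiles_list.each do |smiles|
    begin
      inchis[smiles] = translator.translate(smiles)
    rescue StandardError => e
      failures[smiles] = e.message   # keep going; record why this one failed
    end
  end
  [inchis, failures]
end

# Usage with the real translator (input.smi is a hypothetical file,
# one SMILES string per line):
#   inchis, failures = translate_all(Translator.new, File.readlines('input.smi', chomp: true))
```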
With just a few lines of Ruby, we've solved a real problem. This example integrates software from three different programming languages: Ruby, C, and Java. Given the variety of chemical informatics software written in these languages, Ruby Java Bridge offers numerous integration possibilities.
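The InChI returned for benzene is itself a layered string: after the `InChI=` prefix and version number come the molecular formula and then layers introduced by a single letter, such as `c` for atom connections and `h` for hydrogen atoms. A hypothetical helper, `inchi_layers`, can pull these apart for inspection. It is a simplified sketch rather than a full InChI parser: it handles the main layers of simple identifiers like the one above, but keeps only the last value when a prefix letter repeats, as it can in more complex identifiers.

```ruby
# Hypothetical helper: split an InChI string into version, formula, and
# letter-prefixed layers. Simplified: a repeated prefix keeps its last value.
def inchi_layers(inchi)
  raise ArgumentError, "not an InChI: #{inchi}" unless inchi.start_with?('InChI=')

  version, formula, *rest = inchi.sub('InChI=', '').split('/')
  layers = {}
  rest.each do |layer|
    next if layer.empty?
    layers[layer[0]] = layer[1..-1]   # e.g. 'c' => connections, 'h' => hydrogens
  end
  { version: version, formula: formula, layers: layers }
end

p inchi_layers('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H')
```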
107 Years of Line-Formula Notations (1861-1968)
L E5 B666 FVTJ A1 E1 OQ
Thus, within the short period of just seven years after the birth of structural chemistry in 1861, virtually all of the main ideas relating to line-formula descriptions were conceived and published. No basically new practices appeared for some 79 years. Then, within an identically brief period of just seven years (1947-1954), virtually all of the fundamental features of structure-delineating chemical notations appeared in the international chemical literature.
- William J. Wiswesser, J. Chem. Doc. 1968, 8, 146-150
Apparently, advances in chemical line notations have a history of occurring in clusters. Perhaps the development of InChI will spawn a renaissance in the development and use of line notations. Is there room (or need) for multiple line notation languages, each filling a particular niche, or can a universal line notation ever be developed? Will currently popular line notations such as SMILES and InChI seem as cumbersome in 30 years as Wiswesser Line Notation does today?
ChemRuby First Look
Ruby is a dynamic, object-oriented scripting language. First released in 1995 by a Japanese programmer, it has recently begun to attract a worldwide audience.
The use of Ruby in chemical informatics, although currently rare, can be expected to increase. One especially interesting project is ChemRuby. Although the website is mostly written in Japanese, there is enough English to get a feel for what ChemRuby is all about.
I was unable to install the 1.0 source because of a failed dependency on the Ruby library "dbm". I was, however, able to install version 0.9.3 via RubyGems (sudo gem install chemruby).
The code snippet below creates cyclohexane from its SMILES string and then prints the number of atoms and the molecular weight.
    require 'rubygems'
    require 'chemruby'

    mol = SMILES('C1CCCCC1')
    puts mol.nodes.size          # number of atoms (nodes in the molecular graph)
    puts mol.molecular_weight
Browsing the API documentation shows some interesting functionality, including ring perception, canonical SMILES, and isomorphism detection.
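ChemRuby treats a molecule as a graph, which is why the atom count comes from `mol.nodes.size`. The sketch below is a hypothetical stand-in, not ChemRuby's API: a class `MiniMolecule` with heavy atoms as nodes and bonds as edges, showing roughly what an atom count, a molecular weight, and the simplest piece of ring perception (counting rings, without identifying which atoms form them) compute. It assumes every bond is single and infers implicit hydrogens from default valences, so it ignores aromaticity, charges, and multiple bonds; the atomic masses are rounded standard atomic weights.

```ruby
# Rounded standard atomic weights in g/mol.
ATOMIC_MASS = { 'H' => 1.008, 'C' => 12.011, 'N' => 14.007, 'O' => 15.999 }.freeze
# Default valences used to infer implicit hydrogens (single bonds only).
DEFAULT_VALENCE = { 'C' => 4, 'N' => 3, 'O' => 2 }.freeze

# Hypothetical molecular graph, NOT ChemRuby's API: heavy atoms are nodes,
# bonds are edges (pairs of node indices), and every bond is single.
class MiniMolecule
  attr_reader :nodes, :edges

  def initialize(nodes, edges)
    @nodes = nodes
    @edges = edges
  end

  # Number of bonds touching atom i.
  def degree(i)
    @edges.count { |a, b| a == i || b == i }
  end

  # Implicit hydrogens = default valence minus the number of single bonds.
  def implicit_hydrogens(i)
    DEFAULT_VALENCE.fetch(@nodes[i]) - degree(i)
  end

  # Sum of heavy-atom masses plus the masses of their implicit hydrogens.
  def molecular_weight
    @nodes.each_index.sum { |i| ATOMIC_MASS.fetch(@nodes[i]) + implicit_hydrogens(i) * ATOMIC_MASS['H'] }
  end

  # Cyclomatic number E - V + C: the number of independent rings, which is
  # also the number of rings in the smallest set of smallest rings (SSSR).
  def ring_count
    @edges.size - @nodes.size + component_count
  end

  private

  # Counts connected components with an iterative depth-first search.
  def component_count
    seen = Array.new(@nodes.size, false)
    components = 0
    @nodes.each_index do |start|
      next if seen[start]
      components += 1
      seen[start] = true
      stack = [start]
      until stack.empty?
        i = stack.pop
        @edges.each do |a, b|
          j = if a == i then b elsif b == i then a end
          next if j.nil? || seen[j]
          seen[j] = true
          stack.push(j)
        end
      end
    end
    components
  end
end

# Cyclohexane: six carbons joined in a single ring.
cyclohexane = MiniMolecule.new(['C'] * 6, (0...6).map { |i| [i, (i + 1) % 6] })
puts cyclohexane.nodes.size                  # 6
puts cyclohexane.molecular_weight.round(3)   # 84.162
puts cyclohexane.ring_count                  # 1
```

Counting rings through the cyclomatic number needs only the numbers of atoms, bonds, and connected pieces; finding the actual rings, as a full ring-perception routine does, takes considerably more work.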

