From SMILES to InChI: Rino, CDK, and Ruby Java Bridge

Posted by Rich Apodaca Sat, 26 Aug 2006 19:37:00 GMT

Integrating Ruby and Java is fast and easy with Ruby Java Bridge (RJB), which was discussed previously. In this article, I'll show how RJB can be used to solve a practical chemical informatics problem - the conversion of SMILES strings into InChI identifiers.

Prerequisites

This tutorial is aimed at Linux users. The same steps should work on Windows and Mac OS X, although I have not tested them on those systems. You'll need to install a few software packages if you haven't done so already: Ruby, RubyGems, RJB, CDK, and Rino. After installing RubyGems, RJB and Rino can both be installed from the command line (as root):

# gem install rjb
# gem install rino

Next, create a working directory, smi2inchi. Into this directory, move a copy of the full CDK jarfile, cdk-20060714.jar. That's it for libraries, so let's move on to the translator itself.
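
The translator below finds the jar through the CLASSPATH environment variable, so a misplaced file only shows up when the first Java class is imported. As a minimal sketch, a small helper can check for the jar up front and build the CLASSPATH string. The name classpath_for is my own invention for illustration and is not part of RJB or CDK.

```ruby
# Hypothetical helper (not part of RJB or CDK): joins jar paths into a
# CLASSPATH string, raising early if any of them is missing.
def classpath_for(jars)
  missing = jars.reject { |jar| File.exist?(jar) }
  unless missing.empty?
    raise ArgumentError, "missing jar(s): #{missing.join(', ')}"
  end

  # File::PATH_SEPARATOR is ':' on Unix-like systems and ';' on Windows,
  # the same separator Java uses for classpath entries.
  jars.join(File::PATH_SEPARATOR)
end

# Intended use, run from the smi2inchi directory:
#   ENV['CLASSPATH'] = classpath_for(['./cdk-20060714.jar'])
```

With a single jar the result is just that jar's path. The helper only starts to pay off once more Java libraries are added to the working directory.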

The Translator

The Translator class is a small piece of Ruby code that glues CDK's SmilesParser and MDLWriter to the Ruby InChI library Rino. Rino is a thin Ruby wrapper around the IUPAC InChI library, which is itself written in C.

ENV['CLASSPATH'] = './cdk-20060714.jar'

require 'rubygems'
require_gem 'rjb'
require_gem 'rino'
require 'rjb'

StringWriter = Rjb::import 'java.io.StringWriter'

SmilesParser = Rjb::import 'org.openscience.cdk.smiles.SmilesParser'
MDLWriter = Rjb::import 'org.openscience.cdk.io.MDLWriter'

# Converts a SMILES string into an InChI identifier using
# the CDK Library (Java) and the Rino Library (Ruby/C).
class Translator

  def initialize
    @smiles_parser = SmilesParser.new
    @mdl_writer = MDLWriter.new
    @mol2inchi = Rino::MolfileReader.new
  end

  # Returns an InChI identifier from the specified SMILES string.
  # Uses the CDK classes SmilesParser and MDLWriter to generate
  # a molfile from a SMILES string. Then this molfile is
  # parsed by Rino::MolfileReader.
  def translate(smiles)
    mol = @smiles_parser.parseSmiles(smiles)

    sw = StringWriter.new

    @mdl_writer.setWriter(sw)
    @mdl_writer.write(mol)

    @mol2inchi.read(sw.toString)
  end
end

Add the above code to a file called smi2inchi.rb. The first line points the CLASSPATH environment variable, which is needed by RJB, to the CDK library. Lines 3-6 load the RJB and Rino RubyGems. Lines 8-11 import the built-in Java class StringWriter and the CDK Java classes SmilesParser and MDLWriter using RJB's import syntax. The core of the class is the translate method, which simply coordinates the pieces: SmilesParser builds a molecule from the SMILES string, MDLWriter writes it out as a molfile into the StringWriter, and Rino reads that molfile and returns the InChI.

To use the Translator class, create an instance and invoke its translate method on a SMILES string:

require 'smi2inchi'

translator = Translator.new
inchi = translator.translate 'c1ccccc1'

p inchi # => "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"

The above code fragment can be saved to a text file (e.g. test.rb) and run with the Ruby interpreter:

$ ruby test.rb

Alternatively, it can be entered interactively with the Interactive Ruby Interpreter (irb):

$ irb
irb(main):001:0>

With just a few lines of Ruby, we've solved a real problem. This example integrates software from three different programming languages: Ruby, C, and Java. Given the variety of chemical informatics software written in these languages, Ruby Java Bridge offers numerous integration possibilities.
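
One natural next step is converting a whole list of SMILES strings. The sketch below is only an outline: translate_all and EchoTranslator are hypothetical names of mine, and EchoTranslator is a stand-in so the snippet runs without Java installed. I am also assuming that a failed parse surfaces as a Ruby exception descended from StandardError, which I have not verified for RJB.

```ruby
# Hypothetical helper: applies any object that responds to #translate
# to each SMILES string and collects the results in a hash.
def translate_all(translator, smiles_list)
  smiles_list.each_with_object({}) do |smiles, results|
    begin
      results[smiles] = translator.translate(smiles)
    rescue StandardError => e
      # Record the failure as nil instead of aborting the whole batch.
      results[smiles] = nil
      warn "could not translate #{smiles.inspect}: #{e.message}"
    end
  end
end

# Stand-in translator so the sketch runs on its own. Swap in
# Translator.new from smi2inchi.rb to perform real conversions.
class EchoTranslator
  def translate(smiles)
    raise ArgumentError, 'empty SMILES' if smiles.empty?

    "translated:#{smiles}"
  end
end

results = translate_all(EchoTranslator.new, ['c1ccccc1', ''])
results.each { |smiles, out| puts "#{smiles.inspect} -> #{out.inspect}" }
```

Keeping the translator as a parameter means the batch logic can be tried out with a stub like this one before any Java classes are loaded.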

107 Years of Line-Formula Notations (1861-1968)

Posted by Rich Apodaca Fri, 18 Aug 2006 06:50:00 GMT

L E5 B666 FVTJ A1 E1 OQ

Thus, within the short period of just seven years after the birth of structural chemistry in 1861, virtually all of the main ideas relating to line-formula descriptions were conceived and published. No basically new practices appeared for some 79 years. Then, within an identically brief period of just seven years (1947-1954), virtually all of the fundamental features of structure-delineating chemical notations appeared in the international chemical literature.

-William J. Wiswesser, J. Chem. Doc. 1968, 8, 146-150

Apparently, advances in chemical line notations have a history of occurring in clusters. Perhaps the development of InChI will spawn a renaissance in the development and use of line notations. Is there room (or need) for multiple line notation languages, each filling a particular niche, or can a universal line notation ever be developed? Will currently popular line notations such as SMILES and InChI seem as cumbersome in 30 years as Wiswesser Line Notation does today?

ChemRuby First Look

Posted by Rich Apodaca Sun, 13 Aug 2006 22:19:00 GMT

Ruby is a dynamic, object-oriented scripting language. First released in 1995 by a Japanese programmer, it has recently begun to attract a worldwide audience.

The use of Ruby in chemical informatics, although currently rare, can be expected to increase. One especially interesting project is ChemRuby. Although the website is written in Japanese, there is enough English to get a feel for what ChemRuby is all about.

I was unsuccessful in installing the 1.0 source due to a failed dependency on the Ruby library "dbm". I was, however, able to install version 0.9.3 via RubyGems (sudo gem install chemruby).

The code snippet below creates cyclohexane from the corresponding SMILES string, and then prints out the number of atoms and the molecular weight.

require 'rubygems'
require_gem 'chemruby'

mol = SMILES('C1CCCCC1')

puts mol.nodes.size
puts mol.molecular_weight
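
As a rough cross-check on the molecular weight, the value can be worked out by hand from the formula of cyclohexane, C6H12. The sketch below does not use ChemRuby at all. formula_weight is a hypothetical helper of mine, and the atomic weights are rounded standard values, so the result may differ in the last digits from whatever table ChemRuby uses.

```ruby
# Rounded standard atomic weights; only the elements needed here.
ATOMIC_WEIGHTS = { 'C' => 12.011, 'H' => 1.008 }.freeze

# Hypothetical helper, independent of ChemRuby: sums atomic weights for
# a composition given as { element symbol => atom count }.
def formula_weight(composition)
  composition.sum { |element, count| ATOMIC_WEIGHTS.fetch(element) * count }
end

cyclohexane = { 'C' => 6, 'H' => 12 }
puts formula_weight(cyclohexane).round(3)
```

This should come out at roughly 84.16, the usual textbook value for cyclohexane.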

Browsing the API documentation shows some interesting functionality, including ring perception, canonical SMILES, and isomorphism detection.
