Ruby CDK for Newbies

Posted by Rich Apodaca Thu, 04 Oct 2007 14:01:00 GMT

Scripting languages and cheminformatics can be a highly effective combination. With their relaxed syntax, compilation-free execution, and interactive testing environments, scripting languages offer fast development iteration cycles. Their support for manipulating libraries written in other languages can also be key in today's heterogeneous cheminformatics software environment.

Although there are many cheminformatics scripting environments to choose from, Ruby offers some important advantages. Number one on the list is the wildly popular Ruby on Rails Web development framework. Others worth mentioning include interactive Ruby (irb), the RubyGems package manager, the Rake build system, the JRuby Ruby implementation, RubyForge, and a host of other productivity boosters.

A major focus of Depth-First over the last few months has been Ruby CDK. This library is a thin Ruby wrapper around three open source packages: the Chemistry Development Kit (CDK); Structure-CDK, a 2D rendering toolkit; and OPSIN, a chemical nomenclature parser. A recent comment on Depth-First by Egon Willighagen, one of CDK's creators, got me thinking about centralizing the Ruby CDK documentation. The following collection of links is a step in that direction.

Overview and Installation

Ruby CDK in Its Environment

Using Ruby CDK

Image Generation Credit: txt2pic.com

Ruby CDK One-Liners: Create a Molfile With 2D Atom Coordinates From Arbitrary SMILES Strings

Posted by Rich Apodaca Thu, 20 Sep 2007 18:18:00 GMT

A very common operation in cheminformatics is the interconversion of molfiles and SMILES strings. Usually, converting from SMILES gives a molfile in which all atoms have coordinates of (0,0,0). Sometimes you just need more than that. The following Ruby CDK code will accept an arbitrary SMILES string and return a molfile with fully-assigned 2D atom coordinates:

require 'rubygems'
require 'rcdk'
require 'rcdk/util'
include RCDK::Util

XY.coordinate_molfile Lang.smiles_to_molfile('c1ccccc1')

Looking at it this way, those four lines of require/include statements seem pretty darn verbose.
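To see whether a given molfile actually carries coordinates, the atom block can be inspected directly. The following is a minimal sketch in plain Ruby that does not use Ruby CDK at all: it assumes a V2000 molfile, in which the fourth line holds the atom count and each atom line starts with three 10-character x, y, and z fields. The helper names (atom_line, sample_molfile, coordinates_assigned?) are made up for this illustration.

```ruby
# Build one V2000 atom line: three 10-character coordinate fields,
# a space, a 3-character element symbol, then twelve zeroed property fields.
def atom_line(x, y, z, symbol)
  format('%10.4f%10.4f%10.4f %-3s 0%s', x, y, z, symbol, '  0' * 11)
end

# Assemble a minimal two-atom (C-O) V2000 molfile for demonstration only.
def sample_molfile(coords)
  header = ['', '  sketch', '']
  counts = '  2  1  0  0  0  0  0  0  0  0999 V2000'
  atoms  = [atom_line(*coords[0], 'C'), atom_line(*coords[1], 'O')]
  bonds  = ['  1  2  1  0  0  0  0']
  (header + [counts] + atoms + bonds + ['M  END']).join("\n") + "\n"
end

# Return true when at least one atom has a non-zero x, y, or z coordinate.
def coordinates_assigned?(molfile)
  lines = molfile.lines.map(&:chomp)
  atom_count = lines[3][0, 3].to_i # counts line: first field is the atom count
  lines[4, atom_count].any? do |line|
    [line[0, 10], line[10, 10], line[20, 10]].any? { |field| field.to_f != 0.0 }
  end
end

flat     = sample_molfile([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
laid_out = sample_molfile([[0.0, 0.0, 0.0], [1.43, 0.0, 0.0]])

puts coordinates_assigned?(flat)     # false
puts coordinates_assigned?(laid_out) # true
```

A check like this could be run on the output of the one-liner above to confirm that 2D coordinates were assigned.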

Easily Calculate TPSA Descriptors from SMILES Strings Using Ruby CDK

Posted by Rich Apodaca Wed, 19 Sep 2007 13:27:00 GMT

A D-F reader wrote in to ask how to calculate Topological Polar Surface Area (TPSA) using Ruby CDK. TPSA is one of the most widely used descriptors for predicting membrane permeability and, by extension, other important ADME properties. This article shows how to calculate TPSA in Ruby using Ruby CDK.

The Library

Our library consists of nothing more than a few method calls to manipulate the underlying CDK library. The tpsa_for method accepts any SMILES string and returns the calculated TPSA:

require 'rubygems'
require_gem 'rcdk'
require 'rcdk/util'

jrequire 'org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor'

module TPSA
  @@calc = Org::Openscience::Cdk::Qsar::Descriptors::Molecular::TPSADescriptor.new

  def tpsa_for smiles
    mol = RCDK::Util::Lang.read_smiles smiles

    @@calc.calculate(mol).getValue.doubleValue
  end
end

An Interactive Test

Saving the library to a file called tpsa.rb lets us test it through interactive Ruby (irb):

$ irb
irb(main):001:0> require 'tpsa'
./tpsa.rb:2:Warning: require_gem is obsolete.  Use gem instead.
/usr/local/lib/ruby/gems/1.8/gems/rcdk-0.3.0/lib/rcdk/java.rb:26:Warning: require_gem is obsolete.  Use gem instead.
=> true
irb(main):002:0> include TPSA
=> Object
irb(main):003:0> tpsa_for 'COCCc1ccc(OCC(O)CNC(C)C)cc1' # metoprolol
=> 50.72
irb(main):004:0> tpsa_for 'O=C3Nc1ccc(Cl)cc1C(c2ccccc2)=NC3O' # oxazepam
=> 61.69

The results we obtain for metoprolol and oxazepam are 50.72 and 61.69, respectively. These values compare well with those reported by Ertl et al. in the definitive paper on TPSA (50.7 and 61.7, respectively).
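The comparison can be made explicit with a few lines of plain Ruby. This is only a sketch: the numbers are the ones quoted above, and the tolerance of 0.05 (half a unit in the last reported digit) is an assumption chosen for this illustration, as are the names CALCULATED, REPORTED, and within_tolerance?.

```ruby
# Values returned by tpsa_for in the irb session above.
CALCULATED = { 'metoprolol' => 50.72, 'oxazepam' => 61.69 }.freeze
# Values quoted above from Ertl et al.
REPORTED   = { 'metoprolol' => 50.7, 'oxazepam' => 61.7 }.freeze

# True when two values differ by no more than the given tolerance.
# The default of 0.05 is an assumption, not part of any published protocol.
def within_tolerance?(value, reference, tolerance = 0.05)
  (value - reference).abs <= tolerance
end

CALCULATED.each do |name, value|
  agrees = within_tolerance?(value, REPORTED[name])
  puts "#{name}: calculated #{value}, reported #{REPORTED[name]}, agrees: #{agrees}"
end
```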

Conclusions

It doesn't take much Ruby to command a wide range of cheminformatics functionality - in this case, TPSA calculations. But the fun doesn't stop there. The CDK, and by extension Ruby CDK, offers access to a wide array of descriptor calculations, each of which follows the same basic pattern outlined here. All of it can be prototyped, debugged, and deployed through one of the most flexible programming languages currently available.

From InChI to Image with Ruby Open Babel and Ruby CDK

Posted by Rich Apodaca Thu, 06 Sep 2007 12:25:00 GMT

Like SMILES, InChI is a line notation that can be used to encode and store chemical information relatively efficiently. Although there are a number of scenarios where this strategy is used, what many of them have in common is the need to eventually convert an InChI into a human-readable form. In most cases, this form will be a 2D chemical structure. This article will show how a small Ruby library can convert InChI strings into color PNG images with the help of Ruby Open Babel and Ruby CDK.

The Library

Our library accepts an InChI as input and produces a scaled PNG image as output. It re-uses part of a previously-discussed library for the interconversion of SMILES and InChI.

require 'rubygems'
require 'openbabel'
require_gem 'rcdk'
require 'rcdk/util'

module InChI
  @@to_smiles = OpenBabel::OBConversion.new
  @@to_smiles.set_in_and_out_formats 'inchi', 'smi'

  def inchi_to_png inchi, path_to_png, width, height
    smiles = inchi_to_smiles inchi

    RCDK::Util::Image.smiles_to_png smiles, path_to_png, width, height
  end

  private

    def inchi_to_smiles inchi
      mol = OpenBabel::OBMol.new

      @@to_smiles.read_string(mol, inchi) or raise "Can't parse InChI: #{inchi}."
      @@to_smiles.write_string(mol).strip
    end
end

Testing

Our library can be tested by saving it to a file called inchi.rb and using interactive Ruby (the warning can safely be ignored for now):
$ irb
irb(main):001:0> require 'inchi'
./inchi.rb:3:Warning: require_gem is obsolete.  Use gem instead.
/usr/local/lib/ruby/gems/1.8/gems/rcdk-0.3.0/lib/rcdk/java.rb:26:Warning: require_gem is obsolete.  Use gem instead.
=> true
irb(main):002:0> include InChI
=> Object
irb(main):003:0> inchi='InChI=1/C23H27FN4O2/c1-15-18(23(29)28-10-3-2-4-21(28)25-15)9-13-27-11-7-16(8-12-27)22-19-6-5-17(24)14-20(19)30-26-22/h5-6,14,16H,2-4,7-13H2,1H3' #risperidone
=> "InChI=1/C23H27FN4O2/c1-15-18(23(29)28-10-3-2-4-21(28)25-15)9-13-27-11-7-16(8-12-27)22-19-6-5-17(24)14-20(19)30-26-22/h5-6,14,16H,2-4,7-13H2,1H3"
irb(main):004:0> inchi_to_png inchi, 'risperidone.png', 300, 300
=> nil

This code produces the following image:

Our library can also be used on more complicated molecules, for example Brevetoxin:

$ irb
irb(main):001:0> require 'inchi'
./inchi.rb:3:Warning: require_gem is obsolete.  Use gem instead.
/usr/local/lib/ruby/gems/1.8/gems/rcdk-0.3.0/lib/rcdk/java.rb:26:Warning: require_gem is obsolete.  Use gem instead.
=> true
irb(main):002:0> include InChI
=> Object
irb(main):003:0> inchi='InChI=1/C49H70O13/c1-26-17-36-39(22-45(52)58-36)57-44-21-38-40(62-48(44,4)23-26)18-28(3)46-35(55-38)11-7-6-10-31-32(59-46)12-8-14-34-33(54-31)13-9-15-43-49(5,61-34)24-42-37(56-43)20-41-47(60-42)30(51)19-29(53-41)16-27(2)25-50/h6-8,14,25-26,28-44,46-47,51H,2,9-13,15-24H2,1,3-5H3/b7-6-,14-8-' #brevetoxin a
=> "InChI=1/C49H70O13/c1-26-17-36-39(22-45(52)58-36)57-44-21-38-40(62-48(44,4)23-26)18-28(3)46-35(55-38)11-7-6-10-31-32(59-46)12-8-14-34-33(54-31)13-9-15-43-49(5,61-34)24-42-37(56-43)20-41-47(60-42)30(51)19-29(53-41)16-27(2)25-50/h6-8,14,25-26,28-44,46-47,51H,2,9-13,15-24H2,1,3-5H3/b7-6-,14-8-"
irb(main):004:0> inchi_to_png inchi, 'brevetoxin.png', 300, 200
=> nil

This produces the following image:

Conclusions

While our library could certainly be improved, it conveniently solves what would otherwise be a very difficult problem. Areas for further work include error handling and improving the appearance of the images (the latter is the aim of Firefly). Although three programming languages are involved (Ruby, C++, and Java), this complexity is neatly encapsulated behind a simple Ruby interface.
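As one possible first step toward the error handling mentioned above, the arguments could be checked before any conversion is attempted. The sketch below is plain Ruby and touches neither Open Babel nor Ruby CDK. The module name InChIChecks and its validate! method are hypothetical, and the only assumptions made about the input are that an InChI string begins with 'InChI=' and that image dimensions should be positive integers.

```ruby
# Hypothetical helper module; not part of Ruby CDK or Ruby Open Babel.
module InChIChecks
  # Raise ArgumentError unless the arguments look usable.
  # This checks only the 'InChI=' prefix, not the full InChI syntax.
  def self.validate!(inchi, width, height)
    unless inchi.is_a?(String) && inchi.start_with?('InChI=')
      raise ArgumentError, "Not an InChI string: #{inchi.inspect}"
    end

    [width, height].each do |size|
      unless size.is_a?(Integer) && size > 0
        raise ArgumentError, "Image dimensions must be positive integers, got #{size.inspect}"
      end
    end

    true
  end
end

begin
  InChIChecks.validate!('c1ccccc1', 300, 300) # a SMILES string, not an InChI
rescue ArgumentError => e
  puts "Rejected: #{e.message}"
end
```

A call to InChIChecks.validate! could be placed at the top of inchi_to_png so that obviously bad input fails early with a clear message.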

From IUPAC Name to Molecular Formula with Ruby CDK

Posted by Rich Apodaca Tue, 13 Mar 2007 14:25:00 GMT

Recently, a question was raised on the Yahoo cheminf group list regarding the conversion of IUPAC names into molecular formulas. This can be done quickly with Ruby CDK, as this article will show.

Prerequisites

This tutorial requires Ruby CDK, which in turn requires Ruby Java Bridge (RJB). A recent Depth-First article described the minimal system configuration required to run RJB on Linux. Another article showed how to install RJB on Windows.

A Small Library

The following library will convert IUPAC nomenclature into molecular formulas with Ruby:

require 'rubygems'
require_gem 'rcdk'
require 'rcdk'
require 'rcdk/util'

module Formulator
  @@hydrogen_adder = Rjb::import('org.openscience.cdk.tools.HydrogenAdder').new

  def get_formula(iupac_name)
    mol = RCDK::Util::Lang.read_iupac iupac_name
    @@hydrogen_adder.addExplicitHydrogensToSatisfyValency mol
    analyzer = Rjb::import('org.openscience.cdk.tools.MFAnalyser').new(mol)

    analyzer.getMolecularFormula
  end
end

Save this code as a file named formulator.rb in your working directory.

Testing the Library

The Formulator library can be tested with the following code:

require 'formulator'
include Formulator

get_formula 'benzene' # => "C6H6"
get_formula '4-(3,4-dichlorophenyl)-N-methyl-1,2,3,4-tetrahydronaphthalen-1-amine' # => "C17H17NCl2"
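Note that the second formula lists N before Cl, which is not the Hill order (carbon, then hydrogen, then the remaining elements alphabetically) used by many other tools. If formulas need to be compared across sources, they can be normalized first. The following plain-Ruby sketch does not use Ruby CDK; parse_formula and hill_formula are hypothetical helpers, and they assume a simple formula string with no parentheses, charges, or isotope labels.

```ruby
# Split a formula such as "C17H17NCl2" into a hash of element counts.
def parse_formula(formula)
  formula.scan(/([A-Z][a-z]?)(\d*)/).each_with_object(Hash.new(0)) do |(symbol, count), counts|
    counts[symbol] += count.empty? ? 1 : count.to_i
  end
end

# Rewrite a formula in Hill order: C first and H second when carbon is
# present, all other elements alphabetically; purely alphabetical otherwise.
def hill_formula(formula)
  counts = parse_formula(formula)
  symbols = counts.keys.sort
  if counts.key?('C')
    symbols = ['C'] + (counts.key?('H') ? ['H'] : []) + (symbols - %w[C H])
  end
  symbols.map { |s| counts[s] == 1 ? s : "#{s}#{counts[s]}" }.join
end

puts hill_formula('C17H17NCl2') # C17H17Cl2N
puts hill_formula('C6H6')       # C6H6
```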

Limitations

You may run across classes of structures that are not recognized by Ruby CDK. This is due to limitations of the underlying OPSIN library. For example, OPSIN does not yet recognize fused heterocycle names such as 'imidazo[2,1-b][1,3]thiazole'.

Conclusions

Ruby CDK makes short work of converting IUPAC names into molecular formulas. This is just one of the conversions that are possible; a recent article, for instance, discussed the conversion of IUPAC names to color 2D structures.

Due to Ruby's position as both a highly functional scripting language and as the foundation for the popular Web application framework Ruby on Rails, a variety of IUPAC nomenclature translation applications are just a few lines of code away.
