From IUPAC Name to Molecular Formula with Ruby CDK
Recently, a question was raised on the Yahoo cheminf group list regarding the conversion of IUPAC names into molecular formulas. This can be done quickly with Ruby CDK, as this article will show.
Prerequisites
This tutorial requires Ruby CDK, which in turn requires Ruby Java Bridge (RJB). A recent Depth-First article described the minimal system configuration required to run RJB on Linux. Another article showed how to install RJB on Windows.
A Small Library
The following library will convert IUPAC nomenclature into molecular formulas with Ruby:
require 'rubygems'
require_gem 'rcdk'
require 'rcdk'
require 'rcdk/util'
module Formulator
@@hydrogen_adder = Rjb::import('org.openscience.cdk.tools.HydrogenAdder').new
def get_formula(iupac_name)
mol = RCDK::Util::Lang.read_iupac iupac_name
@@hydrogen_adder.addExplicitHydrogensToSatisfyValency mol
analyzer = Rjb::import('org.openscience.cdk.tools.MFAnalyser').new(mol)
analyzer.getMolecularFormula
end
end
Save this code as a file named formulator.rb
in your working directory.
Testing the Library
The Formulator library can be tested with the following code:
require 'formulator'
include Formulator
get_formula 'benzene' # => "C6H6"
get_formula '4-(3,4-dichlorophenyl)-N-methyl-1,2,3,4-tetrahydronaphthalen-1-amine' # => "C17H17NCl2"
Limitations
You may run across classes of structures that are not recognized by Ruby CDK. This is due to limitations of the underlying OPSIN library. For example, OPSIN does not yet recognize fused heterocycle names such as 'imidazo[2,1-b][1,3]thiazole'.
Conclusions
Ruby CDK makes short work of converting IUPAC names into molecular formulas. This is just one example of the kind of conversion that's possible. For example, a recent article discussed the conversion of IUPAC names to color 2-D structures.
Due to Ruby's position as both a highly functional scripting language and as the foundation for the popular Web application framework Ruby on Rails, a variety of IUPAC nomenclature translation applications are just a few lines of code away.