An Introduction to the Rubidium Cheminformatics Toolkit: Interconvert SMILES, InChI, and Molfile with an Open Babel-Like Interface

Posted by Rich Apodaca Mon, 15 Oct 2007 10:59:00 GMT

Interconverting molecular languages is a very common operation in cheminformatics, so convenient conversion tools are desirable. Recent articles have discussed JRuby as a functional cheminformatics scripting environment. In this article, we'll see how this functionality can be combined with convenience for molecular language conversions.

In addition to illustrating a technique, this article is the first in a series aimed at documenting a new cheminformatics toolkit for Ruby called "Rubidium". Rubidium will provide a unified set of Ruby APIs for working with diverse Open Source cheminformatics tools.

Rubidium will be distributed under the highly permissive MIT License.

Prerequisites

This Rubidium library requires JRuby and the Chemistry Development Kit (CDK). Copying the CDK jarfile into your JRuby lib directory is all that's needed.

The Library

The goal of this library is to provide a simple, yet flexible way to interconvert SMILES, InChI, and molfile formats. It was inspired by the Open Babel library, in which an OBConversion object is configured with input and output formats prior to performing one or more conversions. In today's library, a similar Ruby interface is created for the CDK. Because of its length, the library won't be presented in its entirety. Instead, it can be downloaded here.
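
To make the configure-then-convert pattern concrete without the CDK on hand, here is a minimal sketch in plain Ruby. It is not the contents of cdk.rb: the Sketch module, the readers and writers hashes, and the toy 'symbols' and 'formula' formats are all assumptions made up for illustration. Only the set_formats and convert method names come from the session shown in this article.

```ruby
# Hypothetical sketch of an OBConversion-style interface in plain Ruby.
# The real cdk.rb is not reproduced here; every name below is illustrative.
module Sketch
  class Conversion
    # readers: format name => callable turning text into an intermediate object
    # writers: format name => callable turning that object back into text
    def initialize(readers, writers)
      @readers = readers
      @writers = writers
      @in_format = nil
      @out_format = nil
    end

    # Configure once, then convert as many inputs as needed.
    def set_formats(in_format, out_format)
      raise ArgumentError, "Unknown input format '#{in_format}'" unless @readers.key?(in_format)
      raise ArgumentError, "Unknown output format '#{out_format}'" unless @writers.key?(out_format)

      @in_format = in_format
      @out_format = out_format
    end

    def convert(input)
      raise 'Call set_formats first' unless @in_format && @out_format

      @writers[@out_format].call(@readers[@in_format].call(input))
    end
  end
end

# Toy formats standing in for SMILES, InChI, and molfile:
# 'symbols' is a space-separated list of element symbols,
# 'formula' is a simple alphabetical element tally.
readers = { 'symbols' => ->(text) { text.split } }
writers = {
  'symbols' => ->(atoms) { atoms.join(' ') },
  'formula' => ->(atoms) { atoms.tally.sort.map { |el, n| n > 1 ? "#{el}#{n}" : el }.join }
}

c = Sketch::Conversion.new(readers, writers)
c.set_formats 'symbols', 'formula'
puts c.convert('C C O')
```

In a CDK-backed version, the callables would presumably delegate to CDK readers and writers. The point of the sketch is only that formats are chosen once and then reused across calls to convert.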

Testing the Library

The library can be tested by saving it as a file called cdk.rb and invoking jirb. We can then convert a SMILES for benzene into the InChI for benzene:

$ jirb
irb(main):001:0> require 'cdk'
=> true
irb(main):002:0> c=CDK::Conversion.new
=> #<CDK::Conversion:0x4c6320 ... >
irb(main):003:0> c.set_formats 'smi', 'inchi'
=> "inchi"
irb(main):004:0> c.convert 'c1ccccc1'
=> "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"

Upcoming articles will show more examples of interconversions using this library, and discuss some of its limitations.

An Aside

It might be useful for Rubidium to support multiple Conversions, each using its own cheminformatics toolkit. For example, a recent article discussed SMILES and InChI interconversion with Ruby Open Babel. With a little tweaking, the Ruby Open Babel OBConversion interface could be made identical to the Ruby interface used in today's tutorial. We could also configure JOELib and Rosetta Conversions in an analogous fashion.

Rubidium would then offer a family of molecular language converters, each of which used exactly the same API. We could then pick the best converter based on the situation at hand.
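
As a rough sketch of what picking the best converter could look like, the plain Ruby below chooses among stand-in objects that share one interface. StubConversion, supports?, and pick_conversion are hypothetical names, and the format pairs assigned to each toolkit are invented for the example. They are not statements about what Open Babel or the CDK can actually convert.

```ruby
# Hypothetical stand-ins sharing one interface: set_formats and convert.
# Each stub only reports which backend would have handled the call.
class StubConversion
  attr_reader :toolkit

  def initialize(toolkit, supported)
    @toolkit = toolkit
    @supported = supported # array of [input, output] format pairs
    @formats = nil
  end

  def supports?(in_format, out_format)
    @supported.include?([in_format, out_format])
  end

  def set_formats(in_format, out_format)
    unless supports?(in_format, out_format)
      raise ArgumentError, "#{@toolkit} cannot convert #{in_format} to #{out_format}"
    end

    @formats = [in_format, out_format]
    out_format
  end

  def convert(input)
    raise 'Call set_formats first' unless @formats

    "#{@toolkit}: #{@formats.join(' -> ')} for #{input}"
  end
end

# Return the first candidate, in order of preference, that handles the pair.
def pick_conversion(candidates, in_format, out_format)
  chosen = candidates.find { |c| c.supports?(in_format, out_format) }
  raise ArgumentError, 'No converter available' unless chosen

  chosen.set_formats in_format, out_format
  chosen
end

# The supported pairs here are made up purely to exercise the selection logic.
candidates = [
  StubConversion.new('Open Babel', [%w[smi mol]]),
  StubConversion.new('CDK', [%w[smi mol], %w[smi inchi]])
]

c = pick_conversion(candidates, 'smi', 'inchi')
puts c.toolkit
```

Because every candidate answers to the same methods, the calling code never needs to know which toolkit was selected.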

Conclusions

With just a little Ruby code, we've created a convenient Ruby interface for interconverting SMILES, InChI, and molfile formats. JRuby supports even more interconversions through the CDK as well as other Java and Java Native Interface libraries. Future articles will discuss some of the possibilities.

JRuby for Cheminformatics: Parsing IUPAC Nomenclature with OPSIN

Posted by Rich Apodaca Fri, 12 Oct 2007 10:37:00 GMT

Recent articles have discussed the use of JRuby for cheminformatics. We've seen how to parse SMILES strings, and read or write InChIs. In this article, we'll see how easy it is to parse IUPAC nomenclature from JRuby using Peter Corbett's OPSIN library.

Installation

After installing JRuby, simply download the OPSIN jarfile and copy it to your JRuby lib directory. You're done.

A Simple Library

We can write a simple library to convert an IUPAC name into a CML document:

require 'jruby'

import 'uk.ac.cam.ch.wwmm.opsin.NameToStructure'

module IUPAC
  @@nts = NameToStructure.new

  def read_name name
    cml = @@nts.parse_to_cml(name)

    raise "Could not parse '#{name}'." unless cml

    cml.to_xml
  end
end

The read_name method accepts an IUPAC name as a string and returns a CML document as a string. If the input can't be parsed, an exception is raised.

Testing the Library

We can test the library by saving it as a file called iupac.rb and invoking jirb:

$ jirb
irb(main):001:0> require 'iupac'
=> true
irb(main):002:0> include IUPAC
=> Object
irb(main):003:0> read_name('4-iodobenzoic acid')

This returns the XML shown below, which has been re-formatted for clarity:

<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C">
        <label value="1" />
      </atom>
      <atom id="a2" elementType="C">
        <label value="2" />
      </atom>
      <atom id="a3" elementType="C">
        <label value="3" />
      </atom>
      <atom id="a4" elementType="C">
        <label value="4" />
      </atom>
      <atom id="a5" elementType="C">
        <label value="5" />
      </atom>
      <atom id="a6" elementType="C">
        <label value="6" />
      </atom>
      <atom id="a7" elementType="C" />
      <atom id="a8" elementType="O" />
      <atom id="a9" elementType="O" />
      <atom id="a10" elementType="I">
        <label value="1" />
      </atom>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="2" />
      <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a3 a4" order="2" />
      <bond atomRefs2="a4 a5" order="1" />
      <bond atomRefs2="a5 a6" order="2" />
      <bond atomRefs2="a6 a1" order="1" />
      <bond atomRefs2="a7 a1" order="1" />
      <bond atomRefs2="a7 a8" order="2" />
      <bond atomRefs2="a7 a9" order="1" />
      <bond atomRefs2="a10 a4" order="1" />
    </bondArray>
  </molecule>
</cml>

This simple Ruby library has parsed the name '4-iodobenzoic acid' and has returned a string containing the CML representation for the molecule. If we had wanted the read_name method to return a traversable XML object model, we could have enabled that as well.
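
Even with a plain string in hand, the CML is easy to traverse from Ruby. The sketch below uses REXML, which ships with Ruby, rather than any object model OPSIN itself might return. The short methanol document is written by hand for the example and was not produced by OPSIN.

```ruby
require 'rexml/document'

# A hand-written CML string in the same shape as the output above.
# (Methanol skeleton, kept short for the example; not OPSIN output.)
cml = <<~XML
  <cml xmlns="http://www.xml-cml.org/schema">
    <molecule id="m1">
      <atomArray>
        <atom id="a1" elementType="C" />
        <atom id="a2" elementType="O" />
      </atomArray>
      <bondArray>
        <bond atomRefs2="a1 a2" order="1" />
      </bondArray>
    </molecule>
  </cml>
XML

doc = REXML::Document.new(cml)

# Walk every element and sort atoms from bonds by tag name.
atoms = []
bonds = []
doc.root.each_recursive do |element|
  atoms << element if element.name == 'atom'
  bonds << element if element.name == 'bond'
end

symbols = atoms.map { |atom| atom.attributes['elementType'] }
puts "#{atoms.size} atoms (#{symbols.join(', ')}), #{bonds.size} bond"
```

The same loop should work unchanged on the string returned by read_name, since it depends only on the atom and bond element names.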

Conclusions

One of the objections raised whenever the issue of "new" programming languages comes up, regardless of their merit, is the age-old refrain "Yeah, but where's the software?" With JRuby, we bypass this question altogether. We can leverage the full scope of the massive Java development effort over the last ten years, which includes several excellent cheminformatics libraries. With virtually no effort, we have a working cheminformatics platform based on a widely-used, versatile and dynamic object-oriented scripting language. Future articles will discuss extensions to this platform and some applications.

JRuby for Cheminformatics: Reading and Writing InChIs Via the Java Native Interface

Posted by Rich Apodaca Wed, 10 Oct 2007 08:21:00 GMT

The increased use of the InChI identifier is making the reading and writing of InChIs a standard cheminformatics capability. Recent articles have discussed the advantages of JRuby for cheminformatics. One disadvantage of JRuby is that code written in C can't be directly used. This presents a potential problem for libraries, such as the InChI toolkit, that are written in C. Fortunately, the solution is simple. Today's tutorial will demonstrate how InChIs can be both read and written using the C-InChI toolkit via JRuby and the excellent JNI-InChI library.

About JNI-InChI

The JNI-InChI library, written by Jim Downing and Sam Adams, wraps the C InChI toolkit in a Java Native Interface. This low-level toolkit is suitable for building more complex software, but lacks many features present in the C InChI toolkit. For example, JNI-InChI doesn't directly interconvert SMILES or molfile with InChI. For that you'd need to build a support library. If you're building a toolkit from scratch, this lightweight approach can be a significant advantage.

The JNI-InChI binary distribution jarfile includes the compiled native InChI library. In this sense it's virtually indistinguishable from any other Java library. This simplified packaging makes it exceptionally easy to use JNI-InChI from JRuby, as we'll see below.

Installation

JRuby can be installed as described previously. To install the JNI-InChI library for JRuby, simply copy the current release jarfile into the lib directory of your JRuby installation. That's all there is to it.

A Simple Library

We can now write a simple library to read InChIs via JRuby:

require 'java'

include_class 'net.sf.jniinchi.JniInchiInput'
include_class 'net.sf.jniinchi.JniInchiInputInchi'
include_class 'net.sf.jniinchi.JniInchiWrapper'

module IUPAC
  def read_inchi inchi
    input = JniInchiInputInchi.new inchi

    JniInchiWrapper.getStructureFromInchi input
  end
end

Testing the Library

By saving the above library to a file called iupac.rb, we can parse InChIs via JRuby:

$ jirb
irb(main):001:0> require 'iupac'
=> true
irb(main):002:0> include IUPAC
=> Object
irb(main):003:0> output = read_inchi 'InChI=1/C14H10/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13/h1-10H'
=> #
irb(main):004:0> output.num_atoms
=> 14
irb(main):005:0> output.num_bonds
=> 16
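
As a quick sanity check on the atom count, the formula layer of the InChI can be read with ordinary string handling. This is only a sketch: formula_counts is a hypothetical helper, it does not use the InChI toolkit, and it assumes a single-component formula such as C14H10 with no '.' separators or leading multipliers.

```ruby
# Count atoms per element from the formula layer of an InChI string.
# Assumes a single-component formula (no '.' separators or multipliers);
# this is plain string handling, not a call into the InChI toolkit.
def formula_counts(inchi)
  formula = inchi.split('/')[1]
  formula.scan(/([A-Z][a-z]?)(\d*)/).each_with_object(Hash.new(0)) do |(element, number), counts|
    counts[element] += number.empty? ? 1 : number.to_i
  end
end

inchi = 'InChI=1/C14H10/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13/h1-10H'
counts = formula_counts(inchi)

# Hydrogens are left out, matching the 14 atoms reported above.
heavy_atoms = counts.reject { |element, _| element == 'H' }.values.sum
puts heavy_atoms
```

The result agrees with num_atoms. The bond count is consistent as well: a connected skeleton of 14 atoms and 16 bonds has 16 - 14 + 1 = 3 rings, as expected for a tricyclic C14H10 hydrocarbon.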

Writing InChIs

Because JNI-InChI is a low-level toolkit, writing InChIs is feasible, but not trivial. We must first construct a representation, and then get the InChI for it. For example, we could get the InChI for methane as follows:

$ jirb
irb(main):001:0> require 'java'
=> true
irb(main):002:0> include_class 'net.sf.jniinchi.JniInchiInput'
=> ["net.sf.jniinchi.JniInchiInput"]
irb(main):003:0> include_class 'net.sf.jniinchi.JniInchiAtom'
=> ["net.sf.jniinchi.JniInchiAtom"]
irb(main):004:0> include_class 'net.sf.jniinchi.JniInchiWrapper'
=> ["net.sf.jniinchi.JniInchiWrapper"]
irb(main):005:0> input = JniInchiInput.new
=> #
irb(main):006:0> a1 = input.add_atom JniInchiAtom.new(0,0,0, "C")
=> #
irb(main):007:0> a1.set_implicit_h(4)
=> nil
irb(main):008:0> output = JniInchiWrapper.get_inchi input
=> #
irb(main):009:0> output.get_inchi
=> "InChI=1/CH4/h1H4"

Fortunately, we don't have to work that hard. The Chemistry Development Kit, through JNI-InChI, supports reading and writing of InChIs via a variety of molecular languages, including SMILES and molfile. More on that later, though.

Conclusions

Provided that a Java Native Interface exists for a C library, it can be used from JRuby. Future articles will discuss the use of other cheminformatics libraries written in either C or C++ from JRuby, and their integration with pure Java and Ruby libraries.

JRuby for Cheminformatics: Parsing SMILES Simply

Posted by Rich Apodaca Tue, 09 Oct 2007 08:40:00 GMT

The previous article in this series outlined some reasons to consider JRuby for cheminformatics. Now I'll show how easy it is to get started by describing how to parse SMILES strings with the help of the Chemistry Development Kit (CDK).

What About Ruby CDK?

A number of Depth-First articles have discussed Ruby CDK. This library runs on top of C-Ruby, otherwise known as Matz's Ruby Interpreter (MRI). Under Ruby CDK, Ruby Java Bridge connects MRI to a Java Virtual Machine.

This article, and the others to follow, will instead discuss the use of the CDK and other Java libraries from JRuby. In contrast to MRI, JRuby is a pure Java implementation of the Ruby language. This approach offers some important advantages which will be highlighted along the way.

Installing JRuby

JRuby is not difficult to install. On Linux, the steps are:

  1. Install JDK Version 1.4 or higher.

  2. Download and unpack the most recent JRuby release - at the time of this writing, version 1.0.1.

  3. Add the JRuby bin directory to your path.

  4. There is no Step 4. ;-)

Installing CDK for JRuby

Installing CDK so that it works on JRuby is similarly quite simple:

  1. Download the most recent CDK jarfile - at the time of this writing, version 1.0.1.

  2. Move the CDK jarfile to your JRuby lib directory.

Testing CDK for JRuby

You can verify that your new CDK for JRuby installation works with jirb:

$ jirb
irb(main):001:0> require 'java'
=> true
irb(main):002:0> include_class 'org.openscience.cdk.smiles.SmilesParser'
=> ["org.openscience.cdk.smiles.SmilesParser"]

You should notice that jirb takes a few seconds to initialize the JVM, whereas irb starts almost instantly.

A Library to Read SMILES

We can write a short library to read SMILES strings using the CDK:

require 'java'
include_class 'org.openscience.cdk.smiles.SmilesParser'

module Daylight
  @@smiles_parser = SmilesParser.new

  def read_smiles smiles
    @@smiles_parser.parse_smiles smiles
  end
end

Notice the use of the Rubyesque method name parse_smiles rather than parseSmiles. This is just one of the built-in conveniences offered by JRuby.

Testing the Library

Saving the library as a file called daylight.rb lets us test it using interactive JRuby:

$ jirb
irb(main):001:0> require 'daylight'
=> true
irb(main):002:0> include Daylight
=> Object
irb(main):003:0> mol = read_smiles 'c1ccccc1'
=> #
irb(main):004:0> mol.atom_count
=> 6

As you can see, the benzene SMILES has been parsed correctly. Again, notice the use of the Rubyesque method name atom_count, rather than the CDK Java bean convention method name getAtomCount. This feature makes it easy to ignore the fact you're using a Java library and get on with writing your Ruby code. Brilliant!
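
To spell out how the names line up, the plain Ruby below converts the two Ruby-style names used in this article into their Java counterparts. It is an illustration only: camel_case and bean_getter are hypothetical helpers, not part of JRuby, and they make no claim about how JRuby performs the mapping internally.

```ruby
# Illustration only: how a Ruby-style name lines up with a Java name.
# These helpers are hypothetical and are not part of JRuby.
def camel_case(ruby_name)
  first, *rest = ruby_name.split('_')
  first + rest.map(&:capitalize).join
end

# Java bean getters prefix the capitalized property name with "get".
def bean_getter(ruby_name)
  camel = camel_case(ruby_name)
  "get#{camel[0].upcase}#{camel[1..-1]}"
end

puts camel_case('parse_smiles')
puts bean_getter('atom_count')
```

The first line printed is the CDK method behind parse_smiles, and the second is the bean-style getter behind atom_count.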

Conclusions

This article has shown how to install JRuby and begin to write some simple cheminformatics programs with a distinctive Ruby flavor. Although the focus was on SMILES parsing, there's much more functionality to be found within the CDK and other cheminformatics libraries written in Java. Future articles will outline some of the possibilities.
