Simple Installation of Rubidium

Posted by Rich Apodaca Wed, 21 Nov 2007 14:26:00 GMT

Rubidium is a Ruby cheminformatics scripting environment. Previously, a problem was reported with the RubyForge gem repository that prevented the simple installation of the Rubidium gem. After a bug report was filed, the problem was resolved.

The problem, which led to a 404 being issued when trying to install the gem from the remote RubyGems repository, was a variant of a known RubyForge issue.

You can now install Rubidium like this:

$ jruby -S gem install rbtk

Installation takes a few minutes due to the large size of the included Chemistry Development Kit jarfile.

Parsing SD Files with Ruby and Rubidium

Posted by Rich Apodaca Mon, 12 Nov 2007 16:27:00 GMT

Reading SD files is a bread-and-butter cheminformatics operation. At a minimum, a cheminformatics toolkit needs to parse the individual entries of an SD file, and provide access to the embedded molfile and data hash for each.
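To make the "molfile plus data hash" idea concrete, here is a minimal sketch in plain Ruby. It is not Rubidium's parser; the method name parse_sd_text and the returned hash layout are assumptions made for illustration. It relies only on the usual SD file layout: entries end with a "$$$$" line, the molfile runs through "M  END", and data items follow as a "> <FIELD>" header, value lines, and a blank line.

```ruby
# A minimal SD file reader in plain Ruby: a sketch, not Rubidium's parser.
# parse_sd_text and the returned hash layout are hypothetical.
def parse_sd_text(text)
  # Split on the "$$$$" delimiter and ignore whitespace-only leftovers.
  records = text.split(/^\$\$\$\$\r?\n?/).reject { |record| record.strip.empty? }

  records.map do |record|
    # Everything up to and including "M  END" is the embedded molfile.
    head, terminator, tail = record.partition(/^M  END\r?\n/)
    data = {}
    field = nil

    tail.each_line do |raw|
      line = raw.chomp
      if line =~ /\A>.*?<([^<>]+)>/
        field = Regexp.last_match(1)   # start of a new data item
        data[field] = []
      elsif line.empty?
        field = nil                    # a blank line ends the item
      elsif field
        data[field] << line            # value line of the current item
      end
    end

    { molfile: head + terminator,
      data: data.transform_values { |lines| lines.join("\n") } }
  end
end
```

Calling parse_sd_text(File.read('some.sdf')) would return one hash per entry, so entry[:data]['PUBCHEM_NIST_INCHI'] reads a single field. A production parser would stream the file rather than load it whole.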

Recent articles have introduced Rubidium, a Ruby cheminformatics scripting environment. The Rubidium team now announces the release of Rubidium-0.1.1, which, among other features, introduces the ability to parse SD files.

Prerequisites

Rubidium is designed to run on JRuby. Installing JRuby is straightforward on Unix-like systems. First, download the JRuby-1.1b1 binary release. Then, unpack the archive to your directory of choice. Set $JRUBY_HOME and $JAVA_HOME. Finally, add $JRUBY_HOME/bin to your PATH.
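If an installation misbehaves, a quick sanity check of those three settings can save some time. The sketch below is one way to do it in Ruby; the method name jruby_env_problems is hypothetical, and it takes the environment as a plain Hash so it can be tried without touching the real ENV.

```ruby
# Report problems with a JRuby setup like the one described above.
# jruby_env_problems is a hypothetical helper, not part of JRuby or Rubidium.
def jruby_env_problems(env)
  problems = []

  # Both variables must be present and non-empty.
  %w[JRUBY_HOME JAVA_HOME].each do |name|
    value = env[name]
    problems << "#{name} is not set" if value.nil? || value.empty?
  end

  # $JRUBY_HOME/bin must appear as one of the PATH entries.
  jruby_home = env['JRUBY_HOME']
  if jruby_home && !jruby_home.empty?
    bin_dir = File.join(jruby_home, 'bin')
    path_dirs = env.fetch('PATH', '').split(File::PATH_SEPARATOR)
    problems << "#{bin_dir} is not on PATH" unless path_dirs.include?(bin_dir)
  end

  problems
end
```

Running jruby_env_problems(ENV.to_h).each { |msg| warn msg } prints one line per problem and nothing when the setup looks complete.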

Installing Rubidium-0.1.1

Generally speaking, it should be possible to install Rubidium with a one-line command to RubyGems:

$ jruby -S gem install rbtk

Unfortunately, at the time of this writing, I was receiving the mysterious RubyGems 404 error from the RubyForge remote repository:

$ jruby -S gem install rbtk
Select which gem to install for your platform (java)
 1. rbtk 0.1.1 (java)
 2. rbtk 0.1.0 (java)
 3. Skip this gem
 4. Cancel installation
> 1
ERROR:  While executing gem ... (OpenURI::HTTPError)
    404 Not Found

This appears to affect only certain RubyGems on RubyForge - possibly only those with multiple versions. It seems to be an error on the RubyForge server that occasionally appears and then disappears.

As a workaround, you can download the Rubidium gem and install it manually:

$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem

Because Rubidium-0.1.1 introduces an Active Support dependency, you will need to install that library before installing Rubidium:

$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
ERROR:  While executing gem ... (RuntimeError)
    Error instaling tmp/rbtk-0.1.1-jruby.gem:
        rbtk requires activesupport >= 1.4.2
$ jruby -S gem install activesupport
Successfully installed activesupport-1.4.4
Installing ri documentation for activesupport-1.4.4...
Installing RDoc documentation for activesupport-1.4.4...
$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
Successfully installed rbtk, version 0.1.1
Installing ri documentation for rbtk-0.1.1-jruby...
Installing RDoc documentation for rbtk-0.1.1-jruby...
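The "activesupport >= 1.4.2" message in the transcript is a RubyGems version requirement. As a small illustration, the same comparison can be made with the Gem::Requirement and Gem::Version classes that ship with current RubyGems; the helper name version_satisfies? is an assumption made for this sketch.

```ruby
require 'rubygems'

# True when version_string meets requirement_string, using RubyGems' own
# comparison rules. version_satisfies? is a hypothetical helper name.
def version_satisfies?(requirement_string, version_string)
  requirement = Gem::Requirement.new(requirement_string)
  requirement.satisfied_by?(Gem::Version.new(version_string))
end

# The version installed in the transcript above meets the requirement.
puts version_satisfies?('>= 1.4.2', '1.4.4')
```

Versions are compared segment by segment as numbers, not as strings, so 1.10.0 counts as newer than 1.4.2.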

It's possible that the RubyForge 404 issue will be resolved by the time you read this article, so try jruby -S gem install rbtk first.

Parsing an SD File

Let's say we'd like to extract all InChIs from a PubChem dataset. If you don't have one handy, a compilation of about 2000 PubChem benzodiazepines has been deposited on RubyForge.

With our unzipped datafile in our working directory, we can now test the SD File parser by saving the following library to a file called parse.rb:

require 'rubygems'
gem 'rbtk'
require 'rubidium/sdf'

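# Print the InChI stored in each entry of the SD file at filename.
def parse_sd(filename)
  # The block form of File.open closes the file once parsing finishes.
  File.open(filename) do |file|
    parser = Rubidium::SDF::Parser.new(file)

    # Each entry gives hash-style access to its data items.
    parser.each do |entry|
      puts "InChI: #{entry['PUBCHEM_NIST_INCHI']}"
    end
  end
end

This can be tested with jirb:

$ jirb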
irb(main):001:0> require 'parse'
=> true
irb(main):002:0> parse_sd 'pubchem_benzodiazepine_20071110.sdf'
InChI: InChI=1/C16H12Cl2N2O/c1-20-14-7-6-12(18)8-13(14)16(19-9-15(20)21)10-2-4-11(17)5-3-10/h2-8H,9H2,1H3

[truncated]

RSpec and Behavior-Driven Development

If you check out the Rubidium source distribution, you'll notice that the SD parser library is tested with RSpec, the BDD framework for Ruby. Ultimately, all components of Rubidium will be tested and documented this way.

Acknowledgments

Rubidium's new SD file parser was written by Moses Hohman. It was kindly donated by Collaborative Drug Discovery, who have built their drug discovery application using Ruby on Rails.

Future Directions

One problem in working with SD files is pinpointing encoding errors. A parser should not only raise an exception, but point to a line number and identify offending text to aid debugging. Rubidium's SD parser will eventually incorporate these enhancements.
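As a rough sketch of what such error reporting could look like, the code below defines an exception that carries a line number and the offending text, and raises it from a simple check on data headers. The names SDParseError and check_sd_data_headers are hypothetical, the check covers only one kind of error, and none of this is Rubidium's actual design.

```ruby
# Hypothetical error class: records where the problem was found.
class SDParseError < StandardError
  attr_reader :line_number, :line_text

  def initialize(message, line_number, line_text)
    @line_number = line_number
    @line_text = line_text
    super("#{message} (line #{line_number}): #{line_text.inspect}")
  end
end

# Scan an SD file and raise on the first data header that lacks a
# <FIELD_NAME> part. Assumes value lines never start with ">".
def check_sd_data_headers(io)
  in_data = false

  io.each_line.with_index(1) do |raw, line_number|
    line = raw.chomp
    if line == '$$$$'
      in_data = false                  # the next line starts a new entry
    elsif line.start_with?('M  END')
      in_data = true                   # data items follow the molfile
    elsif in_data && line.start_with?('>') && line !~ /<[^<>]+>/
      raise SDParseError.new('Malformed data header', line_number, line)
    end
  end

  true
end
```

A caller can rescue SDParseError and read line_number and line_text to jump straight to the bad record, instead of guessing from a bare exception message.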

Because Rubidium runs on JRuby, performance gains may be achievable by re-writing select portions in Java.

Parsing SD files is only the beginning of the story. Many cheminformatics applications need a convenient, fast, and robust method for writing molfiles. This is also something Rubidium will attempt to provide.

If your company or organization is curious about Ruby and cheminformatics, give Rubidium a try. Rubidium is licensed under the permissive MIT License to make collaboration as simple as possible.

Cheminformatics for Ruby: Getting Started with Rubidium

Posted by Rich Apodaca Tue, 06 Nov 2007 16:17:00 GMT

Cheminformatics has seen the introduction of a diverse array of new open source software over the last few years. Using it all to its fullest potential is not always easy; differing languages, dependencies, interfaces, and varying levels of documentation make the job especially difficult. Rubidium is a new open source project aimed at changing that.

Rubidium is a full-featured cheminformatics scripting environment for Ruby. When complete, Rubidium will offer a single well-tested and well-documented Ruby interface to the best open source cheminformatics software. Rubidium-0.1.0 is now available for download.

Downloading and Installing Rubidium

Rubidium runs on JRuby, a pure Java implementation of the Ruby language. After installing JRuby on your system, you should be ready to install Rubidium.

Installation is most conveniently done with the Ruby package manager RubyGems.

The Rubidium RubyGem can be downloaded from RubyForge (large file). The gem command is all we need:

$ ll rbtk-0.1.0-jruby.gem
-rw-r--r-- 1 rich rich 12955136 Nov  6 07:56 rbtk-0.1.0-jruby.gem
$ jruby -S gem install rbtk-0.1.0-jruby.gem

Note: at the time of this writing, my installation of JRuby 1.0.1 was reporting an out-of-memory error when attempting to use the RubyForge RubyGems repository directly. Downloading gems separately and then installing the local copy is a workaround.

Testing the Installation

Rubidium can be tested with the following code run in interactive JRuby (jirb):

$ jirb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> gem 'rbtk'
=> true
irb(main):003:0> require 'rubidium/lang'
=> true
irb(main):004:0> c=Rubidium::Converter.new
=> #<Rubidium::Converter:0xbd4e3c ... >
irb(main):005:0> c.set_formats 'smi', 'mol'
=> true
irb(main):006:0> c.convert 'c1ccccc1'
=> "\n  CDK    11/6/07,8:41\n\n  6  6  0  0  0  0  0  0  0  0999 V2000\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n  2  1  2  0  0  0  0 \n  3  2  1  0  0  0  0 \n  4  3  2  0  0  0  0 \n  5  4  1  0  0  0  0 \n  6  5  2  0  0  0  0 \n  6  1  1  0  0  0  0 \nM  END\n"
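The string returned above is a V2000 molfile. Its fourth line, the counts line, stores the atom count in columns 1-3 and the bond count in columns 4-6. A quick way to confirm that a conversion produced the expected structure is to read those two numbers back. The sketch below does this in plain Ruby; molfile_counts is a hypothetical helper, not part of Rubidium or the CDK.

```ruby
# Read the atom and bond counts from the counts line of a V2000 molfile.
# molfile_counts is a hypothetical helper for illustration.
def molfile_counts(molfile)
  counts_line = molfile.lines[3]       # header is three lines; counts come fourth
  raise ArgumentError, 'molfile has no counts line' if counts_line.nil?

  # The counts are fixed-width, three characters each.
  { atoms: Integer(counts_line[0, 3].strip, 10),
    bonds: Integer(counts_line[3, 3].strip, 10) }
end
```

For the benzene molfile shown above, this would report six atoms and six bonds, matching the SMILES input c1ccccc1.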

Low-Level Interface

There's not much yet to Rubidium itself beyond molecular language interconversions offered by the Chemistry Development Kit (CDK). But the CDK offers a wide range of cheminformatics functionality that is immediately accessible in raw form via JRuby itself. For example, we can calculate the TPSA of oxazepam:

$ jirb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> gem 'rbtk'
=> true
irb(main):003:0> require 'cdk/lang'
=> true
irb(main):004:0> import 'org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor'
=> ["org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor"]
irb(main):005:0> reader=CDK::SmilesReader.new
=> #<CDK::SmilesReader:0x1088a1b ... > 
irb(main):006:0> mol=reader.read 'O=C3Nc1ccc(Cl)cc1C(c2ccccc2)=NC3O'
=> #<Java::OrgOpenscienceCdk::Molecule:0x174f02c ... >
irb(main):007:0> tpsa = TPSADescriptor.new
=> #<Java::OrgOpenscienceCdkQsarDescriptorsMolecular::TPSADescriptor:0x14596d5 ...>
irb(main):008:0> result = tpsa.calculate mol
=> #<Java::OrgOpenscienceCdkQsar::DescriptorValue:0x171120a ..>
irb(main):009:0> result.value.double_value
=> 61.69

Conclusions

There's much more to be done with Rubidium. As more software packages and their Ruby interfaces are added, a major challenge will be to maintain a simple yet powerful interface to the underlying capabilities.

Building Rubidium: Creating a RubyForge Project Space

Posted by Rich Apodaca Fri, 26 Oct 2007 14:21:00 GMT

Recent articles have discussed Rubidium, the cheminformatics toolkit for Ruby. In this article, the first in a series, I'll go beyond the Ruby code to discuss the technical aspects of taking an Open Source idea from concept to release.

Finding a Home

Before setting up your Open Source project, you'll need to decide on how to host it. Project hosting can be as simple or elaborate as you wish, but the basic services include: a website; a mailing list; a discussion forum; a source code repository (typically CVS or Subversion); a bug tracking system; and a file release system.

The multitude of choices can be broken down into two basic options: host the project yourself or use a free hosting service. Fortunately, Ruby-based projects enjoy two excellent free hosting options: SourceForge and RubyForge. Although SourceForge could certainly be used for a Ruby project, RubyForge is a more popular option. One of the reasons is that any RubyGem your project releases automatically becomes installable through the RubyGems package management system with a simple one-line incantation:

$ sudo gem install <yourprojectname>

Another reason to use RubyForge is discoverability. RubyForge only hosts projects related in some way to Ruby. So, your project will stand out a lot more in its category than with a much larger site like SourceForge.

Given RubyForge's advantages, and my own interest in minimizing the work needed to maintain an Open Source project, Rubidium will be hosted on RubyForge.

Requesting a Project Space

Having decided on RubyForge as Rubidium's host, all that's left is to ask for free services. You'll need to register for a user account if you haven't done so already. Then, simply apply for project space. After about three business days, you should be notified whether your project was accepted.

Several days ago, I completed this process for Rubidium. Its new home on RubyForge will be:

http://rubyforge.org/projects/rbtk

The Rubidium home page can be found at:

http://rbtk.rubyforge.org

There's nothing useful there yet, a situation that will hopefully be fixed in a few weeks.

Next Steps

With powerful free services now available for the Rubidium project, we'll want to start taking advantage of them. The next articles in this series will discuss some ways of doing so.

Easily Convert IUPAC Nomenclature to SMILES, InChI, or Molfile with Rubidium

Posted by Rich Apodaca Fri, 19 Oct 2007 14:05:00 GMT

A recent article introduced Rubidium, a cheminformatics toolkit written in Ruby. One of Ruby's strengths is the speed with which it enables disparate pieces of code to be glued together - even if they're written in different programming languages. In this article, we'll see how Rubidium can be extended to provide support for converting IUPAC nomenclature into SMILES, InChI, or Molfile formats.

About Rubidium

Rubidium is a cheminformatics toolkit written in Ruby. Rubidium is currently configured to run on JRuby, although future versions may also work with Matz' Ruby Implementation (MRI) via Ruby Java Bridge.

Rubidium will eventually be packaged as a RubyGem and hosted on RubyForge. For now, the toolkit consists of a running library that will be updated and documented on this blog.

The Library

The library extends the CDK module presented in the previous article in this series. The main change is the addition of an IUPACReader class, based on Peter Corbett's excellent OPSIN library:

class IUPACReader
  import 'java.io.StringReader'
  import 'uk.ac.cam.ch.wwmm.opsin.NameToStructure'
  import 'org.openscience.cdk.io.CMLReader'
  import 'org.openscience.cdk.ChemFile'

  def initialize
    @iupac_reader = NameToStructure.new
    @cml_reader = CMLReader.new
  end

  def read name
    cml = @iupac_reader.parse_to_cml(name)

    raise "Could not parse '#{name}'." unless cml

    @cml_reader.set_reader StringReader.new(cml.to_xml)

    chem_file = @cml_reader.read ChemFile.new

    chem_file.chem_sequence(0).chem_model(0).molecule_set.molecule(0)
  end
end

Using this additional functionality requires nothing more than copying the OPSIN jarfile into the lib directory of your JRuby installation. You'll also need to place the CDK jarfile in this directory if you haven't done so already.

The complete Rubidium library can be downloaded here.

A Test

We can test Rubidium's IUPAC nomenclature parsing abilities with jirb. For example, to convert from name to SMILES:

$ jirb
irb(main):001:0> require 'cdk'
=> true
irb(main):002:0> c=CDK::Conversion.new
=> #<CDK::Conversion:0x46ca65 ... >
irb(main):003:0> c.set_formats 'iupac', 'smi'
=> "smi"
irb(main):004:0> c.convert '1,4-dichlorobenzene'
=> "C=1C=C(C=CC=1Cl)Cl"

To convert from name to InChI (in the same jirb session):

irb(main):005:0> c.set_out_format 'inchi'
=> "inchi"
irb(main):006:0> c.convert '1,4-dichlorobenzene'
=> "InChI=1/C6H4Cl2/c7-5-1-2-6(8)4-3-5/h1-4H"

And to convert from name to Molfile (also in the same jirb session):

irb(main):007:0> c.set_out_format 'mol'
=> "mol"
irb(main):008:0> c.convert '1,4-dichlorobenzene'
=> "\n  CDK    10/19/07,7:59\n\n  8  8  0  0  0  0  0  0  0  0999 V2000\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n  1  2  2  0  0  0  0 \n  2  3  1  0  0  0  0 \n  3  4  2  0  0  0  0 \n  4  5  1  0  0  0  0 \n  5  6  2  0  0  0  0 \n  6  1  1  0  0  0  0 \n  7  1  1  0  0  0  0 \n  8  4  1  0  0  0  0 \nM  END\n"

Conclusions

By re-using a simple conversion API together with another Java library, we've given Rubidium the ability to translate IUPAC nomenclature into other molecular languages. The additional code was both easy to write and easy to test. Future articles will discuss the packaging, distribution, and further elaboration of Rubidium.
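The reason a new input format slots in so easily is that the conversion API only needs an object with a read method for each input format and an object with a write method for each output format. The sketch below shows that dispatch pattern in plain Ruby. It is not the CDK::Conversion source: SketchConversion, NameLookupReader, and PassThroughWriter are hypothetical stand-ins, and the "molecule" passed between them is just a string so the example runs without the CDK or OPSIN.

```ruby
# Format-dispatching converter in the spirit of set_formats/convert.
# All class names here are hypothetical stand-ins.
class SketchConversion
  def initialize
    @readers = {}
    @writers = {}
  end

  def register_reader(format, reader)
    @readers[format] = reader
  end

  def register_writer(format, writer)
    @writers[format] = writer
  end

  # Look up both sides before switching, so a bad name changes nothing.
  def set_formats(in_format, out_format)
    reader = @readers.fetch(in_format) { raise ArgumentError, "No reader for '#{in_format}'" }
    writer = @writers.fetch(out_format) { raise ArgumentError, "No writer for '#{out_format}'" }
    @reader = reader
    @writer = writer
    out_format
  end

  def convert(input)
    raise 'Call set_formats first' unless @reader && @writer
    @writer.write(@reader.read(input))
  end
end

# Toy reader: resolves names from a fixed table instead of calling OPSIN.
class NameLookupReader
  def initialize(table)
    @table = table
  end

  def read(name)
    @table.fetch(name) { raise "Could not parse '#{name}'." }
  end
end

# Toy writer: returns the structure unchanged.
class PassThroughWriter
  def write(structure)
    structure
  end
end
```

With a reader registered under 'name' and a writer under 'smi', set_formats('name', 'smi') followed by convert('benzene') returns whatever the lookup table holds for benzene. Swapping in a real OPSIN-backed reader would not require touching SketchConversion at all.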
