Parsing SD Files with Ruby and Rubidium
Reading SD files is a bread-and-butter cheminformatics operation. At a minimum, a cheminformatics toolkit needs to parse the individual entries of an SD file, and provide access to the embedded molfile and data hash for each.
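To make "embedded molfile and data hash" concrete, here is a minimal plain-Ruby sketch of the work such a parser does. It does not use Rubidium; the `split_sd` helper and the sample record are hypothetical and exist only for illustration. The sketch assumes the usual SD layout: each record ends with a `$$$$` line, the molfile ends with an `M  END` line, and each data item starts with a `> <NAME>` header followed by value lines and a blank line.

```ruby
# Hypothetical helper (not part of Rubidium): splits SD text into entries,
# each holding the embedded molfile and a hash of its data items.
def split_sd(text)
  records = text.split(/^\$\$\$\$\r?\n?/).reject { |rec| rec.strip.empty? }
  records.map do |rec|
    # Everything up to and including "M  END" is the molfile.
    head, terminator, tail = rec.partition(/^M  END\r?\n?/)
    data = {}
    key = nil
    tail.each_line do |raw|
      line = raw.chomp
      if (m = line.match(/\A>.*<([^<>]+)>/))
        key = m[1]          # data header such as "> <NAME>"
        data[key] = []
      elsif line.empty?
        key = nil           # a blank line closes the current data item
      elsif key
        data[key] << line   # value line belonging to the current item
      end
    end
    { molfile: head + terminator,
      data: data.transform_values { |lines| lines.join("\n") } }
  end
end

# Made-up single-record sample used only to exercise the sketch.
sample = <<~'SDF'
  water
    hypothetical-example

    1  0  0  0  0  0  0  0  0  0999 V2000
      0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  M  END
  > <NAME>
  water

  > <FORMULA>
  H2O

  $$$$
SDF

split_sd(sample).each do |entry|
  puts "#{entry[:data]['NAME']}: #{entry[:data]['FORMULA']}"
end
```

A real parser also has to cope with malformed records and very large files, which is where a tested library earns its keep.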
Recent articles have introduced Rubidium, a Ruby cheminformatics scripting environment. The Rubidium team now announces the release of Rubidium-0.1.1, which, among other features, introduces the ability to parse SD files.
Prerequisites
Rubidium is designed to run on JRuby. Installing JRuby is straightforward on Unix-like systems. First, download the JRuby-1.1b1 binary release. Then, unpack the archive to your directory of choice. Set $JRUBY_HOME to the unpacked directory and $JAVA_HOME to your Java installation. Finally, add $JRUBY_HOME/bin to your path.
Installing Rubidium-0.1.1
Generally speaking, it should be possible to install Rubidium with a one-line command to RubyGems:
$ jruby -S gem install rbtk
Unfortunately, at the time of this writing, I was receiving the mysterious RubyGems 404 error from the RubyForge remote repository:
$ jruby -S gem install rbtk
Select which gem to install for your platform (java)
1. rbtk 0.1.1 (java)
2. rbtk 0.1.0 (java)
3. Skip this gem
4. Cancel installation
> 1
ERROR: While executing gem ... (OpenURI::HTTPError)
404 Not Found
This appears to affect only certain RubyGems on RubyForge - possibly only those with multiple versions. It seems to be an error on the RubyForge server that occasionally appears and then disappears.
As a workaround, you can download the Rubidium gem and install it manually:
$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
Because Rubidium-0.1.1 introduces an Active Support dependency, you will need to install that library before installing Rubidium; otherwise the manual install fails:
$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
ERROR: While executing gem ... (RuntimeError)
Error instaling tmp/rbtk-0.1.1-jruby.gem:
rbtk requires activesupport >= 1.4.2
$ jruby -S gem install activesupport
Successfully installed activesupport-1.4.4
Installing ri documentation for activesupport-1.4.4...
Installing RDoc documentation for activesupport-1.4.4...
$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
Successfully installed rbtk, version 0.1.1
Installing ri documentation for rbtk-0.1.1-jruby...
Installing RDoc documentation for rbtk-0.1.1-jruby...
It's possible that the RubyForge 404 issue will be resolved by the time you read this article, so try jruby -S gem install rbtk first.
Parsing an SD File
Let's say we'd like to extract all InChIs from a PubChem dataset. If you don't have one handy, a compilation of about 2000 PubChem benzodiazepines has been deposited on RubyForge.
With our unzipped datafile in our working directory, we can now test the SD File parser by saving the following library to a file called parse.rb:
require 'rubygems'
gem 'rbtk'
require 'rubidium/sdf'

# Print the InChI stored in each entry of the SD file.
def parse_sd(filename)
  parser = Rubidium::SDF::Parser.new(File.new(filename))
  parser.each do |entry|
    puts "InChI: #{entry['PUBCHEM_NIST_INCHI']}"
  end
end

Then load the library in jirb and run the parser:
$ jirb
irb(main):001:0> require 'parse'
=> true
irb(main):002:0> parse_sd 'pubchem_benzodiazepine_20071110.sdf'
InChI: InChI=1/C16H12Cl2N2O/c1-20-14-7-6-12(18)8-13(14)16(19-9-15(20)21)10-2-4-11(17)5-3-10/h2-8H,9H2,1H3 [truncated]
RSpec and Behavior-Driven Development
If you check out the Rubidium source distribution, you'll notice that the SD parser library is tested with RSpec, the BDD framework for Ruby. Ultimately, all components of Rubidium will be tested and documented this way.
Acknowledgments
Rubidium's new SD file parser was written by Moses Hohman. It was kindly donated by Collaborative Drug Discovery, who have built their drug discovery application using Ruby on Rails.
Future Directions
One problem in working with SD files is pinpointing encoding errors. A parser should not only raise an exception, but point to a line number and identify offending text to aid debugging. Rubidium's SD parser will eventually incorporate these enhancements.
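As a rough illustration of the error reporting described above, the plain-Ruby sketch below scans SD text line by line and raises an exception that carries the line number and the offending text. It is not Rubidium's implementation: the `SDFormatError` class, the `check_sd` method, and the two checks it performs (a record closed by `$$$$` before any `M  END`, and input that ends inside an unterminated record) are assumptions chosen to keep the example small.

```ruby
# Hypothetical error class that records where the problem was found.
class SDFormatError < StandardError
  attr_reader :line_number, :text

  def initialize(message, line_number, text)
    @line_number = line_number
    @text = text
    super("#{message} at line #{line_number}: #{text.inspect}")
  end
end

# Hypothetical checker: returns the number of records, or raises
# SDFormatError on the first structural problem it encounters.
def check_sd(text)
  records = 0
  seen_m_end = false
  in_record = false
  last_number = 0
  last_line = ""

  text.each_line.with_index(1) do |raw, number|
    line = raw.chomp
    last_number = number
    last_line = line

    if line == "$$$$"
      # A record delimiter is only valid once the molfile has been closed.
      unless seen_m_end
        raise SDFormatError.new("record ends before 'M  END'", number, line)
      end
      records += 1
      seen_m_end = false
      in_record = false
    else
      in_record = true unless line.strip.empty?
      seen_m_end = true if line.start_with?("M  END")
    end
  end

  # Input stopped in the middle of a record.
  if in_record
    raise SDFormatError.new("missing final '$$$$'", last_number, last_line)
  end
  records
end

begin
  check_sd("name\n\n\n$$$$\n")
rescue SDFormatError => e
  puts e.message
end
```

Carrying the line number and text on the exception object, rather than only in the message, lets calling code decide whether to log the problem, skip the record, or abort.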
Because Rubidium runs on JRuby, performance gains may be achievable by rewriting select portions in Java.
Parsing SD files is only the beginning of the story. Many cheminformatics applications need a convenient, fast, and robust method for writing molfiles. This is also something Rubidium will attempt to provide.
If your company or organization is curious about Ruby and cheminformatics, give Rubidium a try. Rubidium is licensed under the permissive MIT License to make collaboration as simple as possible.
The Business Case for Open Source and the Small Company
Few would argue against small companies using open source software - indeed many owe their very existence to it. But what real, tangible good can come from a small company releasing open source software?
Signal to Noise, the company blog of 37signals, offers a worthwhile perspective on this issue. To summarize the business case:
Certain kinds of software, like infrastructure software, take vast amounts of time and resources to get right - something that few small companies can afford. Open sourcing can accelerate the process.
Open sourcing provides a public arena in which your own company's developers can learn from other great developers.
That public arena provides unique access to a pool of smart, motivated developers - and offers a way to evaluate their work before even deciding to interview them.
Open source generates press attention and goodwill from potential customers.
And about the elephant in the room:
A big fear that a lot of people have is that they’ll somehow be giving away their secret sauce. Unless your actual product is what you’re open sourcing, it really doesn't matter (and there are even plenty of examples of that working well). It’s unlikely that the piece of code that’s only seen internal development is such a silver bullet that you’re going to outshine your competition by its use alone.
The distinction between infrastructure software and a company's secret sauce is particularly important.
By just about any standard, 37signals is a leader in the deliberate use of open source software to achieve business objectives. We can all learn from them.
How Would Your Cheminformatics Tool Do This?

Reference: Shi, Ma, and Gao, J. Org. Chem.
Paginated Archives in Radiant CMS: The Power of Minimal But Extendable Systems
If you've ever needed to build a Website hosting mostly static content, you've probably tried out a few Content Management Systems. The problem is not finding them - there must be hundreds. The problem is finding one that successfully walks the fine line between being minimal (so that you can do things your way) and powerful (so that it can grow with your needs).
Radiant CMS is one of those systems. As an added bonus, it's written in Ruby and built on Rails. Radiant succeeds by focusing on the management of pages while providing a powerful extension mechanism.
The Website for my company, Metamolecular, will consist of content produced infrequently (product descriptions and documentation) intermingled with more frequently created blog-like content (updates, tutorials, responses to user questions). Traditionally, the CMS has handled the former, with blogging software handling the latter. But we needed a system that handled both well.
One of the distinguishing characteristics of blogs, as opposed to other kinds of websites, is the unusually large number of similar pages. Handling this kind of content requires pagination: the ability to break an archive up into a series of pages, each containing a small subset of the archive.
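The idea of pagination can be sketched in a few lines of plain Ruby. This is not the code of the Paginated Archive extension, nor the API of the Paginator gem; the `page_count` and `paginate` helpers and the sample post titles are hypothetical, and the sketch assumes the whole archive fits in an array.

```ruby
# Hypothetical helper: number of pages needed for a given archive size.
def page_count(total_items, per_page)
  (total_items / per_page.to_f).ceil
end

# Hypothetical helper: items shown on a 1-based page number.
# Returns an empty array for pages outside the archive.
def paginate(items, per_page, page)
  return [] if page < 1
  items.slice((page - 1) * per_page, per_page) || []
end

# Made-up archive of seven posts, shown three per page.
posts = (1..7).map { |i| "post-#{i}" }

(1..page_count(posts.size, 3)).each do |page|
  puts "Page #{page}: #{paginate(posts, 3, page).join(', ')}"
end
```

A production system would normally push the offset and limit down to the database query instead of slicing an in-memory array.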
Although Radiant doesn't have the ability to paginate its content, it does have a wonderful system for creating extensions. I thought I'd give it a try.
The result is the Paginated Archive extension. It works as a drop-in replacement for Radiant's existing Archive Page. After placing the extension into your PROJECT_HOME/vendor/extensions directory, you'll be able to create and configure Paginated Archives for use with blogs and other kinds of sites generating large numbers of pages. The extension requires Bruce Williams' excellent Paginator gem.
You can get started by downloading the extension here.

