JRuby for Cheminformatics: Parsing SMILES Simply

Posted by Rich Apodaca Tue, 09 Oct 2007 08:40:00 GMT

The previous article in this series outlined some reasons to consider JRuby for cheminformatics. Now I'll show how easy it is to get started by describing how to parse SMILES strings with the help of the Chemistry Development Kit (CDK).

What About Ruby CDK?

A number of Depth-First articles have discussed Ruby CDK. This library runs on top of C-Ruby, otherwise known as Matz's Ruby Interpreter (MRI). Under the hood, Ruby CDK uses Ruby Java Bridge to connect MRI to a Java Virtual Machine.

This article, and the others to follow, will instead discuss the use of the CDK and other Java libraries from JRuby. In contrast to MRI, JRuby is a pure Java implementation of the Ruby language. This approach offers some important advantages which will be highlighted along the way.

Installing JRuby

JRuby is not difficult to install. On Linux, the steps are:

  1. Install JDK Version 1.4 or higher.

  2. Download and unpack the most recent JRuby release - at the time of this writing, version 1.0.1.

  3. Add the JRuby bin directory to your path.

  4. There is no Step 4. ;-)
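A quick way to confirm that the path change took effect is to ask Ruby which implementation is running. The following is only a minimal sketch: the helper name ruby_implementation is made up for illustration, and it assumes that JRuby reports a RUBY_PLATFORM containing "java" (the RUBY_ENGINE constant is not defined on older interpreters, hence the fallback).

```ruby
# Hypothetical helper: report which Ruby implementation is running.
# RUBY_ENGINE is missing on older interpreters, so fall back to RUBY_PLATFORM.
def ruby_implementation(engine = (defined?(RUBY_ENGINE) ? RUBY_ENGINE : nil),
                        platform = RUBY_PLATFORM)
  # Assumption: JRuby reports a platform string containing "java".
  return 'jruby' if engine == 'jruby' || platform =~ /java/
  engine || 'ruby'
end

puts "Running on #{ruby_implementation} (#{RUBY_VERSION}, #{RUBY_PLATFORM})"
```

Saving this to a file and running it with the jruby command should report jruby; running it with the ruby command should not.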

Installing CDK for JRuby

Installing CDK so that it works on JRuby is similarly simple:

  1. Download the most recent CDK jarfile - at the time of this writing, version 1.0.1.

  2. Move the CDK jarfile to your JRuby lib directory.
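If the test in the next section fails, it can help to check that the jarfile really landed in the lib directory. Here is a rough sketch of such a check. The helper name cdk_jars, the "cdk*.jar" filename pattern, and the use of a JRUBY_HOME environment variable are all assumptions made for illustration; adjust them to match your own download and setup.

```ruby
# Hypothetical helper: list CDK jarfiles found in a directory.
# The "cdk*.jar" pattern is an assumption about how the download is named.
def cdk_jars(lib_dir)
  Dir.glob(File.join(lib_dir, 'cdk*.jar')).sort
end

# Assumption: JRUBY_HOME points at the unpacked JRuby directory.
lib_dir = File.join(ENV['JRUBY_HOME'] || '.', 'lib')
jars = cdk_jars(lib_dir)

if jars.empty?
  puts "No CDK jarfile found in #{lib_dir}"
else
  puts "Found: #{jars.join(', ')}"
end
```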

Testing CDK for JRuby

You can verify that your new CDK for JRuby installation works with jirb:

$ jirb
irb(main):001:0> require 'java'
=> true
irb(main):002:0> include_class 'org.openscience.cdk.smiles.SmilesParser'
=> ["org.openscience.cdk.smiles.SmilesParser"]

You should notice that jirb takes a few seconds to initialize the JVM, whereas irb starts almost instantly.

A Library to Read SMILES

We can write a short library to read SMILES strings using the CDK:

require 'java'
include_class 'org.openscience.cdk.smiles.SmilesParser'

module Daylight
  @@smiles_parser = SmilesParser.new

  def read_smiles smiles
    @@smiles_parser.parse_smiles smiles
  end
end

Notice the use of the Rubyesque method name parse_smiles rather than parseSmiles. This is just one of the built-in conveniences offered by JRuby.
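To get a feel for the naming convention, here is a simplified illustration in plain Ruby. It is not JRuby's actual implementation; the helpers snake_case and ruby_names are hypothetical, and they only mimic the two mappings used in this article: camelCase to snake_case, and dropping the "get" prefix from bean-style getters.

```ruby
# Hypothetical helper: convert a Java camelCase name to Ruby snake_case.
def snake_case(java_name)
  java_name.gsub(/([a-z0-9])([A-Z])/) { "#{$1}_#{$2}" }.downcase
end

# Hypothetical helper: Ruby-style names one might try for a Java method.
def ruby_names(java_name)
  names = [snake_case(java_name)]
  # Bean-style getters such as getAtomCount also lose the "get" prefix.
  names << snake_case($1) if java_name =~ /\Aget([A-Z].*)\z/
  names
end

p ruby_names('parseSmiles')   # => ["parse_smiles"]
p ruby_names('getAtomCount')  # => ["get_atom_count", "atom_count"]
```

The original Java names remain available as well, so @@smiles_parser.parseSmiles works just as parse_smiles does.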

Testing the Library

Saving the library as a file called daylight.rb lets us test it using interactive JRuby:

$ jirb
irb(main):001:0> require 'daylight'
=> true
irb(main):002:0> include Daylight
=> Object
irb(main):003:0> mol = read_smiles 'c1ccccc1'
=> #
irb(main):004:0> mol.atom_count
=> 6

As you can see, the benzene SMILES has been parsed correctly. Again, notice the use of the Rubyesque method name atom_count, rather than the CDK Java bean convention method name getAtomCount. This feature makes it easy to ignore the fact you're using a Java library and get on with writing your Ruby code. Brilliant!

Conclusions

This article has shown how to install JRuby and begin to write some simple cheminformatics programs with a distinctive Ruby flavor. Although the focus was on SMILES parsing, there's much more functionality to be found within the CDK and other cheminformatics libraries written in Java. Future articles will outline some of the possibilities.

Five Reasons to Start Using JRuby Now

Posted by Rich Apodaca Mon, 08 Oct 2007 08:27:00 GMT

JRuby is an implementation of the Ruby programming language on the Java Virtual Machine. Until about six months ago, JRuby was merely a technical curiosity; few seemed interested in using it for 'serious' development work. That perception has since changed as JRuby's progress has accelerated. Here, in no particular order, are five reasons to consider using JRuby for your current or next project.

  1. JRuby Now Works. With the recent release of JRuby 1.0, JRuby now does just about everything C-Ruby does. Performance is now almost at the level of the C-Ruby implementation.

  2. JRuby on Rails Now Works. For a long time, JRuby's support for Rails was limited. Not so any more. Google JRuby Rails and be amazed at all of the activity.

  3. Ruby Can Now Be Compiled to Java Bytecode. Simply amazing.

  4. Sun is Financing JRuby Development. Two core JRuby developers, Charles Nutter and Thomas Enebo, have been hired by Sun Microsystems to develop JRuby. Sun gets it. They know that Java the Platform matters far more than Java the Language. Despite their own investment in Java FX, Sun is taking a pragmatic approach to scripting on the JVM. If only Sun's industry peers were as pragmatic.

  5. The JVM Offers an Enterprise-Quality Platform for Ruby Applications. The JVM is one of the best-tested and most reliable software platforms in existence. Running Ruby on top of a JVM makes a lot more sense than running Ruby on top of metal via C. Likewise, hosting Ruby on Rails applications inside a Java application container offers a way out of the current Rails deployment conundrum. It won't be long before everyone sees it that way. For now, those who get it will have a head start.

Far from being a passing curiosity, JRuby may well become the preferred way to develop both Java and Ruby applications. The wide array of cheminformatics code already written in Java makes JRuby an especially attractive platform for chemistry software. Future articles will show how the powerful duo of Ruby and the Java Virtual Machine can be used to speed the development and deployment of cheminformatics Web applications.

Casual Saturdays: Truthiness

Posted by Rich Apodaca Sat, 06 Oct 2007 11:54:00 GMT

Greg Williams has done some great graphics using Wikipedia articles as source material.

What Makes Wikipedia Tick?

Posted by Rich Apodaca Fri, 05 Oct 2007 10:28:00 GMT

Whatever your views on Wikipedia, it's clear that the volunteer online encyclopedia has left its mark on society. But the most important things about Wikipedia have less to do with its contents and more to do with the people contributing to and using the service. To understand how and why people collaborate on the Web, you have to understand Wikipedia.

An interview with three leading Wikipedia figures sheds some light on Wikipedia as a collaborative activity.

There is a myth about online collaboration that Open Source practitioners are very familiar with. It goes something like this: "I'll start building something and release it to the community. I'll get feedback from a lot of users, some of whom will fix bugs, write documentation, and build extensions. All of that feedback will create a better product."

Now, this does happen, of course. The reason I consider it a myth is that it happens so rarely that you might as well not count on it. Virtually all Open Source software is designed, written, documented, debugged, and promoted by a single developer with the help of a tiny fraction (say 2-10%) of the committed user base. Pick any good example of Open Source software that works and behind it you'll find a committed user base large enough to make 2-10% a number greater than or equal to one. It's not clear this is necessarily a bad thing.

The interview with the Wikipedia leaders confirmed this view. When asked about the idea that lots of contributors make a good article, Elisabeth Bauer, of the English Wikipedia, had this to say:

The best articles are typically written by a single or a few authors with expertise in the topic. In this respect, Wikipedia is not different from classical encyclopedias.

Her view was shared by Kizo Naoko, of the Japanese Wikipedia, who added that short articles tend to remain short and of poor quality.

There doesn't seem to be anything complicated here. Wikipedia places a very low barrier to contribution. It has created a system where active contributors with specialized knowledge feel a sense of ownership over their contributions. Checks and balances ensure that these contributors can monitor changes to their work and correct errors. Finally, the subject matter is so broadly appealing (All of Human Knowledge) that 2-10% of the user base is a massive number.

It may not be complicated, but it's far from easy.

Ruby CDK for Newbies

Posted by Rich Apodaca Thu, 04 Oct 2007 10:01:00 GMT

Scripting languages and cheminformatics can be a highly effective combination. With their relaxed syntax, compilation-free execution, and interactive testing environments, scripting languages offer fast development iteration cycles. And scripting languages' support for manipulating libraries written in other languages can be key in today's heterogeneous cheminformatics software environment.

Although there are many cheminformatics scripting environments to choose from, Ruby offers some important advantages. Number one on the list is the wildly popular Ruby on Rails Web development framework. Others worth mentioning include Interactive Ruby (irb), the RubyGems package manager, the Rake build system, the JRuby Ruby implementation, RubyForge, and a host of other productivity boosters.

A major focus of Depth-First over the last few months has been Ruby CDK. This library consists of a thin Ruby wrapper around the open source Chemistry Development Kit (CDK), Structure-CDK, an open source 2D rendering toolkit, and OPSIN, an open source chemical nomenclature parser. A recent comment on Depth-First by Egon Willighagen, one of CDK's creators, got me thinking about centralizing this documentation. The following collection of links is a step in that direction.

Overview and Installation

Ruby CDK in Its Environment

Using Ruby CDK

Image Generation Credit: txt2pic.com
