CampDepict: Building a Simple SMILES Depict Web Application With JRuby, Structure CDK, and Camping

Posted by Rich Apodaca Wed, 23 Apr 2008 15:16:00 GMT

Today's tribute to the power of simplicity comes by way of John Jaeger, who has built one of the simplest cheminformatics Web applications ever written. His creation, CampDepict, interactively produces a raster image of a 2D chemical structure given a SMILES string, not unlike Daylight's Depict application.

CampDepict uses the Ruby Web microframework Camping. From the README:

Camping is a web framework which consistently stays at less than 4kb of code. You can probably view the complete source code on a single page. But, you know, it‘s so small that, if you think about it, what can it really do?

The idea here is to store a complete fledgling web application in a single file like many small CGIs. But to organize it as a Model-View-Controller application like Rails does. You can then easily move it to Rails once you‘ve got it going.

John's application is loosely-based on the Rails Depict application first described in 2006 here on Depth-First. His code makes use of CDK and Structure CDK, and it runs on JRuby.

If you've ever been curious about what Ruby has to offer cheminformatics, CampDepict could be just the application to get your feet wet.

Five Open Tools for 2D Structure Layout (aka Structure Diagram Generation) 9

Posted by Rich Apodaca Wed, 26 Mar 2008 13:11:00 GMT

Given a molecular representation without 2D coordinates, how would you display a human-readable view?

This problem can arise in many situations, one of the most common of which is the parsing of line notations such as IUPAC nomenclature, SMILES, or InChI.

And then there are the cases when you have 2D coordinates, but they're not very aesthetically pleasing. Maybe the coordinates were created by people either in a hurry or working with low quality editors, or maybe they were generated as distorted 2D projections of 3D coordinates. Whatever the reason, simply having 2D coordinates may not be the same as having good 2D coordinates.

Last year, a Depth-First article discussed the Structure Diagram Generation (SDG) problem and how it can be solved with Open Source software. Given that nearly a year has passed, it seemed appropriate to revisit the topic.

The good news is that there are at least four independent Open Source implementations of SDG algorithms, and one potential open database approach. They are, in no particular order:

  • MCDL Written in Java, the emphasis of this software appears to be facilitating the use of Modular Chemical Descriptor Language. Unfortunately, no new releases of this intriguing software package have been made in the last year.

  • Chemistry Development Kit (CDK) This useful package handles about 70-80% of a typical assortment of chemical structures well. The large amount of activity on the CDK project in general makes this a particularly good SDG system to contribute to, especially in the areas of refactoring and handling special cases. See also Christoph Steinbeck's overview of CDK's layout system.

  • BKChem A 2D structure editor written in Python. Give it an InChI and it will display the structure, courtesy of SDG. The system worked remarkably well with the molecules I tested. BKChem has also been reported to work in batch mode.

  • RDKit Written in Python and C++, this package is the newest of the bunch. Although I haven't had much luck compiling RDKit, it still looks quite promising. Any chance of switching to make as a build system?

  • PubChem PubChem? Maybe. With a database of small molecules now numbering well over ten million, there's a good chance that the molecule for which you need to assign coordinates is already in PubChem. And if it's in PubChem, 2D coordinates have already been assigned. Use an InChI as a hash key, and voila - instant SDG without much software. Given the novelty of large, publicly-available databases of small molecules such as PubChem, this approach may have a great deal of untapped potential.

SDG is one of those issues that can stay off the radar for some only to become an instant, nagging problem with no clear way out. The tools cited here offer an excellent place to begin working toward a comprehensive solution.

Simple Installation of Rubidium

Posted by Rich Apodaca Wed, 21 Nov 2007 14:26:00 GMT

Rubidium is a Ruby cheminformatics scripting environment. Previously, a problem was reported with the RubyForge gem repository that prevented the simple installation of the Rubidium gem. After filing a bug report, the problem was resolved.

The problem, which led to a 404 being issued when trying to install the gem from the remote RubyGems repository, was a variant of a known RubyForge issue.

You can now install Rubidium like this:

$ jruby -S gem install rbtk

Installation takes a few minutes due to the large size of the included Chemistry Development Kit jarfile.

An Introduction to the Rubidium Cheminforamtics Toolkit: Interconvert SMILES, InChI, and Molfile with an Open Babel-Like Interface 4

Posted by Rich Apodaca Mon, 15 Oct 2007 14:59:00 GMT

Interconverting molecular languages is a very common operation in cheminformatics, so convenient conversion tools are desirable. Recent articles have discussed JRuby as a functional cheminformatics scripting environement. In this article, we'll see how this functionality can be combined with convenience for molecular language conversions.

In addition to illustrating a technique, this article is the first in a series aimed at documenting a new cheminformatics toolkit for Ruby called "Rubidium". Rubidium will provide a unified set of Ruby APIs for working with diverse Open Source cheminformatics tools.

Rubidium will be distributed under the highly permissive MIT License.

Prerequisites

This Rubidium library requires JRuby and the Chemistry Development Kit (CDK). Copying the CDK jarfile into your JRuby lib directory is all that's needed.

The Library

The goal of this library is to provide a simple, yet flexible way to interconvert SMILES, InChI, and molfile formats. It was inspired the Open Babel library, in which an OBConversion object is configured with input and output formats prior to performing one or more conversions. In today's library, a similar Ruby interface is created for the CDK. Because of it's length, it won't be presented in its entirety. Instead, it can be downloaded here.

Testing the Library

The library can be tested by saving it as a file called cdk.rb and invoking jirb. We can then convert a SMILES for benzene into the InChI for benzene:

$ jirb
irb(main):001:0> require 'cdk'
=> true
irb(main):002:0> c=CDK::Conversion.new
=> #<CDK::Conversion:0x4c6320 ... >
irb(main):003:0> c.set_formats 'smi', 'inchi'
=> "inchi"
irb(main):004:0> c.convert 'c1ccccc1'
=> "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"

Upcoming articles will show more examples of interconversions using this library, and discuss some of its limitations.

An Aside

It might be useful for Rubidium to support multiple Conversions, each using its own cheminformatics toolkit. For example, a recent article discussed SMILES and InChI interconversion with Ruby Open Babel. With a little tweaking, the Ruby Open Babel OBConversion interface could be make identical to the Ruby interface used in today's tutorial. We could also configure JOELib and Rosetta Conversions in an analogous fashion.

Rubidium would then offer a family of molecular language converters, each of which used exactly the same API. We could then pick the best converter based on the situation at hand.

Conclusions

With just a little Ruby code, we've created a convenient Ruby interface for interconverting SMILES, InChI, and molfile formats. JRuby supports even more interconversions through the CDK as well as other Java and Java Native Interface libraries. Future articles will discuss some of the possibilities.

JRuby for Cheminformatics: Reading and Writing InChIs Via the Java Native Interface 2

Posted by Rich Apodaca Wed, 10 Oct 2007 12:21:00 GMT

The increased use of the InChI identifier is making the reading and writing of InChIs a standard cheminformatics capability. Recent articles have discussed the advantages of JRuby for cheminformatics. One disadvantage of JRuby is that code written in C can't be directly used. The presents a potential problem for libraries, such as the InChI toolkit, that are written in C. Fortunately, the solution is simple. Today's tutorial will demonstrate how InChIs can be both read and written using the C-InChI toolkit via JRuby and the excellent JNI-InChI library.

About JNI-InChI

The JNI-InChI library, written by Jim Downing and Sam Adams, wraps the C InChI toolkit in a Java Native Interface. This low-level toolkit is suitable for building more complex software, but lacks many features present in the C InChI toolkit. For example, JNI-InChI doesn't directly interconvert SMILES or molfile with InChI. For that you'd need to build a support library. If you're building a toolkit from scratch, this lightweight approach can be a significant advantage.

The JNI-InChI binary distribution jarfile includes the compiled native InChI library. In this sense it's virtually indistinguishable from any other Java library. This simplified packaging makes it exceptionally easy to use JNI-InChI from JRuby, as we'll see below.

Installation

JRuby can be installed as described previously. To install the JNI-InChI library for JRuby, simply copy the current release jarfile into the lib directory of your JRuby installation. That's all there is to it.

A Simple Library

We can now write a simple library to read InChIs via JRuby:

require 'java'

include_class 'net.sf.jniinchi.JniInchiInput'
include_class 'net.sf.jniinchi.JniInchiInputInchi'
include_class 'net.sf.jniinchi.JniInchiWrapper'

module IUPAC
  def read_inchi inchi
    input = JniInchiInputInchi.new inchi

    JniInchiWrapper.getStructureFromInchi input
  end
end

Testing the Library

By saving the above library to a file called iupac.rb, we can parse InChIs via JRuby:

$ jirb
irb(main):001:0> require 'iupac'
=> true
irb(main):002:0> include IUPAC
=> Object
irb(main):003:0> output = read_inchi 'InChI=1/C14H10/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13/h1-10H'
=> #
irb(main):004:0> output.num_atoms
=> 14
irb(main):005:0> output.num_bonds
=> 16

Writing InChIs

Because JNI-InChI is a low-level toolkit, writing InChIs is feasible, but not trivial. We must first construct a representation, and then get the InChI for it. For example, we could get the InChI for methane as follows:

$ jirb
irb(main):001:0> require 'java'
=> true
irb(main):002:0> include_class 'net.sf.jniinchi.JniInchiInput'
=> ["net.sf.jniinchi.JniInchiInput"]
irb(main):003:0> include_class 'net.sf.jniinchi.JniInchiAtom'
=> ["net.sf.jniinchi.JniInchiAtom"]
irb(main):004:0> include_class 'net.sf.jniinchi.JniInchiWrapper'
=> ["net.sf.jniinchi.JniInchiWrapper"]
irb(main):005:0> input = JniInchiInput.new
=> #
irb(main):006:0> a1 = input.add_atom JniInchiAtom.new(0,0,0, "C")
=> #
irb(main):007:0> a1.set_implicit_h(4)
=> nil
irb(main):008:0> output = JniInchiWrapper.get_inchi input
=> #
irb(main):009:0> output.get_inchi
=> "InChI=1/CH4/h1H4"

Fortunately, we don't have to work that hard. The Chemistry Development Kit, through JNI-InChI, supports reading and writing of InChIs via a variety of molecular languages, including SMILES and molfile. More on that later, though.

Conclusions

Provided that a Java Native Interface exists for a C library, it can be used from JRuby. Future articles will discuss the use of other cheminformatics libraries written in either C or C++ from JRuby, and their integration with pure Java and Ruby libraries.

Older posts: 1 2 3 4