Anatomy of a Cheminformatics Web Application: Ajaxifying Depict

Posted by Rich Apodaca Mon, 04 Dec 2006 20:06:00 GMT

The previous tutorial in this series showed some techniques for improving the appearance and usability of a simple cheminformatics Web application. That application, Depict, rendered color images of 2-D molecular structures when given a SMILES string. Still, something is missing. Wouldn't it be better if the application responded to individual keystrokes in the input field, rather than waiting for the user to hit the return key? In this tutorial, we'll see how to quickly accomplish this effect with a technology called "Ajax."

Downloads and Prerequisites

For this tutorial, you'll need Ruby CDK (RCDK). A recent article described the small amount of system configuration required for RCDK on Linux. Another article showed how to install RCDK on Windows.

In addition, you'll need to install Ruby on Rails - something that can be done through RubyGems.

The Rails application that this tutorial starts with can be downloaded from this link. If you'd rather start working directly with the version of Depict produced by applying the changes outlined in this tutorial, the full source code can be downloaded from this link.

If you'll be running Depict on an AMD64 Linux system, you'll need to prepend your invocation of script/server with LD_PRELOAD. For example, on my system running Sun's JVM, the full command looks like:

$ LD_PRELOAD=/usr/java/jdk1.5.0_09/jre/lib/amd64/libzip.so ruby script/server

A Brief Introduction to Ajax

When stripped down to its essentials, Ajax is nothing more than an asynchronous communication channel between Web browsers and Web servers. In the pre-Ajax model of client-server Web interactions, a browser would make a request to a server and then wait until getting a server response, which would take the form of a complete Web page. In the Ajax model, a browser makes a request to a server, continuing to function while the server generates a response, which takes the form of a small section of the page that gets replaced. For this reason, Ajax-enabled Web sites are far more application-like than the document-centric sites that preceded them.

Ajax Support in Rails

Ajax is implemented in JavaScript using the XMLHttpRequest object, although working at this level can require a lot of code to do anything meaningful. Fortunately, Rails and other Web application frameworks provide high-level interfaces to Ajax. In Rails, Ajax support takes the form of a variety of helper methods, one of which we'll use in this tutorial: observe_field. This method, an instance of the Observer Pattern, assigns an Observer to monitor input activity in a text field.

The Problem at Hand

We'd like Depict to provide immediate feedback by rendering a SMILES string as it is keyed into the input field. If the partial SMILES string is valid, it will be rendered; otherwise, an error image will be rendered. At no point will the user need to press the return key to see an image of the SMILES string they are typing.

Step 1: Ajaxify the View

Let's start by adding an observer to Depict's input field. These changes will occur to the SMILES View, contained in depict/app/views/smiles/depict.rhtml:

<html>
  <head>
    <title>Depict</title>
    <%= stylesheet_link_tag "default", :media => "all" %>

    <!-- Nothing works without this line. -->
    <%= javascript_include_tag :defaults %>
  </head>
  <body>
    <h1>Depict a SMILES String</h1>

    <!-- New id attribute needed by Ajax -->
    <div class="image" id="results" >
      <img src="<%= image_for_smiles :smiles => @smiles %>"></img>
    </div>
    <br /><br />
      <div class="smiles">
      <%= form_tag :action=>'depict' %>
        <label>SMILES: </label>

        <!-- Ajaxified text field. -->
        <!-- We turn off autocomplete to simplify the interface. -->
        <%= text_field_tag :smiles, @smiles, {:autocomplete => "off"} %>
        <%= observe_field( :smiles,
                           :frequency => 0.5,
                           :update    => :results,
                           :url       => { :action => :ajax_depict } ) %>
      <%= end_form_tag %>
      </div>

    <div class="about">
      <!-- Update the URL to point to the new Depth-First article -->
      <a href="http://depth-first.com/articles/2006/12/04/anatomy-of-a-cheminformatics-web-application-ajaxifying-depict">About this Application</a>
    </div>
  </body>
</html>

The above code introduces three key elements:

  • The javascript_include_tag method is called, which is surprisingly easy to forget to do.

  • The original text_field method call is replaced by text_field_tag to simplify coding. We disable browser-based autocompletion by setting the autocomplete attribute to off. This removes a feature unlikely to ever be used, and leads to a more streamlined interface.

  • The observe_field method is called, linking activity in the text field to an Ajax action, ajax_depict, that will update the image area. To accomplish this, we assign the div containing our image the id "results."
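The polling behavior behind observe_field can be sketched in plain Ruby. This is only an illustration, assuming that the observer checks the field once every :frequency seconds and fires a request only when the contents have changed since the last check. FieldObserver and poll are hypothetical names, not part of Rails or its JavaScript libraries.

```ruby
# Hypothetical stand-in for the browser-side observer; not Rails code.
class FieldObserver
  def initialize(initial_value = '', &callback)
    @last_value = initial_value
    @callback = callback
  end

  # Called once per polling interval with the field's current contents.
  # Fires the callback only when the contents differ from the last poll.
  def poll(current_value)
    return false if current_value == @last_value
    @last_value = current_value
    @callback.call(current_value)
    true
  end
end

requests = []
observer = FieldObserver.new { |smiles| requests << smiles }

# Simulated field contents at five successive polls.
['C', 'C', 'CC', 'CCO', 'CCO'].each { |value| observer.poll(value) }

puts requests.inspect # prints ["C", "CC", "CCO"]
```

Under this assumption, two polls that see the same text cost nothing on the server, which is why only three requests result from five polls.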

Making these changes and refreshing the browser window gives a screen like the one below:

Although the client side of the Ajax communication channel is working, the server side is not. Let's fix that.

Step 2: Ajaxify the Server

Depict needs an Action and View that will be invoked in response to keyboard events in the SMILES input box. To do this, first add a new ajax_depict method to SmilesController, the source for which is found in depict/app/controllers/smiles_controller.rb:

class SmilesController < ApplicationController
  def depict
    if params[:smiles]
      @smiles = params[:smiles][:value]
    else
      @smiles = ''
    end
  end

  def image_for
    if flash[:bytes]
      send_data(flash[:bytes], :type => "image/png", :disposition => "inline", :filename => "#{flash[:smiles]}.png")
    end
  end

  # The new ajax_depict method.
  def ajax_depict
    @smiles=request.raw_post
  end
end

Making the above changes and refreshing your browser should give an error message:

The new ajax_depict method is being called, but no associated template exists. This template contains the HTML that will be inserted into the div with the results id attribute that we set up in Step 1. We can resolve the error we're getting by simply creating a new file (depict/app/views/smiles/ajax_depict.rhtml) containing the following partial template:

<img src="<%= image_for_smiles :smiles => @smiles %>"></img>

Now, refreshing your browser should produce a screen like that shown below. We have now Ajaxified Depict, but we're not quite done yet.

Step 3: Update the Cascading Style Sheet

As you type a SMILES string into the input window, you may have noticed the input box being repositioned toward the top of the application window just prior to the display of a new image. This is due to the image area being resized to zero height as the new image is generated.

Fortunately, the fix is simple; we'll just specify that the image area must be 400 pixels high, whether an image is being displayed or not. This is done by editing the image selector in the CSS file at depict/public/stylesheets/default.css:

.image {
    margin-left: auto;
    margin-right: auto;
    width: 400px;
    /* Keeps the input box from moving during image refresh.*/
    height: 400px;
}

Refreshing the Depict window should now give a statically-positioned SMILES input field.

Step 4: Backward Compatibility

As it stands, if the user presses the return key, they will see the "Enter SMILES Below" message. This is due to the change in the way SMILES strings are transmitted into the application. To fix this problem, we simply change the way that SmilesController assigns the smiles instance variable (depict/app/controllers/smiles_controller.rb):

def depict
  # Uses new input method.
  if params[:smiles]
    @smiles = params[:smiles]
  else
    @smiles = ''
  end
end

Making this change produces an interface that will render the correct image whether the return key is typed or not. If JavaScript is disabled, Depict will work exactly the same way as it did in the non-Ajax version.
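The fallback logic in depict can be written more compactly. The sketch below uses a plain Hash in place of the Rails params object so that it runs outside of Rails; smiles_from is a hypothetical helper name, not something Depict defines.

```ruby
# Hypothetical helper; a plain Hash stands in for the Rails params object.
# Returns the submitted SMILES string, or an empty string if none was sent.
def smiles_from(params)
  params[:smiles] || ''
end

puts smiles_from({ :smiles => 'c1ccccc1' })   # a form submission
puts smiles_from({}).inspect                  # first page load, no input yet
```

Inside the controller, the same idea would reduce the body of depict to a single assignment.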

Conclusions

Ajax makes the Web more attractive than ever as an application development platform. In this tutorial, we've seen how using Rails made it very easy to give Depict the feel of an interactive SMILES depiction tool using Ajax. But a few details remain before we're ready to deploy this application on a Web server for the public to use. For example, we need to take server load and network latency into account, and we need to make sure Depict works well on all major browsers. The next articles in this series will address these issues.

Molbank and the Convergence of Open Access, Open Data, and Open Source in Chemistry

Posted by Rich Apodaca Thu, 30 Nov 2006 20:01:00 GMT

Molbank, published by Molecular Diversity Preservation International, is one of the oldest of a handful of Open Access journals in chemistry. Although its longevity is a remarkable accomplishment in itself, there is much more to Molbank than meets the eye. Just below the surface is a feature so revolutionary, yet simple, that chemistry publishers years from now will wonder why they didn't implement it sooner.

A Molbank article consists of a short monograph on a single compound, or possibly two. This may strike some scientists as a strange way to publish results, and it is unusual. On the other hand, this system offers vast potential to capture useful, but "unpublishable" findings that would otherwise be lost. Back when scientists actually read hardcopy journals, such a system would never have been feasible. Today, with hard drive space measured in terabytes, fiber optics cables crisscrossing the planet, Internet connectivity for almost everyone, and servers that can be had for virtually nothing, this system not only looks perfectly feasible, but preferable in many ways to the status quo.

Here's the revolutionary part: each article that Molbank publishes is accompanied by a publicly-available, machine-readable file encoding the structure of the article's subject molecule. That's it. There's nothing tricky or high-tech about it. In fact, the practice is about as low-tech as you could imagine. The file format in which structures are encoded, molfile, dates back at least fifteen years, and nearly every piece of chemistry software - both end-user and developer tools - can handle it. What makes Molbank's practice revolutionary is that not a single chemistry journal, Open Access or subscription-based, currently does this.

Why does the simple inclusion of a publicly-available molfile encoding molecular structures in a paper matter so much? This is where the second two entities of the trinity named in this article's title come into play: Open Source and Open Data. By providing a mechanism for a computer to decipher the chemistry in a paper, Molbank has opened the door to a host of highly-productive integration activities that nobody outside of Chemical Abstracts Service has even been able to contemplate, let alone prepare for.

This article is the first in a series aimed at exploring the wide-open space that Molbank has created. Rather than arguing my point with words, I'll actually build working demonstrations of what is now easily within reach. At the same time, I'll document my work on this blog. I'm not sure where all of this will end up, but I do hope to shine some light on a vital, although currently obscure, component of the Open Access debate.

Diversity-Oriented Chemical Informatics

Posted by Rich Apodaca Wed, 15 Nov 2006 20:03:00 GMT

How would you enumerate all of the molecules represented by a molecular formula? This question was recently posed to members of the Blue Obelisk mailing list. Formula-based exhaustive structure enumeration may seem on the surface to be just another esoteric problem. Nevertheless, playing with open, interactive software that can perform such enumerations can be a great source of new ideas for applications and unit tests.

The Chemistry Development Kit offers a fully-functional exhaustive structure enumerator through its GENMDeterministicGenerator class. This article will use GENMDeterministicGenerator through the Ruby CDK interface to generate color 2-D images for all molecules of a given molecular formula.

A Solution

The software described in this article will generate a collection of 2-D molecular PNG images based on a user-supplied molecular formula. When viewed in a file browser such as Windows Explorer or Konqueror, the output is visible as a matrix of images. The filename of each image is given by the SMILES string of the corresponding molecule. All molecules are enumerated, whether they look "reasonable" or not. As an example, consider a section of the output for 'C4H8ClNO', which looks like this on my system:

Enumerator: A Small Ruby Library

We'll create a small Ruby class to do most of the work. Save the following in a file called enum.rb:

require 'rubygems'
require_gem 'rcdk'
require 'rcdk/util'

jrequire 'org.openscience.cdk.structgen.deterministic.GENMDeterministicGenerator'
jrequire 'net.sf.structure.cdk.util.ImageKit'

class Enumerator

  def initialize(formula)
    @generator = Org::Openscience::Cdk::Structgen::Deterministic::GENMDeterministicGenerator.new(formula, '')
    @width = 150
    @height = 150
  end

  def set_size(width, height)
    @width = width
    @height = height
  end

  def write_images
    mols = @generator.getStructures
    iterator = mols.iterator

    while (iterator.hasNext)
      mol = RCDK::Util::XY.coordinate_molecule(iterator.next)
      smiles = RCDK::Util::Lang.get_smiles(mol)

      Net::Sf::Structure::Cdk::Util::ImageKit.writePNG(mol, @width, @height, "#{smiles}.png")
    end
  end
end 

As you can see, this class is nothing more than a thin wrapper around a large amount of CDK functionality. Most of the action happens in the write_images method, where three things take place:

  1. We retrieve a list of molecules from the GENMDeterministicGenerator instance that satisfy the molecular formula passed to Enumerator's constructor.

  2. These molecules are iterated.

  3. For each molecule, an image is written with the filename given by its SMILES string.

Testing the Library

To test the library, the following code can either be entered interactively via Interactive Ruby (irb) or saved to a file and run with the Ruby interpreter (ruby):

require 'enum'

e=Enumerator.new 'C4H8ClNO'

e.write_images

Running this code will produce a collection of PNG images in your working directory. By changing the argument passed to the Enumerator constructor, you can change the makeup of the image set.
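Before handing a formula to the constructor, it can be useful to check what it implies. The sketch below is independent of CDK: it parses a simple formula (no parentheses, charges, or isotopes) and computes the number of rings plus double bonds, assuming standard valences. parse_formula and unsaturation are hypothetical helper names.

```ruby
# Hypothetical helper: parses a simple molecular formula into element counts.
def parse_formula(formula)
  counts = Hash.new(0)
  formula.scan(/([A-Z][a-z]?)(\d*)/) do |symbol, number|
    counts[symbol] += number.empty? ? 1 : number.to_i
  end
  counts
end

# Rings plus double bonds: (2C + 2 + N - H - X) / 2, where X counts halogens.
# Assumes standard valences; integer division is used, so odd totals truncate.
def unsaturation(counts)
  halogens = %w[F Cl Br I].inject(0) { |sum, x| sum + counts[x] }
  (2 * counts['C'] + 2 + counts['N'] - counts['H'] - halogens) / 2
end

counts = parse_formula('C4H8ClNO')
puts "Rings plus double bonds: #{unsaturation(counts)}" # prints 1
```

For 'C4H8ClNO' the result is 1, so under these assumptions every enumerated structure should contain exactly one ring or one double bond.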

Prerequisites

For this tutorial, you'll need Ruby CDK (RCDK). A recent article described the small amount of system configuration required for RCDK on Linux. Another article showed how to install RCDK on Windows.

Unexpected Behavior

After testing the Enumerator library, you may notice a new file in your working directory called structuredata.txt. This file is written automatically by GENMDeterministicGenerator on instantiation, providing information on each structure that is generated. The CDK API does not mention the creation of this file, and it would be preferable for this file to be created only on request. I'll be submitting a feature request to this effect shortly.

Food for Thought

If you plan to explore larger areas of chemical space with the Enumerator library, be prepared to wait. The generation of molecules, determination of 2-D coordinates, and rendering can take some time. Of course, the number of molecules increases dramatically with the number of atoms in the molecular formula - a concrete demonstration of what makes organic chemistry the fascinating discipline that it is.
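To give a feel for the growth, the well-known counts of constitutional isomers for the simple alkanes can be tabulated. These numbers are textbook values rather than output from the Enumerator class, and formulas containing heteroatoms grow faster still.

```ruby
# Known counts of constitutional isomers for the alkanes CnH(2n+2), n = 1..10.
ALKANE_ISOMERS = [1, 1, 1, 2, 3, 5, 9, 18, 35, 75]

ALKANE_ISOMERS.each_with_index do |count, i|
  n = i + 1
  puts "C#{n}H#{2 * n + 2}: #{count}"
end
```

Going from four carbons to ten multiplies the number of structures by more than thirty, before any nitrogen, oxygen, or halogen is added.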

An interesting variation on the ideas presented here would be to filter out molecules based on some criteria. One approach would be to remove molecules containing reactive functionality such as nitrogen substituted with chlorine. A SMARTS pattern search could easily form the basis for this filter. In applying this and similar filters, larger areas of interesting chemical space could be sampled in a reasonable amount of time.
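The shape of such a filter might look like the sketch below. The matcher is passed in, so a real SMARTS search could be substituted later; the one used here is a deliberately naive text match on the SMILES string and will miss cases such as 'N(C)Cl'. remove_flagged and NAIVE_N_CL are hypothetical names.

```ruby
# Hypothetical filter: keeps the SMILES strings that the matcher does not flag.
# The matcher is any object that responds to call(smiles) with true or false.
def remove_flagged(smiles_list, matcher)
  smiles_list.reject { |smiles| matcher.call(smiles) }
end

# Naive stand-in for a real SMARTS search: flags strings in which N and Cl
# are written next to each other. It is not a substitute for graph matching.
NAIVE_N_CL = lambda { |smiles| !(smiles =~ /N\(?Cl|ClN/).nil? }

candidates = ['CCN(Cl)C', 'CCNC', 'ClNCC', 'ClCCN']
puts remove_flagged(candidates, NAIVE_N_CL).inspect # prints ["CCNC", "ClCCN"]
```

Keeping the matcher separate from the filtering loop means the same code would work unchanged with a toolkit-backed matcher.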

Conclusions

CDK's GENMDeterministicGenerator class, when combined with 2-D structure layout and 2-D rendering, provides the foundation of an intriguing tool for exploring chemical diversity. Further combining this capability with that offered by other freely-available tools offers some thought-provoking possibilities.

Debabelization

Posted by Rich Apodaca Wed, 08 Nov 2006 19:32:00 GMT

Today, we find Chemical Abstracts with over two million compounds coded in a connectivity table system and ISI with close to a million compounds coded in WLN. The U.S. Patent Office has large files coded in the Hayward notation; the IDC has large numbers of compounds in its CT and GREMAS Code. Derwent has a sizable patent file coded in one fragment code, and many journal literature compounds coded in the Ring Code fragment code. There are a number of individual companies and government agencies with over 100,000 compounds coded in "a" system. And almost all companies synthesizing new compounds have some internal system for their compounds. Finally, there are many universities with a wide variety of coded structure files.

-Charles E. Granito J. Chem. Doc. 1973, 13, 72-74

The situation described by Granito in 1973 seems eerily familiar today. The names of the players, the technologies, and encoding systems have changed, but the problem of multiple incompatible molecular languages has persisted for over 30 years.

This problem will become even more pronounced in the near future as free chemistry databases on the Web continue their rapid proliferation. In Granito's world of closed, proprietary databases and unevenly distributed computer power, interoperability was an afterthought; in the coming world of free, open databases, and ubiquitous computer networks that connect to them, interoperability will be taken for granted.

Granito goes on to observe that "there is no one 'best' system" for molecular representation. And he's right. Molecular languages evolve within a particular problem domain, just as human languages evolve within a specific cultural context. This isn't to say that a molecular language can't be creatively adapted to serve purposes for which it was never designed. Trying to do so is, after all, how new languages are conceived.

Consider the case of InChI, which is both a molecular identification system and a line notation, or Chemical Markup Language (CML), an XML language. There are vast areas of chemistry in which using either InChI or CML will be problematic - particularly polymers, organometallics, and inorganic chemistry. And let's not ignore new molecular representation problems brewing on the horizon like small molecule tertiary structure. Yet for pure organic chemistry as most of us know it today, InChI and CML may well be optimal.

The problem is that both InChI and CML compete with simpler, entrenched alternatives - SMILES and molfile, respectively. Even MDL, the author of the original molfile specification, is having difficulty gaining acceptance for its new molfile format, despite significant technical advantages.

If history is any guide, we can look forward to at least as many molecular languages in the next thirty years as we've seen in the last thirty. It wasn't long ago that WLN was viewed as the language of the future. Now it just looks cryptic. For this we can thank a combination of technology advances and the emergence of a far simpler alternative, SMILES. A similar fate, more likely than not, awaits all molecular languages currently in use.

Will there ever be a universal molecular language and is there any point in trying to invent one? Every area of chemistry introduces its own peculiarities not shared by any of the others. Yet all users want the simplest language possible. These two contradictory forces ensure that a universal language is unlikely to ever appear. In other words, the most successful new molecular languages are likely to be agile - simple, easy to learn, cheap to implement, and quickly adaptable in the face of new chemical concepts and advances in computer technology.

OBRuby: A Ruby Interface to Open Babel

Posted by Rich Apodaca Tue, 31 Oct 2006 19:20:00 GMT

And the LORD said, Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do.

-Genesis 11:6

Open Babel is a widely-used Open Source chemical informatics toolkit written in C++. Although originally designed as a molecular language translator, Open Babel also supports SMARTS pattern recognition, molecular fingerprints, molecular superposition, and other features.

Open Babel currently offers interfaces for two scripting languages: Python and Perl. Recently, Geoff Hutchison and I have been working to add Ruby to that list. This article reports our success in doing so and provides a glimpse of what might now be possible.

OBRuby

The upcoming release of Open Babel (version 2.1.0) will come complete with a Ruby interface. For those interested in trying it out sooner, a package called OBRuby can be downloaded now. OBRuby compiles against revision 1577 of the Open Babel SVN trunk. It has been tested with Linux and Mac OS X, and will probably work on Windows with minor modifications. The approach outlined here is known to fail with Open Babel 2.0.2.

OBRuby is a technology demonstration. The Ruby scripting support included with Open Babel 2.1.0 may differ in some details from OBRuby. My purpose in this article is simply to demonstrate what is now possible. Please read through the install scripts (they're short) to be sure you're comfortable with what they do.

Here was my OBRuby installation process:

  1. Download the Open Babel SVN trunk revision 1577 or later.
  2. cd trunk
  3. configure, make, (as root) make install
  4. (as root) ldconfig (necessary on my system - perhaps not on yours)
  5. cd OBRUBY_DIR
  6. ruby build.rb
  7. (as root) make install

One last wrinkle: the build.rb script included with OBRuby is something of a hack. It hardcodes the location of the Open Babel library on line 6:

@@ob_dir='/usr/local'

Change this line to match your Open Babel installation and you should be ready to go. make install places a single file, openbabel.so, into your Ruby site_ruby directory. To verify the installation with IRB:

$ irb
irb(main):001:0> require 'openbabel'
=> true

A return value of true shows that the installation was successful. An error message about libopenbabel.so not being found indicates that your system can't find your Open Babel libraries. Be sure you've installed Open Babel and either run ldconfig or set LD_LIBRARY_PATH.
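The same check can be scripted so that a missing library produces a hint instead of a stack trace. This is a small sketch that relies only on Ruby raising LoadError when a library cannot be loaded; library_available? is a hypothetical helper name.

```ruby
# Hypothetical helper: attempts to load a library and reports failure
# instead of raising.
def library_available?(name)
  require name
  true
rescue LoadError => e
  warn "Could not load '#{name}': #{e.message}"
  false
end

if library_available?('openbabel')
  puts 'OBRuby is installed.'
else
  puts 'OBRuby not found; check ldconfig or LD_LIBRARY_PATH.'
end
```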

The majority of OBRuby was autogenerated by SWIG. A future article will detail how this was done - with an eye toward developing a Java interface to Open Babel.

Building an OBMol From SMILES

With installation out of the way, let's fire up OBRuby and take her for a test drive. The following code can either be entered with IRB or saved to a file and executed with the ruby interpreter:

require 'openbabel'
include OpenBabel

smi2mol = OBConversion.new
smi2mol.set_in_format("smi")

mol = OBMol.new
smi2mol.read_string(mol, 'CC(C)CCCC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C') # cholesterol, no chirality
mol.add_hydrogens

puts "Cholesterol has #{mol.num_atoms} atoms, including hydrogens."
puts "Its molecular weight is #{mol.get_mol_wt} and its molecular formula is #{mol.get_formula}."

This simple code illustrates some important points. All OBRuby classes reside in the OpenBabel module. These classes can be directly referenced by including the OpenBabel module. Also notice how Ruby underscore_delimited method names are used, rather than C++ UpperCamelCase names.
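The naming convention can be illustrated with a few lines of Ruby. This sketch only mimics the mapping for the method names used in this article; it is not a description of how the interface is actually generated, and ruby_name is a hypothetical helper.

```ruby
# Hypothetical helper illustrating the naming convention only:
# inserts an underscore wherever a lowercase letter or digit is
# followed by an uppercase letter, then lowercases the result.
def ruby_name(cpp_name)
  cpp_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

%w[NumAtoms AddHydrogens GetMolWt SetInFormat].each do |name|
  puts "#{name} -> #{ruby_name(name)}"
end
```

Knowing the pattern makes it easier to move between Open Babel's C++ API documentation and OBRuby code.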

SMARTS Matching

One of the most useful features of Open Babel is its SMARTS pattern matching capability. This can conveniently be accessed from OBRuby by first instantiating an OBSmartsPattern, passing the SMARTS pattern of interest to the instance's init method, and retrieving the hit set:

require 'openbabel'
include OpenBabel

smi2mol = OBConversion.new
smi2mol.set_in_format("smi")

mol = OBMol.new
smiles = 'CC(C)CCCC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C' # cholesterol, no chirality
smi2mol.read_string(mol, smiles) 
mol.add_hydrogens

pattern=OBSmartsPattern.new
smarts = 'C1CCCCC1'

pattern.init(smarts)
pattern.match(mol)
hits = pattern.get_umap_list # => indices of two cyclohexane rings

puts "Found #{hits.size} instances of the SMARTS pattern '#{smarts}' in the SMILES string #{smiles}. Here are the atom indices:"
hits.each_with_index do |hit, index|
  print "Hit #{index}: [ "

  hit.each do |atom_index|
    print "#{atom_index} "
  end

  puts "]"
end

Notice the Rubyesque each_with_index block that iterates over the elements in the hit set. Running the above code produces the following output:

Found 2 instances of the SMARTS pattern 'C1CCCCC1' in the SMILES string CC(C)CCCC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C. Here are the atom indices:
Hit 0: [ 12 17 16 15 14 13 ]
Hit 1: [ 20 25 24 23 22 21 ]
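The printing loop can be condensed with map and join. The nested array below is copied from the output above and stands in for the value returned by get_umap_list, which I'm assuming behaves like an array of integer arrays; the sketch therefore runs without OBRuby installed.

```ruby
# Sample data copied from the output above; a stand-in for the hit set.
hits = [[12, 17, 16, 15, 14, 13], [20, 25, 24, 23, 22, 21]]

# Build one formatted line per hit, then print them all at once.
lines = hits.each_with_index.map { |hit, index| "Hit #{index}: [ #{hit.join(' ')} ]" }

puts lines
```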

Finding Your Way

Using a new library like OBRuby can take some getting used to. An excellent source of information is OpenBabel's online API documentation. Another source is Ruby itself.

For example, let's say you've instantiated an OBMol, but can't remember the exact name of the method that counts the number of atoms. Just call methods on the object and sort the result:

require 'openbabel'

mol = OpenBabel::OBMol.new

mol.methods.sort # => see output below

When run from Interactive Ruby (irb), this code produces the following alphabetized list of methods, which I've truncated:

... "is_corrected_for_ph", "kekulize", "kind_of?", "method", "methods", "new_atom", "new_perceive_kekule_bonds", "new_residue", "next_atom", "next_bond", "next_conformer", "next_internal_coord", "next_residue", "nil?", "num_atoms", "num_bonds", "num_conformers", "num_edges", "num_hvy_atoms", "num_nodes", "num_residues", "num_rotors", "object_id", "perceive_bond_orders", "perceive_kekule_bonds", "private_methods", "protected_methods", "public_methods", "renumber_atoms", "reserve_atoms", "reset_visit_flags" ...
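A long list like this is easier to search when it is narrowed with grep. In the sketch below a String stands in for an OBMol so that the code runs without OBRuby; find_methods is a hypothetical helper name.

```ruby
# Hypothetical helper: lists the methods of any object whose names
# match a pattern. Names are converted to strings so that the result
# looks the same whether methods returns strings or symbols.
def find_methods(object, pattern)
  object.methods.map { |m| m.to_s }.grep(pattern).sort
end

# A String stands in for an OBMol here.
puts find_methods('CCO', /case/).inspect
```

With OBRuby loaded, calling find_methods(mol, /^num_/) on an OBMol instance should return only the counting methods, such as num_atoms and num_bonds from the list above.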

Conclusions

OBRuby combines the dynamic programming language Ruby with the highly-functional toolkit Open Babel. Further augmenting OBRuby's capabilities with the web application framework Rails and/or Ruby Chemistry Development Kit offers even more possibilities. Future articles will bring some of them to life.
