Hacking Molbank: Downloading a Complete Chemistry Journal

Posted by Rich Apodaca Fri, 01 Dec 2006 20:13:00 GMT

The previous article in this series highlighted Molbank as a tool for studying the convergence of Open Access, Open Data, and Open Source in chemistry. This article will outline some of the technical and legal aspects of downloading and using Molbank content.

Mirror, Mirror

MDPI themselves actively encourage the copying of their journal content by a process known as mirroring:

We encourage two types of mirroring :

  • Institutional Mirroring : Institutions may help not only their own members, but neighbouring scientists, to have a faster and reliable access to MDPI journals. For institutions, this is a tradeoff : they save bandwidth on outgoing traffic, while having more inbound traffic. One positive aspect is that sites supporting mirrors become more visited and better known. We are going to maintain a list of supporting institutional mirror sites which is going to be presented in an extremely visible fashion, on the welcome pages of each journal, so that all MDPI readers can access the nearest site.
  • Personnal Mirroring : With hard disks becoming larger and cheaper, it becomes not unreasonnable to set up his/her own personnal mirror, with all the information at your fingertips !. An automated procedure, running at night, keeps your personnal mirror always updated. This is extremely convenient. You may keep this mirror to yourself, or openned to your colleagues, you may do what you wish !

The text then goes on to give explicit instructions on how to create a mirror of the entire MDPI site and all of its journal content using Linux. So not only does MDPI explicitly allow the non-commercial copying of their content, but that copy can then be hosted on the Web, transmitted through other media, or simply used locally. It's the last of these uses that this article will address.

Create a Molbank Archive

The Unix command wget can be used to copy the content of any website. Before using wget, or any similar tool, you should check the robots.txt file for the site of interest. I have so far been unable to find a robots.txt file on the MDPI site, so I assume there is no problem with running either wget or other robotic agents. But for the purposes of this tutorial, it is more convenient to create a local copy.
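The robots.txt check described above can also be scripted. The following is a minimal Ruby sketch, not a complete implementation of the robots exclusion rules: it only considers Disallow lines that follow a "User-agent: *" line and uses simple prefix matching. The method name path_allowed? and the sample rules are hypothetical and are not taken from the MDPI site.

```ruby
# Minimal sketch: decide whether a path is allowed for all agents ("*")
# given the text of a robots.txt file. Only "User-agent" and "Disallow"
# lines are handled, with simple prefix matching.
def path_allowed?(robots_txt, path)
  applies = false
  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip        # drop comments and whitespace
    next if line.empty?
    field, value = line.split(':', 2).map(&:strip)
    next if value.nil?
    case field.downcase
    when 'user-agent'
      applies = (value == '*')
    when 'disallow'
      # An empty Disallow value means nothing is disallowed.
      return false if applies && !value.empty? && path.start_with?(value)
    end
  end
  true
end

# Hypothetical rules, for illustration only.
sample = <<~ROBOTS
  User-agent: *
  Disallow: /private/
ROBOTS

puts path_allowed?(sample, '/molbank/molbank2005.htm')
puts path_allowed?(sample, '/private/data.htm')
puts path_allowed?('', '/molbank/molbank2005.htm')
```

An empty or missing robots.txt is treated as "everything allowed", which matches the assumption made in the text.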

To create a local copy of all 2005 articles in Molbank, for example, use wget with the appropriate arguments:

$ wget -r -l2 http://www.mdpi.net/molbank/molbank2005.htm

The -r flag turns on recursive directory retrieval, and the -l2 flag sets the retrieval depth to two.

When the process is complete, you should have a directory called www.mdpi.net in your working directory. This directory will contain a subdirectory called molbank which in turn contains two directories: 2005 and 2006. Under the 2005 directory, you'll find all of Molbank's articles in HTML format, all images, and all molfiles. It's not clear to me yet why the 2006 directory is created and why it only contains one article.

Checking the Archive

A large number of Molbank's molfiles appear to be corrupted. This isn't related to wget, because these files are also corrupted when viewed through a browser directly from http://www.mdpi.org. For example, the molfile for Molbank article #393 appears corrupted (as do all of the other molfiles for July 2005):

http://www.mdpi.org/molbank/molbank2005/m393.mol

You'll also find several instances of bogus molfiles containing only one or two atoms, such as for Molbank article #431:

http://www.mdpi.org/molbank/molbank2005/m431.mol

Some molfiles are missing altogether, such as the one for Molbank article #405:

http://www.mdpi.org/molbank/molbank2005/m405.mol

Clearly, the integrity of Molbank's molfiles cannot be assumed. Software designed to work with this dataset will therefore need to be capable of gracefully handling corrupted, nonexistent, and bogus molfiles.
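As a starting point for that kind of defensive handling, here is a small Ruby sketch that sorts molfile text into the three problem categories described above. It assumes V2000-style molfiles (three header lines followed by a counts line ending in "V2000"), and it treats anything with fewer than three atoms as bogus. Both the method name classify_molfile and that threshold are assumptions made for illustration; a real check would go further.

```ruby
# Classify the text of a V2000 molfile. Returns one of:
#   :missing   - no content at all (e.g. the file could not be fetched)
#   :corrupted - the counts line (line 4) or the atom/bond blocks are unusable
#   :bogus     - parses, but has fewer than min_atoms atoms (assumed threshold)
#   :ok        - passes these basic checks
def classify_molfile(text, min_atoms = 3)
  return :missing if text.nil? || text.strip.empty?

  lines = text.lines.map(&:chomp)
  counts = lines[3]   # the header is three lines; the counts line is the fourth
  return :corrupted if counts.nil? || counts !~ /V2000\s*\z/

  # Atom and bond counts occupy the first two three-character columns.
  atoms = Integer(counts[0, 3].strip, 10) rescue nil
  bonds = Integer(counts[3, 3].strip, 10) rescue nil
  return :corrupted if atoms.nil? || bonds.nil?
  # The atom and bond blocks must actually be present.
  return :corrupted if lines.length < 4 + atoms + bonds

  atoms < min_atoms ? :bogus : :ok
end

# A hand-written three-atom example (C-C-O), for illustration only.
example = [
  'example', '  sketch', '',
  '  3  2  0  0  0  0  0  0  0  0999 V2000',
  '    0.0000    0.0000    0.0000 C   0  0',
  '    1.0000    0.0000    0.0000 C   0  0',
  '    2.0000    0.0000    0.0000 O   0  0',
  '  1  2  1  0',
  '  2  3  1  0',
  'M  END'
].join("\n")

puts classify_molfile(example)
puts classify_molfile("<html>\nNot Found\n</html>")
puts classify_molfile(nil)
```

A caller that downloads the files itself would pass nil when the server reports that a molfile does not exist, so that missing files are reported separately from damaged ones.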

Conclusions

Molbank permits the non-profit copying of its entire article collection. With some simple command-line tools, it's possible to quickly and easily create your own personal Molbank mirror. A cursory examination of the molfiles contained in Molbank showed several problems that need to be taken into consideration. The remaining articles in this series will describe some ways that Molbank's content can be put to use with Open Source software, and mashed up with Open Data.

Anatomy of a Cheminformatics Web Application: Beautifying Depict

Posted by Rich Apodaca Mon, 27 Nov 2006 20:05:00 GMT

A recent article outlined the steps for building a Rails Web application that renders SMILES strings as 2-D molecular images. Although this application, Depict, performed its stated purpose, it was neither much to look at nor as easy to use as it could be. In this tutorial, we'll give Depict a face-lift and make it more user-friendly.

The Problem

As it now stands, Depict accepts a SMILES string as input, and then renders a new Web page containing a 2-D molecular image. We'd like to make it easier to enter multiple SMILES strings by combining data entry and image display into the same Web page. We'd also like to make the application as a whole look better by using Cascading Style Sheets and other UI enhancements.

Download and Prerequisites

For this tutorial, you'll need Ruby CDK (RCDK). A recent article described the small amount of system configuration required for RCDK on Linux. Another article showed how to install RCDK on Windows.

In addition, you'll need to install Ruby on Rails - something that can be done through RubyGems.

The Rails application that this tutorial starts with can be downloaded from this link. If you'd rather work with the version of Depict produced by applying the changes outlined in this tutorial, the full source code can be downloaded from this link.

Step 1: Consolidate Actions

Our first version of Depict defined three SmilesController actions: input; depict; and image_for. Because we want to show the molecular image on the same page on which SMILES input happens, we'll consolidate input and depict into a single action.

To do this, we'll edit depict/app/controllers/smiles_controller.rb by removing the input method and rewriting the depict and image_for methods:

require_gem 'rcdk'
require 'rcdk/util'

jrequire 'java.io.ByteArrayOutputStream'
jrequire 'net.sf.structure.cdk.util.ImageKit'
jrequire 'javax.imageio.ImageIO'

class SmilesController < ApplicationController

  # Consolidated depict method.
  def depict
    if params[:smiles]
      @smiles = params[:smiles][:value]
    else
      @smiles = ''
    end
  end

  def image_for
    smiles = params[:smiles]

    # Just return if we can't get a SMILES string.
    if !smiles
      return
    end

    mol = RCDK::Util::Lang.read_smiles smiles
    mol = RCDK::Util::XY::coordinate_molecule mol
    out=Java::Io::ByteArrayOutputStream.new
    image=Net::Sf::Structure::Cdk::Util::ImageKit.createRenderedImage(mol, 300, 300)

    Javax::Imageio::ImageIO.write(image, "png", out)

    send_data(out.toByteArray, :type => "image/png", :disposition => "inline", :filename => "molecule.png")
  end
end

We need to check whether the SMILES in image_for is nil because when the application is first launched, no SMILES string is defined. By checking for this condition and exiting if found, our application can gracefully start up and respond to a blank input field.
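The two guards used by these actions can be tried outside of Rails. In the sketch below, a plain Hash with symbol keys stands in for Rails' params, and the helper names submitted_smiles and smiles_given? are hypothetical. The second helper also rejects an empty string, which goes slightly beyond the nil check in the action above.

```ruby
# A plain Hash with symbol keys stands in for Rails' params here.

# Hypothetical helper: the SMILES submitted through the form field
# smiles[value], or an empty string when nothing was submitted.
def submitted_smiles(params)
  field = params[:smiles]
  field ? field[:value].to_s : ''
end

# Hypothetical helper: true only when a usable SMILES string was passed
# as a flat :smiles parameter.
def smiles_given?(params)
  smiles = params[:smiles]
  !smiles.nil? && !smiles.empty?
end

puts submitted_smiles({}).inspect
puts submitted_smiles({ :smiles => { :value => 'c1ccccc1' } })
puts smiles_given?({})
puts smiles_given?({ :smiles => 'c1ccccc1' })
```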

We no longer need a View for the input action, the functionality of which we'll be moving into the View for our new depict action. Delete depict/app/views/smiles/input.rhtml, and edit depict/app/views/smiles/depict.rhtml so that it looks like the following:

<html>
  <head>
    <title>Depict</title>
  </head>
  <body>
    <h1>Depict a SMILES String</h1>
    <img src="<%= url_for :action => "image_for", :smiles => @smiles %>"></img><br />

    <%= form_tag :action=>'depict' %>
      <label>SMILES: </label>
      <%= text_field('smiles', 'value', :value => @smiles) %><br />
    <%= end_form_tag %>
  </body>
</html>

This new template is simply a combination of the two previous templates. Pointing your browser to http://localhost:3000/smiles/depict and entering a valid SMILES string should give a screen similar to the one below:

Step 2: Add a Helper for Serving Images

If a blank or invalid SMILES is entered, we'd like to give feedback by loading an image that reflects this condition. The user is expecting to see an image anyway, so we may as well put our error message there. To do so, we need to first re-think our image_for action.

Currently, image_for tries to generate an image from any string of characters. When it fails, no image is produced, giving rise to the familiar "broken image" icon below:

We could add some conditional logic in our view that would detect an invalid or empty SMILES string. However, for several reasons such co-mingling of application code and HTML is generally considered a Bad Thing. Fortunately, Rails offers just what we need: Helpers. A helper is code contained in a module that is automatically included in a view.

Each Rails Controller comes complete with an associated Helper. Our SmilesHelper was already created and wired together for us when we created the SmilesController. All we need to do is to add our own Helper methods.

We're going to add a method called image_for_smiles that will return a URL to an image based on a SMILES string. It needs to handle three possible types of string input:

  • Blank SMILES: Returns a static URL to an image on our server indicating no SMILES string has been entered. We'll discuss where to put this image in Step 5.

  • Invalid SMILES: Returns a static URL to an image on our server indicating an invalid SMILES. We'll add this in Step 5.

  • Valid SMILES: Returns a dynamic URL that will generate a 2-D molecular image on the fly from binary data generated in the same manner as our current image_for action.

Let's add the functionality we need to our SmilesHelper, which is contained in the file depict/app/helpers/smiles_helper.rb:

# Load the RCDK library
require_gem 'rcdk'
require 'rcdk/util'

# New jrequire calls.
jrequire 'java.io.ByteArrayOutputStream'
jrequire 'net.sf.structure.cdk.util.ImageKit'
jrequire 'javax.imageio.ImageIO'

module SmilesHelper
  def image_for_smiles(param)
    smiles = param[:smiles]

    if smiles.eql? ''
      return '/images/blank.png'
    end

    render(smiles)
  end

  def render(smiles)
    begin    
      mol = RCDK::Util::Lang.read_smiles smiles
      mol = RCDK::Util::XY::coordinate_molecule mol
      image=Net::Sf::Structure::Cdk::Util::ImageKit.createRenderedImage(mol, 400, 400)
    rescue
      return '/images/invalid.png'
    end

    out = Java::Io::ByteArrayOutputStream.new

    Javax::Imageio::ImageIO.write(image, "png", out)

    flash[:bytes] = out.toByteArray
    flash[:smiles] = smiles

    url_for :action => 'image_for', :id => smiles
  end
end

Here, we introduce another Rails element - the flash. The flash provides temporary storage for data that needs to be passed from one Action to another. In the render method, we're storing the byte array created by Ruby CDK in the flash so that it can be sent into Depict's image window as dynamically-generated content.
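To make the idea of the flash concrete, here is a toy model in plain Ruby. It is not Rails' implementation, and the class name ToyFlash and the method next_request! are invented for this sketch. It only illustrates the behavior we rely on: a value stored while one request is handled can be read while the next request is handled, and is gone after that.

```ruby
# Toy model of flash-style storage (not Rails' implementation).
class ToyFlash
  def initialize
    @current = {}   # readable now; written during the previous request
    @pending = {}   # written during this request; kept for the next one
  end

  def []=(key, value)
    @pending[key] = value
  end

  def [](key)
    @pending.fetch(key) { @current[key] }
  end

  # Call between requests: pending values become current, older ones are dropped.
  def next_request!
    @current = @pending
    @pending = {}
    self
  end
end

flash = ToyFlash.new
flash[:bytes] = 'PNG data'         # stored while the depict page is rendered
flash.next_request!
puts flash[:bytes]                 # read while image_for is served
flash.next_request!
puts flash[:bytes].inspect         # gone one request later
```

One consequence of this design, at least in the toy model, is that the stored bytes are only available to the request that immediately follows the one that stored them.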

If successful, the render method returns a URL of the form:

http://localhost:3000/smiles/image_for/SMILES

where SMILES is the escaped form of the user-specified SMILES string. If two images are served with exactly the same URL, some browsers (e.g., Konqueror) will assume they represent the same image and will re-use the image in their cache. So, we append the SMILES string to the URL as a way to get these browsers to refresh Depict's image area.
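To see what an escaped SMILES string looks like, the sketch below percent-encodes a few examples with ERB::Util.url_encode from Ruby's standard library. Rails' url_for performs its own escaping, so this is only an illustration of the idea, and the method name escaped_smiles is hypothetical.

```ruby
require 'erb'

# Percent-encode a SMILES string so that it can be used as a URL path segment.
def escaped_smiles(smiles)
  ERB::Util.url_encode(smiles)
end

puts escaped_smiles('c1ccccc1')   # no reserved characters, so it is unchanged
puts escaped_smiles('CC(=O)O')    # parentheses and "=" are percent-encoded
```

Because different SMILES strings give different URLs, a browser that caches by URL has no reason to re-use a stale image.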

Step 3: Invoke the New image_for_smiles Method

We've added a new image_for_smiles method as a Helper, but Depict isn't yet using it. Let's change that by modifying the way that our image URL is generated in depict/app/views/smiles/depict.rhtml:

<html>
  <head>
    <title>Depict</title>
  </head>
  <body>
    <h1>Depict a SMILES String</h1>
    <img src="<%= image_for_smiles :smiles => @smiles %>"></img><br />

    <%= form_tag :action=>'depict' %>
      <label>SMILES: </label>
      <%= text_field('smiles', 'value', :value => @smiles) %><br />
    <%= end_form_tag %>
  </body>
</html>

Step 4: Simplify SmilesController

We're now no longer using SmilesController (depict/app/controllers/smiles_controller.rb) to perform the bulk of the work related to 2-D image generation. Let's update our Controller to reflect these changes:

# No libraries need to be loaded now.

class SmilesController < ApplicationController
  # Consolidated depict method.
  def depict
    if params[:smiles]
      @smiles = params[:smiles][:value]
    else
      @smiles = ''
    end
  end

  # Consolidated image_for method.
  def image_for
    if flash[:bytes]
      send_data(flash[:bytes], :type => "image/png", :disposition => "inline", :filename => "#{flash[:smiles]}.png")
    end
  end
end

Notice how much simpler the image_for method now is. The byte array saved in Rails' flash (introduced in Step 2) is simply sent out as a PNG image to any receiver requesting it.

Our application, when provided with a valid SMILES string, now looks like the image below.

Step 5: Add Static Images

We'd like to have Depict render an appropriate image in those cases where a molecular image cannot be rendered. In fact, Depict is already configured to do so - all we need to do is add the images themselves.

Where do we put these images? Rails creates several directories when an application template is produced. One of these is called public. This directory in turn contains an images subdirectory. Currently, depict/public/images only contains the Rails logo. It is this directory into which static images are designed to go. Let's add these two images to depict/public/images: blank.png and invalid.png. You could, of course, create your own custom 400x400 pixel images for this purpose.

Deleting any SMILES input from Depict now should generate the screen shown below.

Not exactly subtle, but it gets the message across. A similar screen results by entering an invalid SMILES string, such as "hello".

Step 6: Create and Use a Cascading Style Sheet

We'd like to have fine-grained control over the appearance of our application through a single file - a job ideally suited for Cascading Style Sheets (CSS). Where do CSS files live in a Rails application? Along with the images directory described above, Rails also creates a public/stylesheets directory when an application template is generated. This is where custom style sheets can be placed. Create a CSS file called default.css in this directory containing the following definitions:

h1 {
    text-align: center;
    font-size: 30pt;
    background: #993333;
    color: white;
}

.image {
    margin-left: auto;
    margin-right: auto;
    width: 400px;
}

.smiles {
    margin-left: auto;
    margin-right: auto;
    width: 400px;
}

.smiles input {
    width: 100%;
    font-size: 18pt;
    text-align: center;
    border: solid #993333;
    border-width: 2px 2px 2px 2px;
}

.smiles label {
    background: #993333;
    color: white;
    padding: 4px;
    font-family: sans-serif;
    font-weight: bold;
}

.about {
    text-align: center;
    font-size: 16pt;
}

a:link,  a:visited { color: #930; }
a:hover, a:active {color: #FFFFFF; background: #993333;}

Next, we need to tell Rails where to find the above CSS. Open up depict/app/views/smiles/depict.rhtml and add the following eRuby line inside the head tags:

<%= stylesheet_link_tag "default", :media => "all" %>

That's all there is to it. Reloading Depict should give a screen similar to the one below.

Step 7: Clean Up the View

You may have noticed that the style sheet added in the previous step defines some features we're not currently using. Let's update Depict's View (depict/app/views/smiles/depict.rhtml) to reflect the changes to our CSS:

<html>
  <head>
    <title>Depict</title>
    <%= stylesheet_link_tag "default", :media => "all" %>
  </head>
  <body>
    <h1>Depict a SMILES String</h1>
    <div class="image">
      <img src="<%= image_for_smiles :smiles => @smiles %>"></img>
    </div>
    <br /><br />
      <div class="smiles">
      <%= form_tag :action=>'depict' %>
        <label>SMILES: </label>
        <%= text_field('smiles', 'value', :align=>'center', :value => @smiles) %><br />
      <%= end_form_tag %>
      </div>
    <div class="about">
      <a href="http://depth-first.com/articles/2006/11/27/anatomy-of-a-cheminformatics-web-application-beautifying-depict">About this Application</a>
    </div>
  </body>
</html>

The changes here consist of grouping related HTML elements together into div blocks and adding a link to the article you're reading at the bottom of the page. The interaction of the above code and the style sheet we created in Step 6 produces a screen, such as the one below, when a valid SMILES string is entered.

Summary

Even if you haven't followed along through this tutorial, it should be apparent that Rails is a powerful tool for the agile development of Web applications. Although we haven't used any sophisticated techniques, we now have a working Depict server with a simple, logical Web interface that does something useful.

But we're not quite done with Depict yet. Currently, you need to hit the return key to get a 2-D rendering. Wouldn't it be better if the application automatically updated the image as a SMILES string is typed? If you're thinking "Ajax", you're right on target.

Scripting Molecular Fingerprints with Ruby CDK

Posted by Rich Apodaca Wed, 22 Nov 2006 20:44:00 GMT

A molecular fingerprint represents a molecule as a series of bits. There are many situations in which this reduced form of molecular representation is useful. For example, fingerprints are frequently used as a fast prescreen for database substructure searches. They can also be used for "fuzzy" comparisons involving molecular similarity, a nice complement to binary queries such as substructure search.

Fingerprints have their limitations. Being a form of hashing, they are imprecise in that two different molecules can have exactly the same fingerprint. The converse is also true: many molecular fingerprints exaggerate small differences between two molecules that most chemists would say are similar - for example between oxygen and sulfur analogs of the same structure.

Despite their limitations, the advantages of fingerprints make them useful in many situations. As a result, numerous fingerprinting systems have become popular. This tutorial will focus on creating and manipulating molecular fingerprints from Ruby using the Ruby Chemistry Development Kit (RCDK).

Prerequisites

For this tutorial, you'll need Ruby CDK (RCDK). A recent article described the small amount of system configuration required for RCDK on Linux. Another article showed how to install RCDK on Windows.

A Small Fingerprint Library

Let's build a small Ruby library for working with fingerprints. Place the following code into a file called fingerprint.rb in your working directory:

require 'rubygems'
require_gem 'rcdk'
require 'rcdk/util'

jrequire 'org.openscience.cdk.fingerprint.Fingerprinter'
jrequire 'org.openscience.cdk.similarity.Tanimoto'

# Molecule fingerprinting
class Fingerprinter
  def initialize
    @fingerprinter = Org::Openscience::Cdk::Fingerprint::Fingerprinter.new
  end

  def fingerprint(smiles)
    mol = RCDK::Util::Lang.read_smiles smiles

    fp = @fingerprinter.getFingerprint mol

    # Metaprogramming!
    fp.extend(Fingerprint)
  end
end

# BitSet comparison
module Fingerprint
  # Returns true if all of the bits set to true in this fingerprint are also set to true in the specified fingerprint
  def subset?(fingerprint)
    Org::Openscience::Cdk::Fingerprint::Fingerprinter.isSubset(fingerprint, self)
  end

  # Tanimoto similarity of this fingerprint and the specified fingerprint
  def tanimoto(fingerprint)
    Org::Openscience::Cdk::Similarity::Tanimoto.calculate(self, fingerprint)
  end
end

Of particular note is the use of Ruby's Object#extend method. This method allows a single instance of an object to be extended at runtime - a form of metaprogramming. In this case, we add the subset? and tanimoto methods for determining whether all of the bits in one fingerprint are present in another, and for determining similarity, respectively. We use this technique here because currently RJB doesn't provide the complete interface into Java classes that would be required to create a Ruby class that directly inherits from Java's BitSet class.
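If extending a single object is unfamiliar, the following plain-Ruby sketch shows the same technique without RCDK. The module name Shouting and its method are invented for this illustration.

```ruby
# A module that adds one method.
module Shouting
  def shout
    upcase + '!'
  end
end

plain    = String.new('benzene')
extended = String.new('benzene')
extended.extend(Shouting)        # only this one object gains #shout

puts extended.shout              # => BENZENE!
puts plain.respond_to?(:shout)   # => false
```

The String class itself is untouched, and only the one instance that was extended responds to the new method. That is the property the Fingerprinter class above relies on when it extends each BitSet it returns.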

Testing the Library

Claritin (loratadine, left) and Clarinex (desloratadine, right) are two structurally-related antihistamines. Can we quantitate the degree of similarity between these two structures? Fingerprints provide one way. The following code creates fingerprints for the two structures, determines whether one is a subset of the other, and assigns a Tanimoto similarity value:

require 'fingerprint'

f = Fingerprinter.new

loratadine = f.fingerprint 'CCOC(=O)N1CCC(=C2C3=C(CCC4=C2N=CC=C4)C=C(C=C3)Cl)CC1'
desloratadine = f.fingerprint 'C1CC2=C(C=CC(=C2)Cl)C(=C3CCNCC3)C4=C1C=CC=N4'

puts "Loratadine is a subset of desloratadine: #{loratadine.subset? desloratadine}" # => false
puts "Desloratadine is a subset of loratadine: #{desloratadine.subset? loratadine}" # => true
puts "Tanimoto similarity of desloratadine and loratadine: #{loratadine.tanimoto desloratadine}" # => 0.895683467388153
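For readers who want to see what these two comparisons compute, here is a plain-Ruby sketch that models a fingerprint as a Set of "on" bit positions. It uses the standard definition of the Tanimoto coefficient (bits in common divided by bits set in either fingerprint). The bit positions are made up for illustration, and the sketch does not call CDK at all.

```ruby
require 'set'

# Tanimoto coefficient of two fingerprints modelled as sets of bit positions:
# the number of shared bits divided by the number of bits set in either one.
def tanimoto(a, b)
  union = (a | b).size
  return 0.0 if union.zero?   # a choice made here: two empty fingerprints score 0.0
  (a & b).size.to_f / union
end

# True if every bit set in a is also set in b.
def bits_subset?(a, b)
  a.subset?(b)
end

small = Set[2, 8, 11, 16]
large = Set[2, 8, 11, 16, 18, 22]

puts tanimoto(small, large)       # 4 shared bits out of 6 in total
puts bits_subset?(small, large)
puts bits_subset?(large, small)
```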

Variations

CDK's Fingerprinter class returns an instance of the Java class BitSet. This BitSet can be further manipulated in Ruby. For example, to find the size (the total number of bits) of the BitSet, we could use:

loratadine.size # => 1024

Similarly, to find the number of bits set to true, we would use:

loratadine.cardinality # => 278

To print out a list of all bits set to true, we could use the toString method:

loratadine.toString # => "{2, 8, 11, 16, 18, 22, 32, 37, 38, 41, 42, 46, 47, 51, 57, 64, 65, 66, 69 ... }"
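If you'd rather work with these bit positions as Ruby integers, the string form is easy to parse. The sketch below assumes the "{2, 8, 11}" style of output shown above. The method name bit_positions is hypothetical, and the sample string is shortened for illustration.

```ruby
# Parse the text form of a BitSet, such as "{2, 8, 11}", into an Array of Integers.
def bit_positions(bitset_string)
  bitset_string.scan(/\d+/).map { |digits| digits.to_i }
end

puts bit_positions('{2, 8, 11, 16}').inspect   # => [2, 8, 11, 16]
puts bit_positions('{}').inspect               # => []
```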

Conclusions

Fingerprints enable many useful and fast comparisons between molecules. The form of fingerprint we've used here is but one of the possibilities offered by CDK. The next article in this series will discuss fingerprints in Open Babel using both Ruby and Python.

Build a Rails Cheminformatics Application in Thirty Minutes

Posted by Rich Apodaca Tue, 21 Nov 2006 20:06:00 GMT

A recent article highlighted the Web as a new cheminformatics platform. Advocacy is one thing, but a working, open, demo built with modern technologies is far more compelling. In the following tutorial we'll build a first-generation cheminformatics Web application using the Ruby on Rails framework and 100% Open Source components. We'll just cover the essentials here - look for future articles to discuss the underlying technology in more detail.

The Problem

Simplified Molecular Input Line Entry System (SMILES) is one of the most compact and easy-to-learn molecular representation systems ever developed. Part of a larger family of molecular languages called line notations, SMILES strings are always written as a single line of ASCII text. This makes them perfect in situations calling for data entry; witness their use in a wide range of new free online chemistry databases. This system typically works by a chemist drawing a structure in a graphical editor, copying a SMILES string from the editor, and pasting this string into a search window in the database application.

SMILES is a great language for computers, but not for chemists, who are trained to communicate through 2-D structure diagrams. Although SMILES strings can be decoded manually, this is a tedious and error-prone process, especially for SMILES encoding high degrees of branching and ring content. It's preferable for the computer to do this hard work for us, providing a perfectly laid-out 2-D structure diagram for use in debugging or inclusion in documents.

Depict is a Web application originally developed by Daylight for the conversion of SMILES strings into 2-D structure diagrams. Type a SMILES string into the form, press enter, and get a raster image of the encoded molecule. Daylight's Depict does a good enough job, but you can't change the interface or output. You also can't take the software apart to see how it works. Wouldn't it be great if you could?

About This Tutorial Series

This tutorial is the first in a series describing how to build a Depict server using 100% Open Source components. The application will accept a SMILES string in a Webpage text field, and then produce a 2-D structure diagram. It won't be designed for ease of use, appearance, or configurability - these improvements will be described in subsequent articles. When this application is finished, I'll deploy it on a Web server. At every step in this process, I'll provide enough detail for anyone to do the same.

It won't be necessary to finish every step yourself before you can work with the finished product. Near the beginning of each installment will be a "Download and Prerequisites" section containing a link to the complete source code. Simply download this code and run it to see what it does.

Download and Prerequisites

For this tutorial, you'll need Ruby CDK (RCDK). A recent article described the small amount of system configuration required for RCDK on Linux. Another article showed how to install RCDK on Windows.

In addition, you'll need to install Ruby on Rails - something that can be done through RubyGems.

The complete Depict application can be downloaded from this link.

A Note on Ruby Java Bridge and AMD64 Linux Platforms

Our Depict application will use Ruby Java Bridge (RJB) as a Ruby interface to Java bytecode. Recently, a problem with RJB on AMD64-Linux was uncovered. This problem prevents third-party jarfiles from being loaded after Rails has been loaded.

In practice, this means that the command to start the Rails server (Step 2) needs to be prefixed with an assignment of LD_PRELOAD. You also need to make LD_LIBRARY_PATH point to your native Java libraries. On my platform, which is AMD64-Linux running Sun's JVM, the commands are:

$ export LD_LIBRARY_PATH=/usr/java/jdk1.5.0_09/jre/lib/amd64:/usr/java/jdk1.5.0_09/jre/lib/amd64/server
$ LD_PRELOAD=/usr/java/jdk1.5.0_09/jre/lib/amd64/libzip.so ruby script/server

If you get an "Internal Error" due to an "unknown exception" while running Depict, chances are good that you've hit the same problem. Starting the Rails server as above should resolve it.

Step 1: Create the Application

Getting started with Rails is as simple as issuing the rails command with the name of your application as an argument:

$ rails depict

Executing this command creates a complete Rails application template under the depict subdirectory in your working directory. You build your application by editing the files and directories that were generated.

Step 2: Start the Server

You can start the Depict application by running the included server script:

$ cd depict
$ ruby script/server
=> Booting WEBrick...
=> Rails application started on http://0.0.0.0:3000
=> Ctrl-C to shutdown server; call with --help for options
[2006-11-18 10:17:08] INFO  WEBrick 1.3.1
[2006-11-18 10:17:08] INFO  ruby 1.8.5 (2006-08-25) [x86_64-linux-gnu]
[2006-11-18 10:17:08] INFO  WEBrick::HTTPServer#start: pid=4036 port=3000

Let's see what Depict looks like so far. Point your browser to http://localhost:3000. You should see the following page:

Congratulations! You're now running Ruby on Rails.

Step 3: Create the SmilesController

Rails adapts the Model-View-Controller application paradigm to the Web. It also automates many of the steps in building models, views, and controllers. Let's create a controller to handle the manipulation of SMILES strings:

$ ruby script/generate controller Smiles
      exists  app/controllers/
      exists  app/helpers/
      create  app/views/smiles
      exists  test/functional/
      create  app/controllers/smiles_controller.rb
      create  test/functional/smiles_controller_test.rb
      create  app/helpers/smiles_helper.rb

Currently, SmilesController is just a skeleton:

class SmilesController < ApplicationController
end

Let's give SmilesController the ability to accept a SMILES string as input by adding an input method.

class SmilesController < ApplicationController
  def input

  end
end

Step 4: Create a Form

At this stage, pointing your browser to http://localhost:3000/smiles/input gives a screen containing an error message:

Rails is looking for a view that doesn't exist, so let's create one. To your depict/app/views/smiles directory, add the following file, called input.rhtml:

<html>
  <head>
    <title>Enter a SMILES String</title>
  </head>
  <body>
    <%= form_tag :action=>'depict' %>
      Enter a SMILES String: <br />
      <%= text_field('smiles', 'value') %><br />
    <%= end_form_tag %>
  </body>
</html> 

This HTML view is an example of Ruby's templating mechanism, eRuby, which was discussed earlier in the context of converting SD files to HTML. In the template above, we've configured our form to invoke an action called depict when submitted. This action does not yet exist, but will be created in Step 5 below.

Now, pointing your browser to http://localhost:3000/smiles/input should give an input field:

Let's try it. Submitting the SMILES string for benzene gives the following error screen:

We haven't defined the depict action yet, a fact that Rails is communicating with this error message.

Have you noticed how we haven't had to restart the Rails Web server as we've made changes? This is but one of the many conveniences that makes Rails such a productive platform.

Step 5: Add a Depict Action

We need a way to pass a SMILES string from the Web page text field in which it's entered to our application and back to another view. To do this we'll add a depict method to depict/app/controllers/smiles_controller.rb:

def depict
  @smiles = @params[:smiles][:value]
end

Of course, our application still won't run properly because we haven't created a view for the new depict method to use. Let's do this by adding the following file, named depict.rhtml, to the depict/app/views/smiles directory:

<html>
  <head>
    <title>Depict SMILES: <%= @smiles %></title>
  </head>
  <body>
    <h1>SMILES: <%= @smiles %></h1>
  </body>
</html>

Notice how the instance variable @smiles is available for use within the template.
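The same mechanism can be tried outside of Rails with Ruby's standard ERB library. In the sketch below, the class DepictPage is a hypothetical stand-in for the controller and view pair. The template is evaluated with the object's binding, which is what makes the @smiles instance variable visible inside it. Rails wires this up differently, so treat this as an illustration of eRuby rather than of Rails internals.

```ruby
require 'erb'

# Hypothetical stand-in for a controller and its view.
class DepictPage
  TEMPLATE = '<h1>SMILES: <%= @smiles %></h1>'

  def initialize(smiles)
    @smiles = smiles
  end

  # Evaluate the template with this object's binding so that
  # instance variables such as @smiles can be used inside it.
  def to_html
    ERB.new(TEMPLATE).result(binding)
  end
end

puts DepictPage.new('c1ccccc1').to_html   # => <h1>SMILES: c1ccccc1</h1>
```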

Let's have a look at Depict so far. Pointing your browser to http://localhost:3000/smiles/input, entering the SMILES string for benzene, and pressing return produces the page shown below:

So far, so good. We've been able to read user input from an HTML form and reprocess it into some simple HTML output. Now, let's render a 2-D molecular image to go with it.

Step 6: Generate the 2-D Image

We'll use a method called image_for, which we'll define shortly. The file depict/app/views/smiles/depict.rhtml should look like this:

<html>
  <head>
    <title>Depict SMILES: <%= @smiles %></title>
  </head>
  <body>
    <h1>SMILES:<%= @smiles %></h1>
    <img src="<%= url_for :action => "image_for", :smiles => @smiles %>"></img>
  </body>
</html>

The added img tag is a placeholder for now. It loads an image dynamically generated from the image_for method, which we'll shortly add to SmilesController. We pass the SMILES string as a parameter.

The image_for method does all of the real work in the Depict application. It accepts a SMILES string as a parameter, and produces a laid-out 2-D color molecular image as output. The method uses a variety of functionality contained in the Java API itself, and in Ruby CDK.

In addition to an image_for method, we'll need to add some accessory code to make it work. Edit depict/app/controllers/smiles_controller.rb so that it looks like this:

# Load the RCDK library
require_gem 'rcdk'
require 'rcdk/util'

# New jrequire calls.
jrequire 'java.io.ByteArrayOutputStream'
jrequire 'net.sf.structure.cdk.util.ImageKit'
jrequire 'javax.imageio.ImageIO'

class SmilesController < ApplicationController

  # Already defined.
  def input

  end

  # Already defined.
  def depict
    @smiles = @params[:smiles][:value]
  end

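  # New method. Renders the SMILES parameter as an inline PNG image.
  def image_for
    smiles = @params[:smiles]

    # Parse the SMILES string, then assign 2-D coordinates.
    mol = RCDK::Util::Lang.read_smiles(smiles)
    mol = RCDK::Util::XY.coordinate_molecule(mol)

    # Render a 300 x 300 image and encode it as PNG into an in-memory stream.
    out = Java::Io::ByteArrayOutputStream.new
    image = Net::Sf::Structure::Cdk::Util::ImageKit.createRenderedImage(mol, 300, 300)
    Javax::Imageio::ImageIO.write(image, "png", out)

    # Send the PNG bytes back to the browser.
    send_data(out.toByteArray, :type => "image/png", :disposition => "inline", :filename => "molecule.png")
  end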
end

Let's test the application with a real-world example. The achiral SMILES string for Carmine is:

CC1=C2C(=CC(=C1C(=O)O)O)C(=O)C3=C(C2=O)C(=C(C(=C3O)O)C4C(C(C(C(O4)CO)O)O)O)O

Pointing your browser to http://localhost:3000/smiles/input and entering the above SMILES string produces a color 2-D image of the structure of the red food coloring:

Conclusions

Ruby on Rails is a fun and agile framework for rapid Web development. Although Depict isn't much to look at yet, it demonstrates many key Rails concepts. Several techniques could be used to improve the application's look and usability. For example, we could use AJAX to depict SMILES strings as they are being typed - without the need to hit return. We could also provide options for changing image format, size, and color scheme. Future articles will describe these and other improvements.

Diversity-Oriented Chemical Informatics

Posted by Rich Apodaca Wed, 15 Nov 2006 20:03:00 GMT

How would you enumerate all of the molecules represented by a molecular formula? This question was recently posed to members of the Blue Obelisk mailing list. Formula-based exhaustive structure enumeration may seem on the surface to be just another esoteric problem. Nevertheless, playing with open, interactive software that can perform such enumerations can be a great source of new ideas for applications and unit tests.

The Chemistry Development Kit offers a fully-functional exhaustive structure enumerator through its GENMDeterministicGenerator class. This article will use GENMDeterministicGenerator through the Ruby CDK interface to generate color 2-D images for all molecules of a given molecular formula.

A Solution

The software described in this article will generate a collection of 2-D molecular PNG images based on a user-supplied molecular formula. When viewed in a file browser such as Windows Explorer or Konqueror, the output is visible as a matrix of images. The filename of each image is given by the SMILES string of the corresponding molecule. All molecules are enumerated, whether they look "reasonable" or not. As an example, consider a section of the output for 'C4H8ClNO', which looks like this on my system:

Enumerator: A Small Ruby Library

We'll create a small Ruby class to do most of the work. Save the following in a file called enum.rb:

require 'rubygems'
require_gem 'rcdk'
require 'rcdk/util'

jrequire 'org.openscience.cdk.structgen.deterministic.GENMDeterministicGenerator'
jrequire 'net.sf.structure.cdk.util.ImageKit'

class Enumerator

  def initialize(formula)
    @generator = Org::Openscience::Cdk::Structgen::Deterministic::GENMDeterministicGenerator.new(formula, '')
    @width = 150
    @height = 150
  end

  def set_size(width, height)
    @width = width
    @height = height
  end

  def write_images
    mols = @generator.getStructures
    iterator = mols.iterator

    while (iterator.hasNext)
      mol = RCDK::Util::XY.coordinate_molecule(iterator.next)
      smiles = RCDK::Util::Lang.get_smiles(mol)

      Net::Sf::Structure::Cdk::Util::ImageKit.writePNG(mol, @width, @height, "#{smiles}.png")
    end
  end
end 

As you can see, this class is nothing more than a thin wrapper around a large amount of CDK functionality. Most of the action happens in the write_images method, where three things take place:

  1. We retrieve a list of molecules from the GENMDeterministicGenerator instance that satisfy the molecular formula passed to Enumerator's constructor.

  2. We iterate over these molecules.

  3. For each molecule, an image is written with the filename given by its SMILES string.

Testing the Library

To test the library, the following code can either be entered interactively via Interactive Ruby (irb) or saved to a file and run with the Ruby interpreter (ruby):

require 'enum'

e=Enumerator.new 'C4H8ClNO'

e.write_images

Running this code will produce a collection of PNG images in your working directory. By changing the argument passed to the Enumerator constructor, you can change the makeup of the image set.
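Before handing a formula to Enumerator, it can be useful to check what the string actually contains. The following is a minimal sketch in plain Ruby; parse_formula is a hypothetical helper that is not part of RCDK or CDK, and it only handles simple formulas made of element symbols and counts.

```ruby
# Hypothetical helper, not part of RCDK or CDK: splits a simple molecular
# formula such as 'C4H8ClNO' into element counts. Parentheses, charges,
# and isotopes are not handled.
def parse_formula(formula)
  counts = Hash.new(0)
  # Each match is an element symbol followed by an optional count.
  formula.scan(/([A-Z][a-z]?)(\d*)/) do |symbol, digits|
    counts[symbol] += digits.empty? ? 1 : digits.to_i
  end
  counts
end

counts = parse_formula('C4H8ClNO')
heavy_atoms = counts.reject { |symbol, _| symbol == 'H' }.values.sum

puts counts.inspect
puts heavy_atoms  # prints 7
```

The heavy atom count gives a rough feel for how large the resulting image set is likely to be.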

Prerequisites

For this tutorial, you'll need Ruby CDK (RCDK). A recent article described the small amount of system configuration required for RCDK on Linux. Another article showed how to install RCDK on Windows.

Unexpected Behavior

After testing the Enumerator library, you may notice a new file in your working directory called structuredata.txt. This file is written automatically by GENMDeterministicGenerator on instantiation, providing information on each structure that is generated. The CDK API documentation does not mention the creation of this file, and it would be preferable for this file to be created only on request. I'll be submitting a feature request to this effect shortly.

Food for Thought

If you plan to explore larger areas of chemical space with the Enumerator library, be prepared to wait. The generation of molecules, determination of 2-D coordinates, and rendering can take some time. Of course, the number of molecules increases dramatically with the number of atoms in the molecular formula - a concrete demonstration of what makes organic chemistry the fascinating discipline that it is.

An interesting variation on the ideas presented here would be to filter out molecules based on some criteria. One approach would be to remove molecules containing reactive functionality such as nitrogen substituted with chlorine. A SMARTS pattern search could easily form the basis for this filter. In applying this and similar filters, larger areas of interesting chemical space could be sampled in a reasonable amount of time.
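The shape of such a filter can be sketched in plain Ruby. Note that the sketch below does not perform a SMARTS search: it uses a naive text match on the SMILES string as a placeholder, which will miss N-Cl bonds written with branches or brackets. The reject_reactive method and N_CL_PATTERN constant are hypothetical names, and a real implementation would swap in a proper substructure search.

```ruby
# Naive placeholder for a SMARTS match: flags SMILES strings in which N and
# Cl are written next to each other. It misses other notations for the same
# bond (for example 'N(C)Cl'), so it only stands in for a real search.
N_CL_PATTERN = /NCl|ClN/

# Returns the SMILES strings that do not match the placeholder pattern.
def reject_reactive(smiles_list)
  smiles_list.reject { |smiles| smiles =~ N_CL_PATTERN }
end

# Three C4H8ClNO isomers; two of them carry chlorine on nitrogen.
candidates = ['CCCC(=O)NCl', 'ClCCCC(N)=O', 'CC(C)C(=O)NCl']
kept = reject_reactive(candidates)

puts kept.inspect  # prints ["ClCCCC(N)=O"]
```

Applying the filter before layout and rendering, rather than after, would avoid spending time on images that are going to be discarded.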

Conclusions

CDK's GENMDeterministicGenerator class, when combined with 2-D structure layout and 2-D rendering, provides the foundation of an intriguing tool for exploring chemical diversity. Further combining this capability with that offered by other freely-available tools offers some thought-provoking possibilities.
