Building Chempedia: Start Simple, Then Iterate

Posted by Rich Apodaca Tue, 13 May 2008 15:38:00 GMT

As a medium for building software, the Web offers unparalleled adaptability. With nothing to download or install, users of Web applications automatically see the newest version - always. This may sound like a small thing, and technically it is. But it dramatically increases the effectiveness with which software can be created. The previous article in this series introduced Chempedia, the free chemical encyclopedia and cheminformatics Web application. This article will discuss the process by which Chempedia will become a better service over time.

Iterative Web Application Development

Chempedia, like all actively-developed software, is a work in progress. It will be built in stages starting with the addition of new features, followed by a round of user feedback, bug fixing, and stabilization. This will then be followed by the next major iteration, and so on.

This iterative design style is ideally suited for Web applications. Because the barrier to pushing out new versions is essentially non-existent, a Web application can evolve at a much more rapid rate than other kinds of software. Indeed, the first version of a Web application need only work well enough to prove a point.

One of the keys to iterative Web development is a technology framework designed to facilitate it. Chempedia is being developed with Ruby on Rails, a tool that enables Web developers to take full advantage of the iterative development style the Web makes possible.

Another key element of iterative Web development is users willing to explore the system and offer criticism. Evolution succeeds only when the environment stresses an ecosystem; the same is true in Web application development.

Chempedia will take full advantage of the evolutionary nature of Web application development. As features are added and (hopefully) use of the service grows, Chempedia will evolve in ways that are impossible to predict today.

What's Wrong With Chempedia?

If you happened to take a look at Chempedia last week (that version is now no longer visible), you probably noticed many, many things that needed improvement. Some concerns were in the areas of:

  • Navigation. Navigation works best when the right granularity of options is achieved. Chempedia's navigation system grouped both closely-related and dissimilar actions at the same level.

  • Metaphor. The initial idea behind Chempedia was to see what happened when PubChem's chemical structures were mashed up with Wikipedia articles, using CAS numbers as the common link. The site design reflected this, with no clear organizing principle other than mashup. However, after the initial demonstration of the success of this approach, it became clear that Chempedia was strikingly similar in both form and function to the Merck Index. Perhaps this should be used as a clue in deriving a better organizing principle.

  • Wikipedia integration. The old Chempedia site didn't make it nearly as convenient as it should be to create or edit compound monographs. Because Chempedia serves as a chemically-aware front-end for Wikipedia, the easier it is to get to Wikipedia from Chempedia, the better.

What Changed?

During the process of trying to fix Chempedia's problems, it became clear that a major redesign was in order. This consisted of:

  • Creating a landing page oriented toward search. Using the Merck Index as a metaphor suggested that Chempedia's landing page should be designed around search rather than browsing, which is what the original design emphasized.

  • Emphasizing compound monographs, not compounds. Chempedia's central organizing principle is now the Compound Monograph. One way this is seen is in the new URL structure, which makes it very easy to see where a Chempedia link is about to take you. For example, consider the URL for benzene. Another way this can be seen is in the inclusion of Compound Monographs lacking a chemical structure.

  • Designing a streamlined menu system. The main menu system has been broken down into just three main categories: Search; Browse; and Create. These headings refer to actions on Compound Monographs, again in line with their importance as an organizing principle.

  • Promoting better integration with Wikipedia. After experimenting with a few implementation possibilities, it is now possible to edit Wikipedia articles directly from the Chempedia site, thanks to the use of an inline frame. Once again, this capability is tied to the Compound Monograph, from which editing and updating links are accessible.

  • Striving for comprehensive Wikipedia coverage. Wikipedia had far more compound monographs than could be found on Chempedia, 6,411 of them, to be precise. Chempedia now contains all of them, regardless of whether a chemical structure can be found based on a CAS number in PubChem. This includes inorganics, organometallics, polymers, mixtures, and polypeptides.

Miles to Go Yet

Chempedia is far from being finished. For example, you'll notice many instances in which a Compound Monograph is truncated. This arises from difficulties in parsing Wikipedia's Wikitext format (more on this later).

Ultimately, the full text of each Wikipedia article will be present on Chempedia rather than just the first introductory paragraph. But it will take a significant amount of work to ensure that each article's Wikitext entry can be parsed faithfully.
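To give a feel for why faithful Wikitext parsing takes real work, here is a deliberately naive sketch of extracting an introductory paragraph. It is not Chempedia's parser; the `naive_intro` method and the sample text are invented for illustration, and the regular expressions handle only the simplest markup.

```ruby
# Hypothetical, deliberately naive extraction of an article's first paragraph.
def naive_intro(wikitext)
  text = wikitext.gsub(/\{\{[^{}]*\}\}/m, "")                 # drop non-nested {{templates}}
  text = text.gsub(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/, '\1')    # [[target|label]] -> label
  text = text.gsub(/'{2,}/, "")                               # strip bold/italic quote runs
  text.split(/\n\s*\n/).map(&:strip).reject(&:empty?).first
end

sample = <<~WIKI
  {{chembox|Name=Benzene}}
  '''Benzene''' is an [[organic compound|organic chemical compound]].

  It is a natural constituent of [[crude oil]].
WIKI

puts naive_intro(sample)
```

A single pass like this already stumbles on nested templates (an outer `{{...}}` shell survives once its inner template is removed), to say nothing of tables and references, which is the kind of input that can leave a monograph truncated or cluttered.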

Chempedia allows search by CAS number, PubChem CID and exact title. Full-text searching is not yet implemented, nor is autocomplete search, both of which would greatly enhance the usability of the service.
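One way a single search box could route a query to one of these three lookups is sketched below. This is a hypothetical illustration rather than Chempedia's actual code: the `valid_cas?` and `classify_query` names are invented here, and the only outside fact relied on is the standard CAS check-digit rule.

```ruby
# Hypothetical sketch: decide which kind of lookup a search string calls for.
CAS_PATTERN = /\A(\d{2,7})-(\d{2})-(\d)\z/

# A CAS number ends in a check digit: multiply each preceding digit by its
# position counted from the right, sum the products, and take the sum mod 10.
def valid_cas?(text)
  match = CAS_PATTERN.match(text)
  return false unless match

  digits = (match[1] + match[2]).chars.map(&:to_i).reverse
  sum = digits.each_with_index.map { |digit, index| digit * (index + 1) }.sum
  sum % 10 == match[3].to_i
end

def classify_query(text)
  query = text.strip
  return :cas_number if valid_cas?(query)
  return :pubchem_cid if query.match?(/\A\d+\z/)

  :exact_title
end

puts classify_query("71-43-2")  # CAS number for benzene
puts classify_query("241")      # PubChem CID for benzene
puts classify_query("Benzene")  # anything else is treated as a title
```

Checking the check digit before hitting the database means a mistyped CAS number falls through to a title search instead of silently returning nothing.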

Exact structure searching is made possible by the ChemWriter editor in combination with SHA-1 hashed InChIs. Substructure search and query atom search will ultimately be added, but for an encyclopedia containing relatively few molecules, most of which have trivial names, this isn't yet seen as being critical.
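The hashing idea can be sketched in a few lines of Ruby. This is a minimal illustration under stated assumptions, not Chempedia's code: the `inchi_key` and `find_exact` names and the in-memory index are hypothetical, and the sketch assumes the drawn structure has already been converted to an InChI string elsewhere.

```ruby
require "digest/sha1"

# Reduce an InChI of any length to a fixed-length, 40-character hex key.
def inchi_key(inchi)
  Digest::SHA1.hexdigest(inchi)
end

# Hypothetical in-memory index: SHA-1 of InChI => monograph title.
BENZENE_INCHI = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

INDEX = {
  inchi_key(BENZENE_INCHI) => "Benzene",
  inchi_key(ETHANOL_INCHI) => "Ethanol"
}

# Exact structure search becomes a plain key lookup.
def find_exact(inchi)
  INDEX[inchi_key(inchi)]
end

puts find_exact(BENZENE_INCHI)
puts inchi_key(BENZENE_INCHI).length
```

Because the key has a fixed length no matter how large the molecule is, it fits comfortably in an indexed database column, which is presumably the appeal of hashing over storing and comparing raw InChI strings.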

You'll notice many Monographs on Chempedia that have no structure information. Behind the scenes, Chempedia uses the 350,000+ CAS numbers now contained in the PubChem database to associate a chemical structure with a Wikipedia article. In the future, these associations will be made by Chempedia and Wikipedia users, which will allow every Chempedia small-molecule Monograph to have a structure associated with it. (It will also create a rather large, publicly-curated, open database of CAS numbers linked to chemical structures, but that's a story for another time.)
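The CAS-number join described here amounts to a keyed lookup. The sketch below uses a handful of made-up records (SMILES strings stand in for structures, and the hash and array names are hypothetical) to show how a monograph ends up without a structure when its CAS number has no match.

```ruby
# Hypothetical data: CAS number => structure, standing in for PubChem.
structures_by_cas = {
  "71-43-2" => "c1ccccc1",
  "64-17-5" => "CCO"
}

# Hypothetical data: articles with CAS numbers, standing in for Wikipedia.
articles = [
  { title: "Benzene",      cas: "71-43-2" },
  { title: "Ethanol",      cas: "64-17-5" },
  { title: "Polyethylene", cas: "9002-88-4" }
]

# Attach a structure where the CAS number matches; otherwise leave it nil.
monographs = articles.map do |article|
  article.merge(structure: structures_by_cas[article[:cas]])
end

with_structure, without_structure = monographs.partition { |m| m[:structure] }

puts "With structure:    #{with_structure.map { |m| m[:title] }.join(', ')}"
puts "Without structure: #{without_structure.map { |m| m[:title] }.join(', ')}"
```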

Your Feedback is Essential

Finally, many of the changes made in this iteration were the result of conversations with chemists and developers. If you see something on Chempedia that just doesn't work for you, please don't be shy about saying so. Feedback is an essential ingredient in making Chempedia the best service it can be.

CampDepict: Building a Simple SMILES Depict Web Application With JRuby, Structure CDK, and Camping

Posted by Rich Apodaca Wed, 23 Apr 2008 15:16:00 GMT

Today's tribute to the power of simplicity comes by way of John Jaeger, who has built one of the simplest cheminformatics Web applications ever written. His creation, CampDepict, interactively produces a raster image of a 2D chemical structure given a SMILES string, not unlike Daylight's Depict application.

CampDepict uses the Ruby Web microframework Camping. From the README:

Camping is a web framework which consistently stays at less than 4kb of code. You can probably view the complete source code on a single page. But, you know, it's so small that, if you think about it, what can it really do?

The idea here is to store a complete fledgling web application in a single file like many small CGIs. But to organize it as a Model-View-Controller application like Rails does. You can then easily move it to Rails once you've got it going.

John's application is loosely-based on the Rails Depict application first described in 2006 here on Depth-First. His code makes use of CDK and Structure CDK, and it runs on JRuby.

If you've ever been curious about what Ruby has to offer cheminformatics, CampDepict could be just the application to get your feet wet.

ChemWriter Now Available for Download

Posted by Rich Apodaca Mon, 14 Jan 2008 15:29:00 GMT

A 2D chemical structure editor is a key component in most cheminformatics systems. With an ever-increasing number of groups using the Web as a cheminformatics platform, the need for a structure editor built specifically around the capabilities and constraints of the Web becomes more apparent.

For the last several months, my company (Metamolecular, LLC) has been developing a 2D structure editor called ChemWriter(TM). It was created specifically to solve the problem of building interactive, chemically-enabled Web applications that look good and load fast.

You can now download a free, fully-functional, non-expiring copy of ChemWriter (the ChemWriter Starter Package) good for development and testing of your chemically-aware Web application. The Metamolecular Company Blog has the details.

Build a Rails Cheminformatics Application in Thirty Minutes

Posted by Rich Apodaca Tue, 21 Nov 2006 20:06:00 GMT

A recent article highlighted the Web as a new cheminformatics platform. Advocacy is one thing, but a working, open demo built with modern technologies is far more compelling. In the following tutorial we'll build a first-generation cheminformatics Web application using the Ruby on Rails framework and 100% Open Source components. We'll just cover the essentials here - look for future articles to discuss the underlying technology in more detail.

The Problem

Simplified Molecular Input Line Entry System (SMILES) is one of the most compact and easy-to-learn molecular representation systems ever developed. Part of a larger family of molecular languages called line notations, SMILES strings are always written as a single line of ASCII text. This makes them perfect in situations calling for data entry; witness their use in a wide range of new free online chemistry databases. This system typically works by a chemist drawing a structure in a graphical editor, copying a SMILES string from the editor, and pasting this string into a search window in the database application.

SMILES is a great language for computers, but not for chemists, who are trained to communicate through 2-D structure diagrams. Although SMILES strings can be decoded manually, this is a tedious and error-prone process, especially for SMILES encoding high degrees of branching and ring content. It's preferable for the computer to do this hard work for us, providing a perfectly laid-out 2-D structure diagram for use in debugging or inclusion in documents.

Depict is a Web application originally developed by Daylight for the conversion of SMILES strings into 2-D structure diagrams. Type a SMILES string into the form, press enter, and get a raster image of the encoded molecule. Daylight's Depict does a good enough job, but you can't change the interface or output. You also can't take the software apart to see how it works. Wouldn't it be great if you could?

About This Tutorial Series

This tutorial is the first in a series describing how to build a Depict server using 100% Open Source components. The application will accept a SMILES string in a Webpage text field, and then produce a 2-D structure diagram. It won't be designed for ease of use, appearance, or configurability - these improvements will be described in subsequent articles. When this application is finished, I'll deploy it on a Web server. At every step in this process, I'll provide enough detail for anyone to do the same.

It won't be necessary to finish every step yourself before you can work with the finished product. Near the beginning of each installment will be a "Download and Prerequisites" section containing a link to the complete source code. Simply download this code and run it to see what it does.

Download and Prerequisites

For this tutorial, you'll need Ruby CDK (RCDK). A recent article described the small amount of system configuration required for RCDK on Linux. Another article showed how to install RCDK on Windows.

In addition, you'll need to install Ruby on Rails - something that can be done through RubyGems.

The complete Depict application can be downloaded from this link.

A Note on Ruby Java Bridge and AMD64 Linux Platforms

Our Depict application will use Ruby Java Bridge (RJB) as a Ruby interface to Java bytecode. Recently, a problem with RJB on AMD64-Linux was uncovered. This problem prevents third-party jarfiles from being loaded after Rails has been loaded.

In practice, this means that the command to start the Rails server (Step 2) needs to be prefixed with an assignment of LD_PRELOAD. You also need to make LD_LIBRARY_PATH point to your native Java libraries. On my platform, which is AMD64-Linux running Sun's JVM, the commands are:

$ export LD_LIBRARY_PATH=/usr/java/jdk1.5.0_09/jre/lib/amd64:/usr/java/jdk1.5.0_09/jre/lib/amd64/server
$ LD_PRELOAD=/usr/java/jdk1.5.0_09/jre/lib/amd64/libzip.so ruby script/server

If you get an "Internal Error" due to an "unknown exception" while running Depict, chances are good that you've hit the same problem. Starting the Rails server as above should resolve it.

Step 1: Create the Application

Getting started with Rails is as simple as issuing the rails command with the name of your application as an argument:

$ rails depict

Executing this command creates a complete Rails application template under the depict subdirectory in your working directory. You build your application by editing the files and directories that were generated.

Step 2: Start the Server

You can start the Depict application by running the included server script:

$ cd depict
$ ruby script/server
=> Booting WEBrick...
=> Rails application started on http://0.0.0.0:3000
=> Ctrl-C to shutdown server; call with --help for options
[2006-11-18 10:17:08] INFO  WEBrick 1.3.1
[2006-11-18 10:17:08] INFO  ruby 1.8.5 (2006-08-25) [x86_64-linux-gnu]
[2006-11-18 10:17:08] INFO  WEBrick::HTTPServer#start: pid=4036 port=3000

Let's see what Depict looks like so far. Point your browser to http://localhost:3000. You should see the following page:

Congratulations! You're now running Ruby on Rails.

Step 3: Create the SmilesController

Rails adapts the Model-View-Controller application paradigm to the Web. It also automates many of the steps in building models, views, and controllers. Let's create a controller to handle the manipulation of SMILES strings:

$ ruby script/generate controller Smiles
      exists  app/controllers/
      exists  app/helpers/
      create  app/views/smiles
      exists  test/functional/
      create  app/controllers/smiles_controller.rb
      create  test/functional/smiles_controller_test.rb
      create  app/helpers/smiles_helper.rb

Currently, SmilesController is just a skeleton:

class SmilesController < ApplicationController
end

Let's give SmilesController the ability to accept a SMILES string as input by adding an input method.

class SmilesController < ApplicationController
  def input

  end
end
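With the input method in place, a request for /smiles/input reaches it through the default Rails route, ':controller/:action/:id'. The sketch below mimics that convention in plain Ruby. The `resolve` method is hypothetical and much simpler than real Rails routing (for instance, it capitalizes the controller name rather than camelizing it).

```ruby
# Hypothetical sketch of the ':controller/:action/:id' routing convention.
def resolve(path)
  controller, action, id = path.split("/").reject(&:empty?)
  {
    controller: "#{controller.capitalize}Controller",
    action: action || "index",   # Rails falls back to the index action
    id: id
  }
end

route = resolve("/smiles/input")
puts [route[:controller], route[:action]].join("#")
```

The practical upshot is that adding a public method to SmilesController is enough to give it a URL, with no routing configuration needed.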

Step 4: Create a Form

At this stage, pointing your browser to http://localhost:3000/smiles/input gives a screen containing an error message:

Rails is looking for a view that doesn't exist, so let's create one. To your depict/app/views/smiles directory, add the following file, called input.rhtml:

<html>
  <head>
    <title>Enter a SMILES String</title>
  </head>
  <body>
    <%= form_tag :action=>'depict' %>
      Enter a SMILES String: <br />
      <%= text_field('smiles', 'value') %><br />
    <%= end_form_tag %>
  </body>
</html> 

This HTML view is an example of Ruby's templating mechanism, eRuby, which was discussed earlier in the context of converting SD files to HTML. In the template above, we've configured our form to invoke an action called depict when submitted. This action does not yet exist, but will be created in Step 5 below.
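If eRuby is new to you, the following standalone sketch shows the two tag types at work outside of Rails, using the ERB class from Ruby's standard library. The template and the `names` variable are made up for illustration.

```ruby
require "erb"

# <% ... %> runs Ruby code; <%= ... %> runs it and inserts the result.
template = ERB.new(<<~HTML)
  <ul>
  <% names.each do |name| %>
    <li><%= name %></li>
  <% end %>
  </ul>
HTML

names = ["benzene", "ethanol"]

# The template is evaluated against the current binding, so it can see `names`.
html = template.result(binding)
puts html
```

Rails does essentially the same thing with input.rhtml, except that the binding it supplies exposes the controller's instance variables and helper methods such as form_tag and text_field.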

Now, pointing your browser to http://localhost:3000/smiles/input should give an input field:

Let's try it. Submitting the SMILES string for benzene gives the following error screen:

We haven't defined the depict action yet, a fact that Rails is communicating with this error message.

Have you noticed how we haven't had to restart the Rails Web server as we've made changes? This is but one of the many conveniences that make Rails such a productive platform.

Step 5: Add a Depict Action

We need a way to pass a SMILES string from the Web page text field in which it's entered to our application and back to another view. To do this we'll add a depict method to depict/app/controllers/smiles_controller.rb:

def depict
  # text_field('smiles', 'value') submits its contents as smiles[value]
  @smiles = params[:smiles][:value]
end
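The nested lookup works because text_field('smiles', 'value') renders an input named smiles[value], which Rails unpacks into a hash within a hash. The sketch below is a simplified, hypothetical stand-in for that unpacking (the `nest_params` method is invented here, and real Rails also lets you use string keys interchangeably with symbols).

```ruby
# Hypothetical, simplified stand-in for Rails' parameter parsing:
# turns a form field named "smiles[value]" into a nested hash.
def nest_params(pairs)
  pairs.each_with_object({}) do |(name, value), params|
    if (match = /\A(\w+)\[(\w+)\]\z/.match(name))
      (params[match[1].to_sym] ||= {})[match[2].to_sym] = value
    else
      params[name.to_sym] = value
    end
  end
end

params = nest_params([["smiles[value]", "c1ccccc1"]])
smiles = params[:smiles][:value]
puts smiles
```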

Of course, our application still won't run properly because we haven't created a view for the new depict method to use. Let's do this by adding the following file, named depict.rhtml, to the depict/app/views/smiles directory:

<html>
  <head>
    <title>Depict SMILES: <%= @smiles %></title>
  </head>
  <body>
    <h1>SMILES: <%= @smiles %></h1>
  </body>
</html>

Notice how the instance variable @smiles is available for use within the template.

Let's have a look at Depict so far. Pointing your browser to http://localhost:3000/smiles/input, entering the SMILES string for benzene, and pressing return produces the page shown below:

So far, so good. We've been able to read user input from an HTML form and reprocess it into some simple HTML output. Now, let's render a 2-D molecular image to go with it.

Step 6: Generate the 2-D Image

We'll use a method called image_for, which we'll define shortly. The file depict/app/views/smiles/depict.rhtml should look like this:

<html>
  <head>
    <title>Depict SMILES: <%= @smiles %></title>
  </head>
  <body>
    <h1>SMILES:<%= @smiles %></h1>
    <img src="<%= url_for :action => "image_for", :smiles => @smiles %>"></img>
  </body>
</html>

The added img tag is a placeholder for now. It loads an image dynamically generated from the image_for method, which we'll shortly add to SmilesController. We pass the SMILES string as a parameter.
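One detail worth keeping in mind is that SMILES strings are full of characters with special meaning in URLs: '#' (triple bond) starts a fragment, and '=' (double bond) separates keys from values in a query string. url_for should take care of the escaping for us; the sketch below shows the equivalent round trip using only Ruby's standard uri library. The `image_query` method is hypothetical and exists only for this illustration.

```ruby
require "uri"

# Build the query string that carries a SMILES string to image_for.
def image_query(smiles)
  URI.encode_www_form("smiles" => smiles)
end

query = image_query("C#N")   # hydrogen cyanide; '#' must not reach the URL raw
puts "/smiles/image_for?#{query}"

# Decoding on the receiving side recovers the original string.
decoded = URI.decode_www_form(query).to_h
puts decoded["smiles"]
```

If a structure image ever comes back truncated or wrong for a SMILES containing '#', '+', or '%', unescaped parameters would be the first thing to check.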

The image_for method does all of the real work in the Depict application. It accepts a SMILES string as a parameter, and produces a laid-out 2-D color molecular image as output. The method uses a variety of functionality contained in the Java API itself, and in Ruby CDK.

In addition to an image_for method, we'll need to add some accessory code to make it work. Edit depict/app/controllers/smiles_controller.rb so that it looks like this:

# Load the RCDK library
require_gem 'rcdk'
require 'rcdk/util'

# New jrequire calls.
jrequire 'java.io.ByteArrayOutputStream'
jrequire 'net.sf.structure.cdk.util.ImageKit'
jrequire 'javax.imageio.ImageIO'

class SmilesController < ApplicationController

  # Already defined.
  def input

  end

  # Already defined.
  def depict
    @smiles = params[:smiles][:value]
  end

  # New method.
  def image_for
    smiles = params[:smiles]

    # Parse the SMILES string and assign 2-D coordinates.
    mol = RCDK::Util::Lang.read_smiles(smiles)
    mol = RCDK::Util::XY.coordinate_molecule(mol)

    # Render a 300 x 300 image and write it as PNG into an in-memory stream.
    out = Java::Io::ByteArrayOutputStream.new
    image = Net::Sf::Structure::Cdk::Util::ImageKit.createRenderedImage(mol, 300, 300)
    Javax::Imageio::ImageIO.write(image, "png", out)

    # Send the PNG bytes back to the browser for inline display.
    send_data(out.toByteArray, :type => "image/png", :disposition => "inline", :filename => "molecule.png")
  end
end

Let's test the application with a real-world example. The achiral SMILES string for Carmine is:

CC1=C2C(=CC(=C1C(=O)O)O)C(=O)C3=C(C2=O)C(=C(C(=C3O)O)C4C(C(C(C(O4)CO)O)O)O)O

Pointing your browser to http://localhost:3000/smiles/input and entering the above SMILES string produces a color 2-D image of the structure of the red food coloring:

Conclusions

Ruby on Rails is a fun and agile framework for rapid Web development. Although Depict isn't much to look at yet, it demonstrates many key Rails concepts. Several techniques could be used to improve the application's look and usability. For example, we could use AJAX to depict SMILES strings as they are being typed - without the need to hit return. We could also provide options for changing image format, size, and color scheme. Future articles will describe these and other improvements.