CampDepict: Building a Simple SMILES Depict Web Application With JRuby, Structure CDK, and Camping

Posted by Rich Apodaca Wed, 23 Apr 2008 15:16:00 GMT

Today's tribute to the power of simplicity comes by way of John Jaeger, who has built one of the simplest cheminformatics Web applications ever written. His creation, CampDepict, interactively produces a raster image of a 2D chemical structure given a SMILES string, not unlike Daylight's Depict application.

CampDepict uses the Ruby Web microframework Camping. From the README:

Camping is a web framework which consistently stays at less than 4kb of code. You can probably view the complete source code on a single page. But, you know, it's so small that, if you think about it, what can it really do?

The idea here is to store a complete fledgling web application in a single file like many small CGIs. But to organize it as a Model-View-Controller application like Rails does. You can then easily move it to Rails once you've got it going.

John's application is loosely based on the Rails Depict application first described in 2006 here on Depth-First. His code makes use of CDK and Structure CDK, and it runs on JRuby.

If you've ever been curious about what Ruby has to offer cheminformatics, CampDepict could be just the application to get your feet wet.

User-Created Compound Monographs on Chempedia.net: Open Sourcing the Collation and Indexing of Chemical Information

Posted by Rich Apodaca Thu, 17 Apr 2008 21:50:00 GMT

Printed encyclopedias of chemical information like the Merck Index suffer from the problem of becoming obsolete on publication. When new compounds are discovered, or when the information about a compound changes, those changes can take many months or years to appear in print form due to the high cost of publication. It doesn't have to be that way. This article introduces a new feature to the free online chemical encyclopedia Chempedia that lets working scientists update its contents via Wikipedia.

About Chempedia.net

A recent article introduced Chempedia, the free online chemical encyclopedia. This service is built on two of the largest free and open repositories of chemical information in existence: Wikipedia and PubChem. PubChem supplies low-level chemical information such as connection tables, and Wikipedia supplies free-text descriptions of the properties and uses of certain molecules.

Which Molecules?

Currently, Chempedia.net includes compound monographs for only about 1,000 of its more than 300,000 molecules. These monographs were located by a manual process in which the titles of all Wikipedia articles were downloaded in alphabetized form; alphabetizing clustered the titles written in IUPAC nomenclature, because such names tend to begin with numbers and symbols. The IUPAC nomenclature titles were extracted, and then a script was written to extract the chemical information from these titles and combine it with that from PubChem.
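The script used for Chempedia isn't shown here, but the title-filtering idea can be illustrated with a minimal sketch. It assumes a rough heuristic: a title counts as IUPAC-like when it begins with a numeric locant such as "2-" or "1,2-", or with an opening bracket. The pattern, the method name iupac_like?, and the sample titles are all hypothetical.

```ruby
# Hypothetical heuristic: a title "looks like" IUPAC nomenclature when it
# begins with a numeric locant (e.g. "2-", "1,2-", "4a,5-") or an opening
# parenthesis or bracket.
IUPAC_PREFIX = /\A(?:\d+[a-z]?(?:,\d+[a-z]?)*-|[(\[])/

def iupac_like?(title)
  !!(title =~ IUPAC_PREFIX)
end

# Made-up sample of article titles.
titles = [
  "1,2-Dichloroethane",
  "2-Butanone",
  "(E)-Stilbene",
  "1984 (novel)",
  "Benzene",
  "4-Aminobenzoic acid"
]

# Sorting mimics the alphabetized download: titles with leading digits and
# symbols cluster together at the front of the list.
candidates = titles.sort.select { |t| iupac_like?(t) }
puts candidates
```

A heuristic like this would still let through some non-chemical titles and miss compounds with trivial names, which is consistent with the limitations described next.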

This method, although useful for getting a service running, is clearly flawed. The biggest problem is in how to discover new compound monographs.

Why Not Put Users in Control?

Chempedia users themselves are in the best position to know when an existing Wikipedia compound monograph should appear in Chempedia but doesn't, when an existing monograph needs to be updated, or when a new monograph is written and needs to be linked.

How can the process be automated?

As a partial answer to this question, users now have the ability to notify Chempedia of any changes to a Wikipedia compound monograph, and to have those changes immediately reflected in the next viewing of a Chempedia compound monograph.

An Example

As an example, let's take anandamide, a compound I've had some experience with during my time as a medicinal chemist. Although the Chempedia entry for anandamide exists, there is (or, as of today, was) no link to the Wikipedia compound monograph. Let's create one.

At the top of Chempedia's main menu, you'll see a link titled 'Update'. Choosing this link leads to a form that will ask for two pieces of information: (1) the title of the Wikipedia article to which you want Chempedia to link - in this case 'anandamide'; and (2) reCaptcha text to keep robots from making mischief.

Submitting this information is all that's needed to create a new or updated link from Chempedia to Wikipedia. Chempedia handles the rest.

Conclusions

Wikipedia is a vast source of free, high-quality, semi-structured chemical information just waiting to have good chemically-aware interfaces applied to it. Chempedia.net is an attempt to do just that, but it's a bit more as well. Although it may appear that Chempedia is the major beneficiary in this relationship, Wikipedia also benefits. When chemists have a tool that allows them to query and visualize Wikipedia using their native language (the chemical structure) they're in a better position to both use and contribute to Wikipedia itself - something I've started to do.

This positive feedback effect is the real value of exposing Web services. The question is: who in cheminformatics is willing and able to take the risk to discover this simple principle and its benefits?

Chempedia.net: Mashing Up PubChem and Wikipedia

Posted by Rich Apodaca Fri, 04 Apr 2008 14:06:00 GMT

PubChem and Wikipedia represent two of the largest open repositories of chemical information in the world. And they complement each other very nicely. PubChem contains mainly low-level chemical structure information whereas Wikipedia contains free-text descriptions of chemical compounds in the form of compound monographs.

Both services offer permission and access to copy and reuse their contents. But neither service is, by itself, nearly as useful as it could be.

Why not mash them up?

To explore that question, my company, Metamolecular, LLC, has launched Chempedia.

To my knowledge, Chempedia represents the first public-facing database of compounds to incorporate Wikipedia's collection of organic compound monographs. And it's one of the few cheminformatics services to make use of free-text descriptions generated by individual chemists.

Chempedia has been somewhat selective about the compounds it includes. To date, it has spidered over 2,500 monographs, combining them with over 300,000 of the most interesting compounds from PubChem. Not every Chempedia.net molecule has a monograph, but now there's a tool that can actually make that absence apparent.

Chempedia is both an experiment and a service. It's immediately useful for anyone in the business of making or doing things with organic molecules. It's created several unexpected moments of "Oh, that's actually a useful molecule!" It also will serve as a platform to test some of the ideas discussed in Depth-First over the last year or so on the advantages of the Web for collaboration in chemistry.

Stay tuned for more details about how Chempedia was created and some of its applications in chemistry.

Wikipedia for Cheminformatics: A Simple Web API for Finding CAS Numbers in Compound Monographs

Posted by Rich Apodaca Wed, 02 Apr 2008 21:29:00 GMT

Good news for cheminformatics: Chemical Abstracts Service (CAS) has agreed to help Wikipedia users curate its collection of CAS numbers. As a result of the diligence of some hard-working volunteers, chemistry's most universal system for referring to chemicals can now be used far more effectively by the world's biggest open repository of knowledge.

Wouldn't it be great to be able to pull these CAS numbers from Wikipedia programmatically?

Perspective

Estimates place the number of Wikipedia pages dealing with individual inorganic and organic substances in the thousands. (I'll use the term "compound monographs" to describe them.) One factor acting to keep this number low is poor visibility of these entries. Unlike most chemical databases, Wikipedia can't, by itself, be easily searched by structure. As chemically-aware tools for indexing Wikipedia begin to emerge, look for six things to happen:

  1. The number of Wikipedia compound monographs will increase significantly.
  2. The quality of monographs for intermediate- to well-known compounds will increase substantially.
  3. Demand for user-friendly interfaces to Wikipedia's chemical content will increase.
  4. Wikipedia users will become interested in storing and finding ever more diverse kinds of information about each compound.
  5. Bench chemists will start to include Wikipedia as one of their preferred literature search techniques, leading to...
  6. More creative tools for using the chemical content of Wikipedia.

As noted previously, it wasn't too long ago that indexing of the chemical literature was done solely by volunteers. Wikipedia offers an intriguing way to channel chemists' innate drive to combine their own work and experience with that of others to build useful information tools for the community.

But for now we are left with the question of how to index the chemical content of Wikipedia. Although a few systems have been proposed, the only practical method is through the use of CAS numbers. Which brings us to the subject of today's tutorial.

A Quick CAS Number API for Wikipedia

The Ruby program below will accept the title of any Wikipedia compound monograph and return the CAS number for the compound being discussed, or an error message if none was found:

require 'rubygems'
require 'hpricot'
require 'open-uri'

# Looks up the CAS number listed in a Wikipedia compound monograph.
class Wikikemi
  # Shape of a CAS number: 2-7 digits, then 2 digits, then a check digit.
  CAS_PATTERN = /([0-9]{2,7}?\-[0-9]{2}\-[0-9])/

  attr_reader :cas

  def initialize title
    uri = URI.escape("http://en.wikipedia.org/wiki/#{title}")
    puts "loading... #{uri}"
    doc = Hpricot(open(uri))

    # Only the first table on the page is searched.
    table = (doc/"table")[0]
    match = table && table.inner_html.match(CAS_PATTERN)

    # nil when the page has no table or the table has no CAS number.
    @cas = match ? match[1] : nil
  end
end

# Prompts for Wikipedia page titles until end of input. For each title,
# prints the CAS number present in the monograph, or an error message if
# none is found. Try, for example, "benzene".
loop do
  puts "Enter the title of the Wikipedia page, for example: 'benzene'"
  line = gets
  break unless line # stop cleanly at end of input
  w = Wikikemi.new line.chomp
  puts w.cas ? "[#{w.cas}]" : "CAS number not found"
end

This program makes use of the excellent Ruby HTML parser, Hpricot.

Saving the above code to a file called wikikemi.rb, we can run it with:

$ ruby wikikemi.rb

For example, we can look up the CAS numbers for Ferrocene, Lipitor, or 1,2,3,4,4a,5,6,7,8,8a-Decahydronaphthalene:

$ ruby wikikemi.rb
Enter the title of the Wikipedia page, for example: 'benzene'
ferrocene
loading... http://en.wikipedia.org/wiki/ferrocene
[102-54-5]
Enter the title of the Wikipedia page, for example: 'benzene'
lipitor
loading... http://en.wikipedia.org/wiki/lipitor
[134523-00-5]
Enter the title of the Wikipedia page, for example: 'benzene'
1,2,3,4,4a,5,6,7,8,8a-Decahydronaphthalene
loading... http://en.wikipedia.org/wiki/1,2,3,4,4a,5,6,7,8,8a-Decahydronaphthalene
[91-17-8]

All this method requires is that the Wikipedia page lists the correct CAS number in its Drugbox or Chembox template. Fortunately, CAS has agreed to help make this happen.
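Because the regular expression only checks the shape of a CAS number, a stray match is always possible. One way to guard against that, sketched below, is to verify the check digit: the final digit of a CAS number equals the sum of the other digits, each multiplied by its position counting from the right, modulo 10. The method name valid_cas? is hypothetical and is not part of wikikemi.rb.

```ruby
# Returns true when the string has the CAS number shape and its final
# digit matches the checksum of the preceding digits.
def valid_cas?(cas)
  return false unless cas =~ /\A\d{2,7}-\d{2}-\d\z/

  digits = cas.delete("-").chars.map { |c| c.to_i }
  check = digits.pop

  # Weight the remaining digits 1, 2, 3, ... starting from the rightmost one.
  sum = 0
  digits.reverse.each_with_index { |d, i| sum += d * (i + 1) }

  sum % 10 == check
end

# The three CAS numbers from the session above, plus one with a wrong
# check digit.
%w[102-54-5 134523-00-5 91-17-8 102-54-6].each do |cas|
  puts "#{cas}: #{valid_cas?(cas) ? 'valid' : 'invalid'}"
end
```

A check like this could be applied to the value returned by Wikikemi#cas before it is stored or passed along to another system.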

Conclusions

A little Ruby code is all it takes to build a working CAS number lookup system using Wikipedia. Although this may be useful as a standalone tool, it becomes much more powerful when made part of a larger cheminformatics system. But that's a story for another time.

See also Antony Williams' announcement on CAS and Wikipedia.

NetBeans 6, Ruby, and Rails: A Surprisingly Effective Combination

Posted by Rich Apodaca Thu, 27 Mar 2008 17:46:00 GMT

For far too long, Ruby has lacked a development environment that supports the features developers in other languages now take for granted: code completion, refactoring, platform independence, and speed. Although NetBeans may not spring to mind when thinking of Rails IDEs, it should be at the top of the list for anyone interested in the subject.

Getting started with Ruby, Rails and NetBeans is as easy as downloading the installer and running it. If you later decide to add Java support to your installation (which is also excellent), that can be done by downloading and running the Java installer. You'll end up with a single IDE that supports both languages.

Code Completion

Although other IDEs support some form of Ruby code completion, NetBeans takes it to another level. Can't remember the exact name of the method you're looking for? Type the period and let NetBeans look up both the name and documentation for you.

Hitting return enters the method and creates a template for parameters and any needed blocks.

Refactoring

One of the things that makes Java such a powerful language for large projects is the refactoring support offered by most IDEs. NetBeans brings this power to Ruby. Need to rename a class, method, or variable? Let NetBeans do it for you.

Conclusions

There's much more to NetBeans 6 and Ruby/Rails than what's been shown here, including formatting and highlighting for JavaScript and CSS, a user-definable Ruby/JRuby interpreter, and menu-based script execution. Whether you're looking for a way to get started with Ruby and Rails or a way to become more efficient at it, NetBeans 6 is well worth the time.
