Updating Ruby CDK

Posted by Rich Apodaca Thu, 07 May 2009 14:58:00 GMT

It's been over two years since Ruby CDK was first announced. For the unfamiliar, Ruby CDK offers a convenient way to use the Chemistry Development Kit from the MRI Ruby implementation.

Over the last several months, I've received quite a few emails about Ruby CDK. Many of the questions and comments revolve around the library now being out of date with both Ruby and CDK. Unfortunately, my priorities have moved in a different direction, leaving little time to maintain Ruby CDK.

So I was very happy to hear from Sebastian Klemm, who took on the job of updating Ruby CDK.

The result is the Ruby CDK repository on GitHub. In addition to updating Ruby CDK, Sebastian also updated one of its core components - Structure CDK. This Structure CDK fork is available via GitHub.

If you need to use CDK from Ruby, or you're interested in an alternative to CDK's built-in renderer, I encourage you to check out Sebastian's work.

Exhaustive Ring Perception With MX

Posted by Rich Apodaca Thu, 08 Jan 2009 18:16:00 GMT

The latest release of MX now supports exhaustive ring perception. Both a platform-independent jarfile and source distribution can be downloaded.

Background

The ability to perceive all rings in a chemical structure is essential for a number of important cheminformatics capabilities including Structure Diagram Generation, aromaticity detection, and binary fingerprint generation.

A recent Depth-First article described a ring-perception algorithm that efficiently returns the set of all rings for any molecule. The algorithm, developed by Hanser and coworkers, has now been implemented in MX.

MX is a platform-independent, cross-language cheminformatics toolkit written in Java and made available to the cheminformatics community by Metamolecular, LLC.

Examples

Ring perception can be tested conveniently using either JRuby or Jython. In these examples, we'll use JRuby.

To find all rings in benzene, we'd use something like:

$ jirb
irb(main):001:0> require 'mx-0.108.1.jar'                         
=> true
irb(main):002:0> import com.metamolecular.mx.ring.HanserRingFinder
=> Java::ComMetamolecularMxRing::HanserRingFinder
irb(main):003:0> import com.metamolecular.mx.io.Molecules         
=> Java::ComMetamolecularMxIo::Molecules
irb(main):004:0> benzene = Molecules.create_benzene               
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0x1971eb3 @java_object=com.metamolecular.mx.model.DefaultMolecule@126ba64>
irb(main):005:0> finder = HanserRingFinder.new                    
=> #<Java::ComMetamolecularMxRing::HanserRingFinder:0x76f2e8 @java_object=com.metamolecular.mx.ring.HanserRingFinder@1458dcb>
irb(main):006:0> rings = finder.find_rings benzene                
=> #<Java::JavaUtil::ArrayList:0x1b83048 @java_object=[[com.metamolecular.mx.model.DefaultMolecule$AtomImpl@169dd64, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@145f5e3, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@122d9c, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@170984c, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@11ed166, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@45aa2c, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@169dd64]]>
irb(main):007:0> rings[0].collect{|atom| atom.get_index}.join("-")
=> "5-0-1-2-3-4-5"
irb(main):008:0> rings.size
=> 1

Here, we're taking advantage of Ruby's Array#join method to place a dash between each atom index.

To really push the system, we could find all rings in cubane:

$ jirb
irb(main):001:0> require 'mx-0.108.1.jar'                         
=> true
irb(main):002:0> import com.metamolecular.mx.ring.HanserRingFinder
=> Java::ComMetamolecularMxRing::HanserRingFinder
irb(main):003:0> import com.metamolecular.mx.io.Molecules         
=> Java::ComMetamolecularMxIo::Molecules
irb(main):004:0> cubane = Molecules.create_cubane                 
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0xe391c4 @java_object=com.metamolecular.mx.model.DefaultMolecule@182a033>
irb(main):005:0> finder = HanserRingFinder.new                    
=> #<Java::ComMetamolecularMxRing::HanserRingFinder:0x1458dcb @java_object=com.metamolecular.mx.ring.HanserRingFinder@1603522>
irb(main):006:0> rings = finder.find_rings cubane                 
=> #collection with many objects
irb(main):007:0> rings.size                                       
=> 28
irb(main):008:0> rings[0].collect{|atom| atom.get_index}.join("-")
=> "3-0-1-2-3"
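
As an independent cross-check on the ring counts above, the following plain-Ruby sketch enumerates every simple cycle by brute force. It is not the Hanser algorithm and does not use MX; the bond lists, the atom numbering, and the method names adjacency_from and find_rings are assumptions made for illustration only.

```ruby
require 'set'

# Build an adjacency table from a list of undirected bonds (atom index pairs).
def adjacency_from(bonds)
  adjacency = Hash.new { |hash, atom| hash[atom] = [] }
  bonds.each do |a, b|
    adjacency[a] << b
    adjacency[b] << a
  end
  adjacency
end

# Brute-force enumeration of all simple cycles (NOT the Hanser algorithm).
# Each ring is grown only from its lowest-numbered atom, and rings are stored
# as sorted bond lists so that both walking directions collapse to one entry.
def find_rings(adjacency)
  rings = Set.new
  grow = lambda do |path|
    adjacency.fetch(path.last).each do |neighbor|
      if neighbor == path.first
        next if path.size < 3 # stepping straight back is not a ring
        bonds = path.each_index.map { |i| [path[i], path[(i + 1) % path.size]].sort }
        rings << bonds.sort
      elsif neighbor > path.first && !path.include?(neighbor)
        grow.call(path + [neighbor])
      end
    end
  end
  adjacency.keys.each { |start| grow.call([start]) }
  rings
end

# Hypothetical atom numbering; only the connectivity matters here.
BENZENE_BONDS = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0]].freeze
CUBANE_BONDS  = [[0, 1], [1, 2], [2, 3], [3, 0],        # bottom face
                 [4, 5], [5, 6], [6, 7], [7, 4],        # top face
                 [0, 4], [1, 5], [2, 6], [3, 7]].freeze # vertical edges

puts find_rings(adjacency_from(BENZENE_BONDS)).size
puts find_rings(adjacency_from(CUBANE_BONDS)).size
```

Under these assumptions the sketch prints 1 for benzene and 28 for cubane (6 four-membered, 16 six-membered, and 6 eight-membered rings), which agrees with the counts MX reports. Brute force scales poorly, which is one reason a dedicated algorithm such as Hanser's is worth having.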

Other Improvements

The MX-0.108.1 release includes some other changes as well.

  • Fixes a bug in which multiline SD file data was not read.

  • Adds a resources directory containing atomic_system.xml so that the source distribution can compile and all tests will pass.

Conclusions

This first implementation of the Hanser algorithm focuses on correctness, readability, and test coverage over performance. Future releases will address performance in the context of an open, multi-toolkit cheminformatics benchmark suite.

Flexible Depth-First Search With MX

Posted by Rich Apodaca Wed, 26 Nov 2008 16:13:00 GMT

Graph theory is an essential component of cheminformatics, if you dig deeply enough. MX is a lightweight cheminformatics toolkit written in Java with a major goal of exposing the most important cheminformatics graph manipulations in a flexible, Java-centric way. Previous releases have focused on implementing subgraph monomorphism functionality for use in substructure search. The new MX release, 0.104.0, introduces support for depth-first traversal. This article will give a simple example using this feature.

Downloading MX

MX can be downloaded in source or binary form.

Scripting MX with JRuby

A previous article outlined the simple steps needed to install JRuby on Unix-based systems for scripting MX.

Finding All Paths From a Given Atom

A fundamental graph operation in cheminformatics is finding all paths through a molecule from a starting atom. MX makes this easy with the com.metamolecular.mx.path.PathFinder class. Depth-first traversal is used in creating molecular fingerprints. Another use is in creating SMILES strings, although a limited form of depth-first traversal is used in which each atom in a molecule is traversed only once.

We can create a short library to print out all of the paths through a molecule in JRuby:

require 'mx-0.104.0.jar'
import 'com.metamolecular.mx.path.PathFinder'

class PathPrinter
  def initialize
    @finder = PathFinder.new
  end

  def print_paths atom
    paths = @finder.find_all_paths atom

    puts "printing all paths through the molecule"

    paths.each do |path|
      print_path path
    end
  end

  private

  def print_path path
    path.each do |atom|
      print atom.get_index
      print '-' unless path.get(path.length - 1).equals(atom)
    end

    puts
  end
end

Saving the above code in a file called pathprinter.rb, we can test it from interactive JRuby:

$ jirb
irb(main):001:0> require 'pathprinter'                   
=> true
irb(main):002:0> import com.metamolecular.mx.io.Molecules
=> Java::ComMetamolecularMxIo::Molecules
irb(main):003:0> benzene=Molecules.create_benzene        
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0x43da1b @java_object=com.metamolecular.mx.model.DefaultMolecule@8a2023>
irb(main):004:0> p=PathPrinter.new                       
=> #<PathPrinter:0x19ed7e @finder=#<Java::ComMetamolecularMxPath::PathFinder:0x3727c5 @java_object=com.metamolecular.mx.path.PathFinder@1140709>>
irb(main):005:0> p.print_paths benzene.get_atom(0)       
printing all paths through the molecule
0-5-4-3-2-1
0-1-2-3-4-5
=> nil

How It Works

Two classes collaborate in this traversal: com.metamolecular.mx.path.PathFinder and com.metamolecular.mx.path.DefaultStep.

Creating a depth-first traversal of your own is as simple as creating a DefaultStep from an Atom and implementing a walk method similar to the one shown below:

public void walk(Step step)
{
  if (!step.hasNextBranch())
  {
    // do something with the completed branch

    return;
  }

  while(step.hasNextBranch())
  {
    Atom next = step.nextBranch();

    if (step.isBranchFeasible(next))
    {
      walk(step.nextStep(next));

      step.backTrack();
    }
  }
}
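
The same walk-and-backtrack pattern can be sketched in plain Ruby without MX. This is only an illustration: the adjacency hash, the method name find_all_paths, and the rule that a branch is feasible when its atom is not already on the path are assumptions, not a description of how PathFinder or DefaultStep are implemented.

```ruby
# Depth-first enumeration of maximal paths from a starting atom.
# Plain Ruby; it does not call MX, and the method name is hypothetical.
def find_all_paths(adjacency, start)
  paths = []
  walk = lambda do |path|
    # Assumption: a branch is feasible if its atom is not yet on the path.
    branches = adjacency.fetch(path.last).reject { |atom| path.include?(atom) }
    if branches.empty?
      paths << path.dup # no feasible branch left: the path is complete
      return
    end
    branches.each do |atom|
      path.push(atom) # take the step
      walk.call(path)
      path.pop        # backtrack
    end
  end
  walk.call([start])
  paths
end

# Benzene as a six-membered cycle; the atom numbering is hypothetical.
BENZENE = {
  0 => [1, 5], 1 => [0, 2], 2 => [1, 3],
  3 => [2, 4], 4 => [3, 5], 5 => [4, 0]
}.freeze

find_all_paths(BENZENE, 0).each { |path| puts path.join('-') }
```

With this numbering the sketch prints 0-1-2-3-4-5 and 0-5-4-3-2-1, the same two paths that the PathPrinter session reports for benzene, though in a different order.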

Conclusions

Depth-first traversal is an important tool in any cheminformatics library. MX offers an implementation of this traversal strategy that can be easily customized.

Fast Substructure Search Using Open Source Tools Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby

Posted by Rich Apodaca Wed, 29 Oct 2008 17:15:00 GMT

We can think of a fingerprint as a bucket into which every molecule in the universe can be reproducibly placed. Each molecule will belong to a single bucket, but each bucket may contain any number of molecules. In other words, there exists a one-to-many relationship between a fingerprint and its associated molecules. The previous article in this series discussed how to model this relationship using SQL. This article will take the idea one step further by describing one way to model this relationship in Ruby.

SQL Recap

So far, we've set up a fingerprints table:

mysql> describe fingerprints;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | int(11)             | NO   | PRI | NULL    | auto_increment | 
| byte0  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte1  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte2  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte3  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte4  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte5  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte6  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte7  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte8  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte9  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte10 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte11 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte12 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte13 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte14 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte15 | bigint(64) unsigned | YES  |     | 0       |                | 
+--------+---------------------+------+-----+---------+----------------+
17 rows in set (0.00 sec)

This table contains a single (empty) fingerprint:

mysql> select * from fingerprints;
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
| id | byte0 | byte1 | byte2 | byte3 | byte4 | byte5 | byte6 | byte7 | byte8 | byte9 | byte10 | byte11 | byte12 | byte13 | byte14 | byte15 |
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
|  1 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |      0 |      0 |      0 |      0 |      0 |      0 | 
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
1 row in set (0.00 sec)

We've also set up a compounds table containing a foreign key (fingerprint_id) into the fingerprints table:

mysql> describe compounds;
+----------------+---------+------+-----+---------+----------------+
| Field          | Type    | Null | Key | Default | Extra          |
+----------------+---------+------+-----+---------+----------------+
| id             | int(11) | NO   | PRI | NULL    | auto_increment | 
| fingerprint_id | int(11) | YES  |     | NULL    |                | 
| smiles         | text    | YES  |     | NULL    |                | 
+----------------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

In this hypothetical example, the compounds table is populated with two molecules, benzene and bromobenzene, both of which share the same fingerprint:

mysql> select * from compounds;
+----+----------------+------------+
| id | fingerprint_id | smiles     |
+----+----------------+------------+
|  1 |              1 | c1ccccc1   | 
|  2 |              1 | c1ccccc1Br | 
+----+----------------+------------+
2 rows in set (0.00 sec)

Adding the Ruby Layer

In Part 3, we created a CRUD API for fingerprints in Ruby. We now need to modify the class we created there, Fingerprint, to make it aware of the Compounds it will be associated with.

For brevity, you can view the updated Fingerprint class here. The main change has been to add a single line of code that tells Fingerprint that it's now associated with a class called Compound:

  has_many :compounds

All that remains is to bring the Compound class into being:

require 'rubygems'
require 'active_record'
require 'fingerprint'

ActiveRecord::Base.establish_connection(
  :adapter    => 'mysql',
  :host       => 'localhost',
  :username   =>  'root',
  :password   =>  '',
  :database   =>  'compounds'
)

class Compound < ActiveRecord::Base
  belongs_to :fingerprint
end

The belongs_to line is the counterpart to Fingerprint's has_many line. Together, Fingerprint and Compound create a system in which each Fingerprint can reference multiple Compounds and each Compound references one Fingerprint. Let's test this with interactive Ruby:

$ irb
irb(main):001:0> require 'fingerprint'
=> true
irb(main):002:0> f=Fingerprint.find 1
=> #<Fingerprint id: 1, byte0: 0, byte1: 0, byte2: 0, byte3: 0, byte4: 0, byte5: 0, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 0, byte11: 0, byte12: 0, byte13: 0, byte14: 0, byte15: 0>
irb(main):003:0> f.compounds
=> [#<Compound id: 1, fingerprint_id: 1, smiles: "c1ccccc1">, #<Compound id: 2, fingerprint_id: 1, smiles: "c1ccccc1Br">]

Looks good. Our code has made the correct association between a Fingerprint and its Compounds. What about the other way around?

$ irb
irb(main):001:0> require 'compound'
=> true
irb(main):002:0> c=Compound.find 1
=> #<Compound id: 1, fingerprint_id: 1, smiles: "c1ccccc1">
irb(main):003:0> c.fingerprint
=> #<Fingerprint id: 1, byte0: 0, byte1: 0, byte2: 0, byte3: 0, byte4: 0, byte5: 0, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 0, byte11: 0, byte12: 0, byte13: 0, byte14: 0, byte15: 0>

As expected, the first Compound became associated with the correct Fingerprint.
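
To make the two directions of the association concrete, here is a small plain-Ruby sketch of what the lookups amount to. It uses in-memory Structs rather than ActiveRecord or MySQL, and the names CompoundRow, FingerprintRow, compounds_for, and fingerprint_for are hypothetical stand-ins, not part of the series' code.

```ruby
# In-memory stand-ins for the two tables; no database or ActiveRecord involved.
CompoundRow    = Struct.new(:id, :fingerprint_id, :smiles)
FingerprintRow = Struct.new(:id)

COMPOUNDS = [
  CompoundRow.new(1, 1, 'c1ccccc1'),
  CompoundRow.new(2, 1, 'c1ccccc1Br')
].freeze
FINGERPRINTS = [FingerprintRow.new(1)].freeze

# Roughly what has_many :compounds resolves to: rows whose foreign key matches.
def compounds_for(fingerprint)
  COMPOUNDS.select { |compound| compound.fingerprint_id == fingerprint.id }
end

# Roughly what belongs_to :fingerprint resolves to: the row the key points at.
def fingerprint_for(compound)
  FINGERPRINTS.find { |fingerprint| fingerprint.id == compound.fingerprint_id }
end

puts compounds_for(FINGERPRINTS.first).map(&:smiles).inspect
puts fingerprint_for(COMPOUNDS.first).id
```

The point of the sketch is only that the foreign key lives on the "many" side: a fingerprint finds its compounds by scanning for its own id, while a compound follows its fingerprint_id directly.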

Conclusions

Our system can now store and query molecular fingerprints in a relational database. It also associates multiple compounds with each fingerprint.

We have a complete fingerprint screening system, but not a substructure search system.

What's missing? For one thing, we'd need a way to perform atom-by-atom searches (ABAS) of all candidate structures after the fingerprint screening process is complete. Recall that just because a query fingerprint matches a candidate fingerprint doesn't necessarily mean that a substructure match has been found.
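
The screening test itself reduces to a bitwise check, which a short Ruby sketch can illustrate. It assumes each fingerprint is held as an array of sixteen unsigned 64-bit integers (mirroring the byte0 through byte15 columns); the method name passes_screen? and the sample values are hypothetical.

```ruby
# A candidate passes the screen when every bit set in the query fingerprint
# is also set in the candidate fingerprint, word by word.
def passes_screen?(query_words, candidate_words)
  query_words.zip(candidate_words).all? do |query, candidate|
    (query & candidate) == query
  end
end

# Made-up fingerprints: sixteen 64-bit words each, mostly zero.
candidate = [0b1011, 0b0100] + [0] * 14
subset    = [0b0011, 0b0100] + [0] * 14 # all bits present in candidate
other     = [0b0100, 0b0000] + [0] * 14 # sets a bit the candidate lacks

puts passes_screen?(subset, candidate)
puts passes_screen?(other, candidate)
```

A true result only says the candidate cannot be ruled out; an atom-by-atom search is still needed to confirm the match. A false result rules the candidate out without any further work.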

We'd also need a way to conveniently get real compounds with real fingerprints into our database. Only then would we be able to test the chemical validity of substructure queries.

The remaining articles in this series will discuss approaches to each of these requirements.

Image Credit: leeechy

Fast Substructure Search Using Open Source Tools Part 4: Creating Fingerprints from Chemical Structures

Posted by Rich Apodaca Wed, 15 Oct 2008 14:42:00 GMT

The previous articles in this series have detailed the steps needed to build a working fingerprint screening system using nothing more than the open source tools MySQL, Ruby, and ActiveRecord. With this system we can create, read, update, and destroy fingerprints in persistent storage. Although the system meets all of the requirements of a fingerprint screening system, it isn't a substructure search system - yet. For that, we need a way to convert chemical structure representations into fingerprints. This article describes a very simple method for doing so.

A Ruby Fingerprinter in Eight Lines

Let's create a Fingerprinter class that's capable of converting a SMILES string into a Fingerprint that can be stored and queried. The Ruby code below makes use of Open Babel's babel command-line utility:

require 'fingerprint'

class Fingerprinter  
  def fingerprint_smiles smiles
    raw = %x[echo '#{smiles}' | babel -ismi -ofpt 2>/dev/null]
    bytes = raw.gsub(/>.*?\n/, '').gsub(/\n/, '').split

    Fingerprint.new.fill_bytes{|i| "#{bytes[2*i]}#{bytes[2*i+1]}".hex}
  end
end

This class takes advantage of Ruby's ability to interface directly with the command line through the %x operator in a way similar to that previously described for the cInChI command line tool. The babel output is then converted into a form suitable for use with our previously defined Fingerprint class.
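
The conversion step can be looked at in isolation. The sketch below assumes, as the regular expressions in fingerprint_smiles imply, that the babel output consists of header lines beginning with ">" followed by whitespace-separated hexadecimal words, and that each pair of adjacent words forms one 64-bit value. The method name words_from_fpt and the sample text are hypothetical; the sample is shortened and is not real babel output.

```ruby
# Turn fingerprint text into integers by joining adjacent hex words pairwise.
def words_from_fpt(raw)
  hex_words = raw.lines
                 .reject { |line| line.start_with?('>') } # drop header lines
                 .join(' ')
                 .split
  hex_words.each_slice(2).map { |high, low| "#{high}#{low}".hex }
end

# Made-up, shortened sample in the assumed format.
SAMPLE = <<~FPT
  >   2 bits set
  00000000 00000200
  00000000 00000001
FPT

puts words_from_fpt(SAMPLE).inspect
```

For the sample above this yields [512, 1]: the first pair reads as hexadecimal 0000000000000200 and the second as 0000000000000001.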

Although quite easy to implement, this approach may not work in every situation. For example, the fingerprint_smiles method opens the possibility that a malicious user could attempt to execute arbitrary shell commands by supplying a malformed SMILES string. Windows users may need to adapt the code. But for trusted SMILES on Unix machines, this implementation works well and can be used in many different programming environments.
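
One way to reduce the shell-injection risk, sketched here as an alternative rather than as the code used in this series, is to skip the shell entirely and pass the SMILES string on standard input using Ruby's standard Open3 library. The helper name run_filter is hypothetical, and the demonstration below runs a small Ruby one-liner in place of babel so that it works without Open Babel installed.

```ruby
require 'open3'
require 'rbconfig'

# Run a command with the given text on its standard input and return its
# standard output. The program and each argument are separate array elements,
# so no shell is involved and quotes or semicolons in the input are not
# interpreted. Standard error is captured and discarded.
def run_filter(argv, input)
  output, _errors, _status = Open3.capture3(*argv, stdin_data: input)
  output
end

# Hypothetical replacement for the %x line in fingerprint_smiles:
#   raw = run_filter(%w[babel -ismi -ofpt], smiles)

# Demonstration with a stand-in command that echoes its input unchanged.
echo_command = [RbConfig.ruby, '-e', 'print STDIN.read']
hostile = "c1ccccc1'; echo injected; echo '"
puts run_filter(echo_command, hostile)
```

The hostile string comes back exactly as it went in, because it is only ever treated as data. Whether babel accepts the string as valid SMILES is a separate question that still needs its own error handling.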

Testing the Fingerprinter

We can test the Fingerprinter through interactive Ruby (irb):

$ irb
irb(main):001:0> require 'lib/fingerprinter'
=> true
irb(main):002:0> fp=Fingerprinter.new
=> #<Fingerprinter:0xb7498038>
irb(main):003:0> f=fp.fingerprint_smiles 'c1ccccc1'
=> #<Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil>
irb(main):004:0> f.cardinality
=> 6
irb(main):005:0> f.bitstring
=> "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000100000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000"
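
The cardinality can be checked by hand from the sixteen values shown above. The following sketch counts set bits in plain Ruby; it is not necessarily how Fingerprint#cardinality is implemented, and the constant and method names are hypothetical.

```ruby
# Words copied from the irb output above (byte0 through byte15).
BENZENE_WORDS = [0, 512, 0, 0, 2112, 32768, 0, 0,
                 0, 0, 134217728, 0, 0, 0, 131072, 0].freeze

# Count the set bits across all words of a fingerprint.
def cardinality(words)
  words.sum { |word| word.to_s(2).count('1') }
end

puts cardinality(BENZENE_WORDS)
```

This prints 6: the value 2112 (2048 + 64) contributes two bits, and 512, 32768, 134217728, and 131072 are each a single power of two.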

As we previously saw, any Fingerprint we create can be stored in and later retrieved from a MySQL database. If we've already stored the fingerprint for benzene, it can be found with the following:

$ irb
irb(main):001:0> require 'lib/fingerprinter'
=> true
irb(main):002:0> fp=Fingerprinter.new
=> #<Fingerprinter:0xb74ae284>
irb(main):003:0> f=fp.fingerprint_smiles 'c1ccccc1'
=> #<Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil>
irb(main):004:0> Fingerprint.find_by_fingerprint f
=> #<Fingerprint id: 12687, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: "000000000000000000000000000002000000000000000000000...">

Conclusions

We now have the ability to create, store, and query fingerprints created from arbitrary SMILES strings. If there were a 1:1 relationship between molecules and fingerprints, we'd be nearly done. But things are not quite that simple. The next article in this series will show how to relate molecules to fingerprints.

Image Credit: adrenalin
