Updating Ruby CDK
It's been over two years since Ruby CDK was first announced. For the unfamiliar, Ruby CDK offers a convenient way to use the Chemistry Development Kit from the MRI Ruby implementation.
Over the last several months, I've received quite a few emails about Ruby CDK. Many of the questions and comments revolve around the library now being out of date with both Ruby and CDK. Unfortunately, my priorities have moved in a different direction, leaving little time to maintain Ruby CDK.
So I was very happy to hear from Sebastian Klemm, who took on the job of updating Ruby CDK.
The result is the Ruby CDK repository on GitHub. In addition to updating Ruby CDK, Sebastian also updated one of its core components - Structure CDK. This Structure CDK fork is available via GitHub.
If you need to use CDK from Ruby, or you're interested in an alternative to CDK's built-in renderer, I encourage you to check out Sebastian's work.
Exhaustive Ring Perception With MX
The latest release of MX now supports exhaustive ring perception. Both a platform-independent jarfile and source distribution can be downloaded.
Background
The ability to perceive all rings in a chemical structure is essential for a number of important cheminformatics capabilities including Structure Diagram Generation, aromaticity detection, and binary fingerprint generation.
A recent Depth-First article described a ring-perception algorithm that efficiently returns the set of all rings for any molecule. The algorithm, developed by Hanser and coworkers, has now been implemented in MX.
MX is a platform-independent, cross-language cheminformatics toolkit written in Java and made available to the cheminformatics community by Metamolecular, LLC.
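The reduction at the heart of this approach can be sketched in a few lines of plain Ruby. What follows is an illustration only, not the MX implementation: it assumes a molecule given as a list of bonds (pairs of atom indices), and the method names find_rings and splice are made up for the example. Each bond starts out as a two-atom path. Atoms are then removed one at a time, every pair of paths meeting at the removed atom is spliced into a longer path, and any path that closes on itself is reported as a ring.

```ruby
# Exhaustive ring perception by path-graph reduction (a simplified sketch).
# bonds is an array of [atom_index, atom_index] pairs.
def find_rings(bonds)
  paths = bonds.map { |a, b| [a, b] }   # every bond starts as a two-atom path
  atoms = bonds.flatten.uniq
  rings = []

  until atoms.empty?
    # Remove the atom that currently terminates the fewest paths.
    x = atoms.min_by { |atom| paths.count { |path| path.first == atom || path.last == atom } }
    incident, paths = paths.partition { |path| path.first == x || path.last == x }

    # Paths that start and end at x are complete rings; the rest are open chains.
    loops, chains = incident.partition { |path| path.first == path.last }
    rings.concat(loops)

    # Splice every pair of chains that meet at x into one longer path.
    chains.combination(2) do |left, right|
      joined = splice(left, right, x)
      paths << joined if joined
    end
    atoms.delete(x)
  end
  rings
end

# Joins two paths at their shared end atom x; returns nil unless the result
# is simple (no atom repeated, apart from a matching first and last atom).
def splice(left, right, x)
  head = left.last == x ? left : left.reverse      # orient as ... -> x
  tail = right.first == x ? right : right.reverse  # orient as x -> ...
  joined = head + tail[1..-1]
  inner = joined[1..-2]
  simple = inner.uniq.size == inner.size &&
           !inner.include?(joined.first) && !inner.include?(joined.last)
  simple ? joined : nil
end

benzene = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0]]
find_rings(benzene).each { |ring| puts ring.join('-') }
```

Each ring is reported as a list of atom indices whose first atom is repeated at the end. The sketch favors brevity over speed; the number of intermediate paths can grow quickly for heavily fused ring systems.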
Examples
Ring perception can be tested conveniently using either JRuby or Jython. In these examples, we'll use JRuby.
To find all rings in benzene, we'd use something like:
$ jirb
irb(main):001:0> require 'mx-0.108.1.jar'
=> true
irb(main):002:0> import com.metamolecular.mx.ring.HanserRingFinder
=> Java::ComMetamolecularMxRing::HanserRingFinder
irb(main):003:0> import com.metamolecular.mx.io.Molecules
=> Java::ComMetamolecularMxIo::Molecules
irb(main):004:0> benzene = Molecules.create_benzene
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0x1971eb3 @java_object=com.metamolecular.mx.model.DefaultMolecule@126ba64>
irb(main):005:0> finder = HanserRingFinder.new
=> #<Java::ComMetamolecularMxRing::HanserRingFinder:0x76f2e8 @java_object=com.metamolecular.mx.ring.HanserRingFinder@1458dcb>
irb(main):006:0> rings = finder.find_rings benzene
=> #<Java::JavaUtil::ArrayList:0x1b83048 @java_object=[[com.metamolecular.mx.model.DefaultMolecule$AtomImpl@169dd64, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@145f5e3, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@122d9c, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@170984c, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@11ed166, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@45aa2c, com.metamolecular.mx.model.DefaultMolecule$AtomImpl@169dd64]]>
irb(main):007:0> rings[0].collect{|atom| atom.get_index}.join("-")
=> "5-0-1-2-3-4-5"
irb(main):008:0> rings.size
=> 1
Here, we're taking advantage of Ruby's Array#join method to place a dash between each atom index.
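The same idiom works on any Ruby collection. A small stand-alone sketch, using a hypothetical Atom struct in place of the MX atom objects:

```ruby
# Hypothetical stand-in for an MX atom; only the index matters here.
Atom = Struct.new(:index)

ring = [5, 0, 1, 2, 3, 4, 5].map { |i| Atom.new(i) }

# collect builds an Array of indices; join glues them together with dashes.
label = ring.collect { |atom| atom.index }.join('-')
puts label
```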
To really push the system, we could find all rings in cubane:
$ jirb
irb(main):001:0> require 'mx-0.108.1.jar'
=> true
irb(main):002:0> import com.metamolecular.mx.ring.HanserRingFinder
=> Java::ComMetamolecularMxRing::HanserRingFinder
irb(main):003:0> import com.metamolecular.mx.io.Molecules
=> Java::ComMetamolecularMxIo::Molecules
irb(main):004:0> cubane = Molecules.create_cubane
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0xe391c4 @java_object=com.metamolecular.mx.model.DefaultMolecule@182a033>
irb(main):005:0> finder = HanserRingFinder.new
=> #<Java::ComMetamolecularMxRing::HanserRingFinder:0x1458dcb @java_object=com.metamolecular.mx.ring.HanserRingFinder@1603522>
irb(main):006:0> rings = finder.find_rings cubane
=> #collection with many objects
irb(main):007:0> rings.size
=> 28
irb(main):008:0> rings[0].collect{|atom| atom.get_index}.join("-")
=> "3-0-1-2-3"
Other Improvements
The MX-0.108.1 release includes some other changes as well.
- Fixes a bug in which multiline SD file data was not read.
- Adds a resources directory containing atomic_system.xml so that the source distribution can compile and all tests will pass.
Conclusions
This first implementation of the Hanser algorithm focuses on correctness, readability, and test coverage over performance. Future releases will address performance in the context of an open, multi-toolkit cheminformatics benchmark suite.
Flexible Depth-First Search With MX
Graph theory is an essential component of cheminformatics, if you dig deeply enough. MX is a lightweight cheminformatics toolkit written in Java with a major goal of exposing the most important cheminformatics graph manipulations in a flexible, Java-centric way. Previous releases have focused on implementing subgraph monomorphism functionality for use in substructure search. The new MX release, 0.104.0, introduces support for depth-first traversal. This article will give a simple example using this feature.
Downloading MX
MX can be downloaded in source or binary form:
- mx-0.104.0.jar: Platform-independent bytecode.
- mx-0.104.0-src.tar.gz: Source code and regression tests.
Scripting MX with JRuby
A previous article outlined the simple steps needed to install JRuby on Unix-based systems for scripting MX.
Finding All Paths From a Given Atom
A fundamental graph operation in cheminformatics is finding all paths through a molecule from a starting atom. MX makes this easy with the com.metamolecular.mx.path.PathFinder class. Depth-first traversal is used in creating molecular fingerprints. Another use is in creating SMILES strings, although a limited form of depth-first traversal is used in which each atom in a molecule is traversed only once.
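The limited, visit-each-atom-once form of depth-first traversal mentioned above is easy to sketch in plain Ruby. The example is an illustration only and does not use MX: the molecule is assumed to be a hash mapping each atom index to its neighbors, and the method name depth_first_order is invented for the example.

```ruby
# Returns atom indices in the order a depth-first walk first reaches them.
# graph maps each atom index to an array of neighboring atom indices.
def depth_first_order(graph, start, visited = [])
  visited << start
  graph[start].each do |neighbor|
    depth_first_order(graph, neighbor, visited) unless visited.include?(neighbor)
  end
  visited
end

# Toluene-like skeleton: a six-membered ring (0-5) with a substituent (6) on atom 0.
toluene = {
  0 => [1, 5, 6], 1 => [0, 2], 2 => [1, 3], 3 => [2, 4],
  4 => [3, 5], 5 => [4, 0], 6 => [0]
}
puts depth_first_order(toluene, 0).join('-')
```

Every atom appears exactly once in the result, whereas an all-paths search may pass through the same atom on many different paths.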
We can create a short library to print out all of the paths through a molecule in JRuby:
require 'mx-0.104.0.jar'
import 'com.metamolecular.mx.path.PathFinder'

class PathPrinter
  def initialize
    @finder = PathFinder.new
  end

  def print_paths atom
    paths = @finder.find_all_paths atom
    puts "printing all paths through the molecule"
    paths.each do |path|
      print_path path
    end
  end

  private

  def print_path path
    path.each do |atom|
      print atom.get_index
      print '-' unless path.get(path.length - 1).equals(atom)
    end
    puts
  end
end

Saving the above code in a file called pathprinter.rb, we can test it from interactive JRuby:
$ jirb
irb(main):001:0> require 'pathprinter'
=> true
irb(main):002:0> import com.metamolecular.mx.io.Molecules
=> Java::ComMetamolecularMxIo::Molecules
irb(main):003:0> benzene=Molecules.create_benzene
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0x43da1b @java_object=com.metamolecular.mx.model.DefaultMolecule@8a2023>
irb(main):004:0> p=PathPrinter.new
=> #<PathPrinter:0x19ed7e @finder=#<Java::ComMetamolecularMxPath::PathFinder:0x3727c5 @java_object=com.metamolecular.mx.path.PathFinder@1140709>>
irb(main):005:0> p.print_paths benzene.get_atom(0)
printing all paths through the molecule
0-5-4-3-2-1
0-1-2-3-4-5
=> nil
How It Works
Two classes collaborate in this traversal: com.metamolecular.mx.path.PathFinder and com.metamolecular.mx.path.DefaultStep.
Creating a depth-first traversal of your own is as simple as creating a DefaultStep from an Atom and implementing a walk method similar to the one shown below:
public void walk(Step step)
{
  if (!step.hasNextBranch())
  {
    // do something with the completed branch
    return;
  }

  while (step.hasNextBranch())
  {
    Atom next = step.nextBranch();

    if (step.isBranchFeasible(next))
    {
      walk(step.nextStep(next));
      step.backTrack();
    }
  }
}

Conclusions
Depth-first traversal is an important tool in any cheminformatics library. MX offers an implementation of this traversal strategy that can be easily customized.
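To make the shape of the walk method concrete without MX, here is a rough Ruby analogue. The Step class is a hypothetical stand-in written for this example: it mimics the method names used in the Java walk method but is not the MX DefaultStep, and its backtracking rule (pop the last atom off a shared path) is an assumption. The molecule is assumed to be a hash mapping each atom index to its neighbors.

```ruby
# Hypothetical stand-in for a traversal step; graph maps atom index => neighbors.
class Step
  attr_reader :path

  def initialize(graph, atom, path = [])
    @graph = graph
    @path = path
    @path.push(atom)                                   # entering this atom
    @branches = graph[atom].reject { |n| path.include?(n) }
  end

  def next_branch?
    !@branches.empty?
  end

  def next_branch
    @branches.shift
  end

  def branch_feasible?(atom)
    !@path.include?(atom)
  end

  def next_step(atom)
    Step.new(@graph, atom, @path)                      # shares the same path
  end

  def back_track
    @path.pop                                          # leave the atom just explored
  end
end

# Ruby rendering of walk: collects a copy of every completed branch.
def walk(step, found)
  unless step.next_branch?
    found << step.path.dup
    return found
  end

  while step.next_branch?
    candidate = step.next_branch
    if step.branch_feasible?(candidate)
      walk(step.next_step(candidate), found)
      step.back_track
    end
  end
  found
end

benzene = { 0 => [1, 5], 1 => [0, 2], 2 => [1, 3], 3 => [2, 4], 4 => [3, 5], 5 => [4, 0] }
walk(Step.new(benzene, 0), []).each { |path| puts path.join('-') }
```

Changing what happens when a branch is complete, or tightening branch_feasible?, is all it takes to turn the same skeleton into a different traversal.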
Fast Substructure Search Using Open Source Tools Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby
We can think of a fingerprint as a bucket into which every molecule in the universe can be reproducibly placed. Each molecule will belong to a single bucket, but each bucket may contain any number of molecules. In other words, there exists a one-to-many relationship between a fingerprint and its associated molecules. The previous article in this series discussed how to model this relationship using SQL. This article will take the idea one step further by describing one way to model this relationship in Ruby.
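Before bringing in a database, the bucket picture can be sketched with plain Ruby objects. The Compound struct and the fingerprint values below are invented for illustration; real fingerprints would come from a fingerprinting tool.

```ruby
# Hypothetical record: a SMILES string plus the fingerprint it maps to.
Compound = Struct.new(:smiles, :fingerprint)

compounds = [
  Compound.new('c1ccccc1',   0b0101),   # made-up fingerprint values
  Compound.new('c1ccccc1Br', 0b0101),
  Compound.new('CCO',        0b0010)
]

# One bucket per fingerprint; each bucket may hold any number of compounds.
buckets = compounds.group_by { |compound| compound.fingerprint }

buckets.each do |fingerprint, members|
  puts format('%04b: %s', fingerprint, members.map(&:smiles).join(', '))
end
```

Each compound lands in exactly one bucket, while a bucket such as 0101 holds two compounds. That is the one-to-many relationship in miniature.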
All Articles in this Series:
- Part 1: Fingerprints and Databases
- Part 2: Fingerprint Screen With SQL
- Part 3: A CRUD API for Fingerprints in Ruby
- Part 4: Creating Fingerprints from Chemical Structures
- Part 5: Relating Molecules to Fingerprints with SQL
- Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby
SQL Recap
So far, we've set up a fingerprints table:
mysql> describe fingerprints;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | int(11)             | NO   | PRI | NULL    | auto_increment |
| byte0  | bigint(64) unsigned | YES  |     | 0       |                |
| byte1  | bigint(64) unsigned | YES  |     | 0       |                |
| byte2  | bigint(64) unsigned | YES  |     | 0       |                |
| byte3  | bigint(64) unsigned | YES  |     | 0       |                |
| byte4  | bigint(64) unsigned | YES  |     | 0       |                |
| byte5  | bigint(64) unsigned | YES  |     | 0       |                |
| byte6  | bigint(64) unsigned | YES  |     | 0       |                |
| byte7  | bigint(64) unsigned | YES  |     | 0       |                |
| byte8  | bigint(64) unsigned | YES  |     | 0       |                |
| byte9  | bigint(64) unsigned | YES  |     | 0       |                |
| byte10 | bigint(64) unsigned | YES  |     | 0       |                |
| byte11 | bigint(64) unsigned | YES  |     | 0       |                |
| byte12 | bigint(64) unsigned | YES  |     | 0       |                |
| byte13 | bigint(64) unsigned | YES  |     | 0       |                |
| byte14 | bigint(64) unsigned | YES  |     | 0       |                |
| byte15 | bigint(64) unsigned | YES  |     | 0       |                |
+--------+---------------------+------+-----+---------+----------------+
17 rows in set (0.00 sec)
This table contains a single (empty) fingerprint:
mysql> select * from fingerprints;
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
| id | byte0 | byte1 | byte2 | byte3 | byte4 | byte5 | byte6 | byte7 | byte8 | byte9 | byte10 | byte11 | byte12 | byte13 | byte14 | byte15 |
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
|  1 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |      0 |      0 |      0 |      0 |      0 |      0 |
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
1 row in set (0.00 sec)
We've also set up a compounds table containing a foreign key (fingerprint_id) into the fingerprints table:
mysql> describe compounds;
+----------------+---------+------+-----+---------+----------------+
| Field          | Type    | Null | Key | Default | Extra          |
+----------------+---------+------+-----+---------+----------------+
| id             | int(11) | NO   | PRI | NULL    | auto_increment |
| fingerprint_id | int(11) | YES  |     | NULL    |                |
| smiles         | text    | YES  |     | NULL    |                |
+----------------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
In this hypothetical example, the compounds table is populated by two molecules, benzene and bromobenzene, which share the same fingerprint:
mysql> select * from compounds;
+----+----------------+------------+
| id | fingerprint_id | smiles     |
+----+----------------+------------+
|  1 |              1 | c1ccccc1   |
|  2 |              1 | c1ccccc1Br |
+----+----------------+------------+
2 rows in set (0.00 sec)
Adding the Ruby Layer
In Part 3, we created a CRUD API for fingerprints in Ruby. We now need to modify the class we created there, Fingerprint, to make it aware of the Compounds it will be associated with.
For brevity, you can view the updated Fingerprint class here. The main change has been to add a single line of code that tells Fingerprint that it's now associated with a class called Compound:
has_many :compounds

The Compound class, saved as compound.rb, declares the other side of the relationship:

require 'rubygems'
require 'active_record'
require 'fingerprint'

ActiveRecord::Base.establish_connection(
  :adapter => 'mysql',
  :host => 'localhost',
  :username => 'root',
  :password => '',
  :database => 'compounds'
)

class Compound < ActiveRecord::Base
  belongs_to :fingerprint
end

We can test the association from irb:

$ irb
irb(main):001:0> require 'fingerprint'
=> true
irb(main):002:0> f=Fingerprint.find 1
=> #<Fingerprint id: 1, byte0: 0, byte1: 0, byte2: 0, byte3: 0, byte4: 0, byte5: 0, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 0, byte11: 0, byte12: 0, byte13: 0, byte14: 0, byte15: 0>
irb(main):003:0> f.compounds
=> [#<Compound id: 1, fingerprint_id: 1, smiles: "c1ccccc1">, #<Compound id: 2, fingerprint_id: 1, smiles: "c1ccccc1Br">]
Looks good. Our code has made the correct association between a Fingerprint and its Compounds. What about the other way around?
$ irb
irb(main):001:0> require 'compound'
=> true
irb(main):002:0> c=Compound.find 1
=> #<Compound id: 1, fingerprint_id: 1, smiles: "c1ccccc1">
irb(main):003:0> c.fingerprint
=> #<Fingerprint id: 1, byte0: 0, byte1: 0, byte2: 0, byte3: 0, byte4: 0, byte5: 0, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 0, byte11: 0, byte12: 0, byte13: 0, byte14: 0, byte15: 0>
As expected, the first Compound became associated with the correct Fingerprint.
Conclusions
Our system can now store and query molecular fingerprints in a relational database. It also associates multiple compounds with each fingerprint.
We have a complete fingerprint screening system, but not a substructure search system.
What's missing? For one thing, we'd need a way to perform atom-by-atom searches (ABAS) of all candidate structures after the fingerprint screening process is complete. Recall that just because a query fingerprint matches a candidate fingerprint doesn't necessarily mean that a substructure match has been found.
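The screening test itself amounts to a bitwise comparison: a candidate passes when every bit set in the query is also set in the candidate. A minimal sketch with made-up 8-bit fingerprints (the method name passes_screen? is an assumption, not part of the series code):

```ruby
# A candidate passes the screen when it contains every bit set in the query.
def passes_screen?(query, candidate)
  (query & candidate) == query
end

query     = 0b00100100
candidate = 0b01100110   # has both query bits, plus others
other     = 0b00000100   # is missing one of the query bits

puts passes_screen?(query, candidate)   # true: worth an atom-by-atom search
puts passes_screen?(query, other)       # false: can be rejected outright
```

A true result only marks the candidate as a possible match. Different structural features can set the same bits, so the atom-by-atom search is still needed to confirm it.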
We'd also need a way to conveniently get real compounds with real fingerprints into our database. Only then would we be able to test the chemical validity of substructure queries.
The remaining articles in this series will discuss approaches to each of these requirements.
Image Credit: leeechy
Fast Substructure Search Using Open Source Tools Part 4: Creating Fingerprints from Chemical Structures
The previous articles in this series have detailed the steps needed to build a working fingerprint screening system using nothing more than the open source tools MySQL, Ruby, and ActiveRecord. With this system we can create, read, update, and destroy fingerprints in persistent storage. Although the system meets all of the requirements of a fingerprint screening system, it isn't a substructure search system - yet. For that, we need a way to convert chemical structure representations into fingerprints. This article describes a very simple method for doing so.
All Articles in this Series:
- Part 1: Fingerprints and Databases
- Part 2: Fingerprint Screen With SQL
- Part 3: A CRUD API for Fingerprints in Ruby
- Part 4: Creating Fingerprints from Chemical Structures
- Part 5: Relating Molecules to Fingerprints with SQL
- Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby
A Ruby Fingerprinter in Eight Lines
Let's create a Fingerprinter class that's capable of converting a SMILES string into a Fingerprint that can be stored and queried. The Ruby code below makes use of Open Babel's babel command-line utility:
require 'fingerprint'

class Fingerprinter
  def fingerprint_smiles smiles
    raw = %x[echo '#{smiles}' | babel -ismi -ofpt 2>/dev/null]
    bytes = raw.gsub(/>.*?\n/, '').gsub(/\n/, '').split
    Fingerprint.new.fill_bytes{|i| "#{bytes[2*i]}#{bytes[2*i+1]}".hex}
  end
end

This class takes advantage of Ruby's ability to interface directly with the command line through the %x operator in a way similar to that previously described for the cInChI command line tool. The babel output is then converted into a form suitable for use with our previously-defined Fingerprint class.
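The last line of fingerprint_smiles is the densest. Assuming babel's fpt output is a sequence of whitespace-separated 32-bit hexadecimal words (which is what the parsing code appears to imply), each pair of adjacent words is concatenated and read as one 64-bit integer. A stand-alone sketch with made-up words; pack_words is a hypothetical helper, not part of the series code:

```ruby
# Joins adjacent pairs of hex words and converts each pair to an integer.
def pack_words(words)
  words.each_slice(2).map { |high, low| "#{high}#{low}".hex }
end

words = %w[00000000 00000200 00000000 00000840]   # illustrative input
p pack_words(words)   # => [512, 2112]
```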
Although quite easy to implement, this approach may not work in every situation. For example, the fingerprint_smiles method opens the possibility that a malicious user could execute arbitrary shell commands by supplying a malformed SMILES string. Windows users may need to adapt the code. But for trusted SMILES on Unix machines, this implementation works well and can be used in many different programming environments.
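One way to narrow the shell-injection risk is to skip the shell altogether and hand the SMILES string to the child process on standard input. The sketch below uses Open3.capture3 from Ruby's standard library; run_filter and babel_output are hypothetical helper names, and the babel flags are simply those used above.

```ruby
require 'open3'

# Runs a command without a shell, feeding it input on stdin.
# Returns the command's standard output; stderr is captured and discarded.
def run_filter(input, *command)
  output, _errors, status = Open3.capture3(*command, stdin_data: input)
  raise "#{command.first} exited with #{status.exitstatus}" unless status.success?
  output
end

# The SMILES string never appears on a command line, so quotes, semicolons,
# and backticks inside it are treated as data rather than shell syntax.
def babel_output(smiles)
  run_filter("#{smiles}\n", 'babel', '-ismi', '-ofpt')
end
```

The returned text can be parsed exactly as before. The input should still be checked for validity, but it can no longer be interpreted as a command.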
Testing the Fingerprinter
We can test the Fingerprinter through interactive Ruby (irb):$ irb irb(main):001:0> require 'lib/fingerprinter' => true irb(main):002:0> fp=Fingerprinter.new => #<Fingerprinter:0xb7498038> irb(main):003:0> f=fp.fingerprint_smiles 'c1ccccc1' => #<Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil> irb(main):004:0> f.cardinality => 6 irb(main):005:0> f.bitstring => "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000100000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000"
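The two values reported above can be reproduced from the sixteen integers alone. The following sketch is an assumption about how cardinality and bitstring might be computed, not the Fingerprint class from Part 3: it counts the set bits in each value and writes each value as a zero-padded 64-bit binary string.

```ruby
# The sixteen 64-bit values shown for benzene in the irb session.
values = [0, 512, 0, 0, 2112, 32768, 0, 0, 0, 0, 134217728, 0, 0, 0, 131072, 0]

# Cardinality: total number of bits set across all sixteen values.
cardinality = values.sum { |value| value.to_s(2).count('1') }

# Bitstring: each value as 64 binary digits, most significant bit first.
bitstring = values.map { |value| format('%064b', value) }.join

puts cardinality        # 6
puts bitstring.length   # 1024
```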
As we previously saw, any Fingerprint we create can be stored and later retrieved from a MySQL database. If we've already stored the fingerprint for benzene it can be found with the following:
$ irb
irb(main):001:0> require 'lib/fingerprinter'
=> true
irb(main):002:0> fp=Fingerprinter.new
=> #<Fingerprinter:0xb74ae284>
irb(main):003:0> f=fp.fingerprint_smiles 'c1ccccc1'
=> #<Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil>
irb(main):004:0> Fingerprint.find_by_fingerprint f
=> #<Fingerprint id: 12687, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: "000000000000000000000000000002000000000000000000000...">
Conclusions
We now have the ability to create, store, and query fingerprints created from arbitrary SMILES strings. If there were a 1:1 relationship between molecules and fingerprints, we'd be nearly done. But things are not quite that simple. The next article in this series will show how to relate molecules to fingerprints.
Image Credit: adrenalin

