<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag openbabel</title>
    <link>http://depth-first.com/articles/tag/openbabel</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 4: Creating Fingerprints from Chemical Structures</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/adrenalin/4250667/"&gt;&lt;img src="http://depth-first.com/demo/20081015/falls.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The previous articles in this series have detailed the steps needed to build a working fingerprint screening system using nothing more than the open source tools &lt;a href="http://www.mysql.com/"&gt;MySQL&lt;/a&gt;, &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt;, and &lt;a href="http://ar.rubyonrails.org/"&gt;ActiveRecord&lt;/a&gt;. With this system we can create, read, update, and destroy fingerprints in persistent storage. Although the system meets all of the requirements of a fingerprint screening system, it isn't a substructure search system - yet. For that, we need a way to convert chemical structure representations into fingerprints. This article describes a very simple method for doing so.&lt;/p&gt;

&lt;p&gt;All articles in this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;Part 1: Fingerprints and Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;Part 2: Fingerprint Screen With SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;A Ruby Fingerprinter in Eight Lines&lt;/h4&gt;

&lt;p&gt;Let's create a &lt;tt&gt;Fingerprinter&lt;/tt&gt; class that's capable of converting a SMILES string into a &lt;tt&gt;Fingerprint&lt;/tt&gt; that can be stored and queried. The Ruby code below makes use of Open Babel's &lt;a href="http://openbabel.org/wiki/Babel"&gt;&lt;tt&gt;babel&lt;/tt&gt;&lt;/a&gt; command-line utility:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Fingerprinter&lt;/span&gt;  
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;fingerprint_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;raw&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;%x[&lt;/span&gt;&lt;span class="string"&gt;echo '&lt;span class="expr"&gt;#{smiles}&lt;/span&gt;' | babel -ismi -ofpt 2&amp;gt;/dev/null&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt;
    &lt;span class="ident"&gt;bytes&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;raw&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;gsub&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;&amp;gt;.*?&lt;span class="escape"&gt;\n&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;/,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;gsub&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;&lt;span class="escape"&gt;\n&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;/,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;split&lt;/span&gt;

    &lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;fill_bytes&lt;/span&gt;&lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{bytes[2*i]}#{bytes[2*i+1]}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;.&lt;/span&gt;&lt;span class="ident"&gt;hex&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This class takes advantage of Ruby's ability to interface directly with the command line through the &lt;tt&gt;%x&lt;/tt&gt; operator in a way similar to that previously described for the &lt;a href="http://depth-first.com/articles/2008/05/30/a-simple-and-portable-ruby-interface-to-inchi-part-2-silencing-console-output"&gt;cInChI command line tool&lt;/a&gt;. The &lt;tt&gt;babel&lt;/tt&gt; output is then converted into a form suitable for use with our &lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;previously-defined&lt;/a&gt; &lt;tt&gt;Fingerprint&lt;/tt&gt; class.&lt;/p&gt;

&lt;p&gt;Although quite easy to implement, this approach may not work in every situation. For example, because &lt;tt&gt;fingerprint_smiles&lt;/tt&gt; interpolates its argument directly into a shell command, a malicious user could execute arbitrary shell commands by supplying a specially crafted SMILES string. Windows users may need to adapt the code. But for trusted SMILES on Unix machines, this implementation works well, and the same approach can be used in many different programming environments.&lt;/p&gt;
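
&lt;p&gt;One way to avoid the shell-injection risk is to bypass the shell entirely. The following is only a minimal sketch, not the &lt;tt&gt;Fingerprinter&lt;/tt&gt; used in this series: it assumes a &lt;tt&gt;babel&lt;/tt&gt; executable on the PATH that reads SMILES from standard input, the helper names &lt;tt&gt;parse_fpt&lt;/tt&gt; and &lt;tt&gt;fingerprint_values&lt;/tt&gt; are hypothetical, and it returns plain integers rather than a &lt;tt&gt;Fingerprint&lt;/tt&gt;.&lt;/p&gt;

```ruby
require 'open3'

# Hypothetical helper (not part of the original Fingerprinter class):
# turns babel's fpt text into 64-bit integers.
def parse_fpt(raw)
  # Keep only the 8-digit hexadecimal words; the header line and any
  # status messages contain no such tokens.
  words = raw.split.grep(/\A\h{8}\z/)
  # Join each pair of 32-bit words into one 64-bit value.
  words.each_slice(2).map { |high, low| (high + low).hex }
end

# Runs babel without a shell, so the SMILES is never parsed as a command.
# Assumption: `babel` is on the PATH and reads SMILES from standard input.
def fingerprint_values(smiles)
  raw, _status = Open3.capture2('babel', '-ismi', '-ofpt',
                                stdin_data: "#{smiles}\n", err: File::NULL)
  parse_fpt(raw)
end

# The parser can be tried without babel, using the benzene words from Part 2.
sample = [
  '00000000 00000000 00000000 00000200 00000000 00000000',
  '00000000 00000000 00000000 00000840 00000000 00008000',
  '00000000 00000000 00000000 00000000 00000000 00000000',
  '00000000 00000000 00000000 08000000 00000000 00000000',
  '00000000 00000000 00000000 00000000 00000000 00020000',
  '00000000 00000000',
  '1 molecule converted'
].join("\n")
p parse_fpt(sample)
```

&lt;p&gt;Because the command and its arguments are passed as separate strings, quotes or semicolons in the SMILES reach &lt;tt&gt;babel&lt;/tt&gt; as ordinary input rather than as shell syntax.&lt;/p&gt;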

&lt;h4&gt;Testing the Fingerprinter&lt;/h4&gt;

&lt;p&gt;We can test the &lt;tt&gt;Fingerprinter&lt;/tt&gt; through interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'lib/fingerprinter'
=&amp;gt; true
irb(main):002:0&amp;gt; fp=Fingerprinter.new
=&amp;gt; #&amp;lt;Fingerprinter:0xb7498038&amp;gt;
irb(main):003:0&amp;gt; f=fp.fingerprint_smiles 'c1ccccc1'
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil&amp;gt;
irb(main):004:0&amp;gt; f.cardinality
=&amp;gt; 6
irb(main):005:0&amp;gt; f.bitstring
=&amp;gt; "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000100000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As we previously saw, any &lt;tt&gt;Fingerprint&lt;/tt&gt; we create can be stored and later retrieved from a MySQL database. If we've already stored the fingerprint for benzene it can be found with the following:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'lib/fingerprinter'
=&amp;gt; true
irb(main):002:0&amp;gt; fp=Fingerprinter.new
=&amp;gt; #&amp;lt;Fingerprinter:0xb74ae284&amp;gt;
irb(main):003:0&amp;gt; f=fp.fingerprint_smiles 'c1ccccc1'
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil&amp;gt;
irb(main):004:0&amp;gt; Fingerprint.find_by_fingerprint f
=&amp;gt; #&amp;lt;Fingerprint id: 12687, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: "000000000000000000000000000002000000000000000000000..."&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;We now have the ability to create, store, and query fingerprints created from arbitrary SMILES strings. If there were a 1:1 relationship between molecules and fingerprints, we'd be nearly done. But things are not quite that simple. The next article in this series will show how to relate molecules to fingerprints.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/adrenalin/"&gt;adrenalin&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 15 Oct 2008 14:42:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:ad16d97d-9183-4e25-8b88-26a28ffdca48</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures</link>
      <category>Tools</category>
      <category>ruby</category>
      <category>activerecord</category>
      <category>openbabel</category>
      <category>commandline</category>
      <category>fingerprint</category>
      <category>database</category>
      <category>substructuresearch</category>
      <category>query</category>
    </item>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 2: Fingerprint Screen With SQL</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/dopeylok/14052025/"&gt;&lt;img src="http://depth-first.com/demo/20081003/skeleton.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The &lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;previous article in this series&lt;/a&gt; discussed the configuration of a MySQL database for fast substructure search with binary fingerprints. This article first shows how to populate this database with real fingerprint data for two molecules. Then it shows how to formulate standard SQL queries to screen the database for substructures.&lt;/p&gt;

&lt;p&gt;All articles in this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;Part 1: Fingerprints and Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Fingerprint Screen With SQL&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures"&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Creating the Fingerprints with Open Babel&lt;/h4&gt;

&lt;p&gt;The &lt;tt&gt;babel&lt;/tt&gt; command line utility will, among its many conversions, return a fingerprint when given a valid SMILES string. For example, we can create the fingerprint for benzene like this:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ babel -ismi -ofpt
c1ccccc1
&gt;   6 bits set 
00000000 00000000 00000000 00000200 00000000 00000000 
00000000 00000000 00000000 00000840 00000000 00008000 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 08000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00020000 
00000000 00000000 
1 molecule converted
12 audit log messages
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Similarly, we create the fingerprint for phenol like this:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ babel -ismi -ofpt
c1ccccc1O 
&gt;   12 bits set 
00000000 00000008 20000000 00000200 00000000 00000000 
02000000 00000000 00000000 00000840 00000000 00008000 
00000002 00000000 00000000 00000008 00000000 00000000 
00000000 00020000 00000000 08000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00020000 
00000000 00000000 
1 molecule converted
19 audit log messages
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The exact meaning of these fingerprints is interesting, but not relevant. Without getting into the details of the Open Babel fingerprint formats, which are discussed in detail &lt;a href="http://www.dalkescientific.com/writings/diary/archive/2008/06/27/generating_fingerprints_with_openbabel.html"&gt;elsewhere&lt;/a&gt;, the output contains the binary fingerprint of each molecule as an array of 32-bit hexadecimal numbers.&lt;/p&gt;

&lt;h4&gt;Adding Fingerprints to the Database&lt;/h4&gt;

&lt;p&gt;To use Open Babel's fingerprints with our database, we need to convert the 32-bit hexadecimal numerical output to 64-bit decimal format. This is not difficult; most programming environments make it very simple. For example, the following Ruby code will convert the third and fourth 32-bit hexadecimal numbers in the benzene fingerprint into a 64-bit decimal number:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; "0000000000000200".hex
=&gt; 512
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Performing this conversion for every pair of 32-bit hex numbers in each fingerprint gives a set of numbers we can place directly into our database:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; # Add benzene decimal fingerprint.
mysql&gt; insert into fingerprints
(fp0,fp1,fp2,fp3,fp4,fp5,fp6,fp7,fp8,fp9,fp10,fp11,fp12,fp13,fp14,fp15)
values
(0, 512, 0, 0, 2112, 32768, 0, 0, 0, 0, 134217728, 0, 0, 0, 131072, 0);
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Similarly,&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; # Add phenol decimal fingerprint.
mysql&gt; insert into fingerprints
(fp0,fp1,fp2,fp3,fp4,fp5,fp6,fp7,fp8,fp9,fp10,fp11,fp12,fp13,fp14,fp15)
values
(8, 2305843009213694464, 0, 144115188075855872, 2112, 32768, 8589934592, 8, 0, 131072, 134217728, 0, 0, 0, 131072, 0);
&lt;/pre&gt;
&lt;/div&gt;
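
&lt;p&gt;Doing these conversions by hand is tedious, so here is a rough sketch of how the phenol values above could be derived in Ruby. The variable names are illustrative only; the column names &lt;tt&gt;fp0&lt;/tt&gt; to &lt;tt&gt;fp15&lt;/tt&gt; are those of the &lt;tt&gt;fingerprints&lt;/tt&gt; table used here.&lt;/p&gt;

```ruby
# The phenol fingerprint words exactly as printed by babel above.
hex_words = %w[
  00000000 00000008 20000000 00000200 00000000 00000000
  02000000 00000000 00000000 00000840 00000000 00008000
  00000002 00000000 00000000 00000008 00000000 00000000
  00000000 00020000 00000000 08000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00020000
  00000000 00000000
]

# Join each pair of 32-bit words and read the result as one 64-bit number.
values = hex_words.each_slice(2).map { |high, low| (high + low).hex }

# Build the insert statement for the fingerprints table.
columns = (0..15).map { |i| "fp#{i}" }.join(',')
sql = "insert into fingerprints (#{columns}) values (#{values.join(', ')});"
puts sql
```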

&lt;p&gt;Our table is now ready to be queried:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select * from fingerprints;
+----+------+---------------------+------+--------------------+------+-------+------------+------+------+--------+-----------+------+------+------+--------+------+
| id | fp0  | fp1                 | fp2  | fp3                | fp4  | fp5   | fp6        | fp7  | fp8  | fp9    | fp10      | fp11 | fp12 | fp13 | fp14   | fp15 |
+----+------+---------------------+------+--------------------+------+-------+------------+------+------+--------+-----------+------+------+------+--------+------+
|  1 |    0 |                 512 |    0 |                  0 | 2112 | 32768 |          0 |    0 |    0 |      0 | 134217728 |    0 |    0 |    0 | 131072 |    0 | 
|  2 |    8 | 2305843009213694464 |    0 | 144115188075855872 | 2112 | 32768 | 8589934592 |    8 |    0 | 131072 | 134217728 |    0 |    0 |    0 | 131072 |    0 | 
+----+------+---------------------+------+--------------------+------+-------+------------+------+------+--------+-----------+------+------+------+--------+------+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Querying the Database&lt;/h4&gt;

&lt;p&gt;With a table of fingerprints in hand, we can begin formulating queries. To do so, we'll use MySQL's built-in support for &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/bit-functions.html"&gt;binary arithmetic&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A molecule with fingerprint A can be a substructure of another molecule with fingerprint B only if all of the bits set in A are also set in B. Mathematically, we'd say that:&lt;/p&gt;

&lt;p&gt;A&lt;sub&gt;i&lt;/sub&gt; &amp;amp; B&lt;sub&gt;i&lt;/sub&gt; = A&lt;sub&gt;i&lt;/sub&gt;&lt;/p&gt;

&lt;p&gt;for all bits i in A and B.&lt;/p&gt;

&lt;p&gt;Let's say we have two two-bit fingerprints, 01 and 11 (binary), in our database. We can use MySQL to test whether the molecule from which the first fingerprint was derived could be a substructure of the molecule from which the second fingerprint was derived with this syntax:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select 1&amp;3;
+-----+
| 1&amp;3 |
+-----+
|   1 | 
+-----+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The answer is yes, there could be a substructure match because 1&amp;amp;3 = 1.&lt;/p&gt;

&lt;p&gt;We're now ready to perform our first substructure screen using SQL. This consists of selecting all rows for which each of the 16 fingerprint components, when ANDed with the corresponding component of the query fingerprint, gives back the query component.&lt;/p&gt;

&lt;p&gt;To find every molecule in the database of which benzene could be a substructure, we use benzene's fingerprint as the query:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select id from fingerprints where fp0&amp;0=0 and fp1&amp;512=512 and fp2&amp;0=0 and fp3&amp;0=0 and fp4&amp;2112=2112 and fp5&amp;32768=32768 and fp6&amp;0=0 and fp7&amp;0=0 and fp8&amp;0=0 and fp9&amp;0=0 and fp10&amp;134217728=134217728 and fp11&amp;0=0 and fp12&amp;0=0 and fp13&amp;0=0 and fp14&amp;131072=131072 and fp15&amp;0=0;
+----+
| id |
+----+
|  1 | 
|  2 | 
+----+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Our query results are telling us that benzene could be a substructure of both itself (row 1) and phenol (row 2), as expected.&lt;/p&gt;
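
&lt;p&gt;The same screen can be sketched in plain Ruby, which may help in checking a query before sending it to MySQL. This is an illustration only: the method name &lt;tt&gt;passes_screen?&lt;/tt&gt; is hypothetical, and &lt;tt&gt;Integer#allbits?&lt;/tt&gt; requires Ruby 2.5 or later.&lt;/p&gt;

```ruby
# 64-bit components taken from the insert statements above.
BENZENE = [0, 512, 0, 0, 2112, 32768, 0, 0, 0, 0,
           134217728, 0, 0, 0, 131072, 0].freeze
PHENOL  = [8, 2305843009213694464, 0, 144115188075855872, 2112, 32768,
           8589934592, 8, 0, 131072, 134217728, 0, 0, 0, 131072, 0].freeze

# True when every bit set in the query is also set in the target.
# Integer#allbits?(mask) performs the same bitwise-AND comparison
# as each condition in the SQL where clause.
def passes_screen?(query, target)
  query.zip(target).all? { |q, t| t.allbits?(q) }
end

puts passes_screen?(BENZENE, PHENOL) # benzene's bits are all set in phenol
puts passes_screen?(PHENOL, BENZENE) # phenol has bits that benzene lacks
```

&lt;p&gt;Passing the screen only means a substructure match is possible; fingerprints are lossy, so a hit still has to be confirmed by an atom-by-atom comparison.&lt;/p&gt;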

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;We now have a database populated with two molecules represented as fingerprints. We can even scan the database for possible substructure matches using nothing more than standard SQL queries. Nevertheless, we've had to use a lot of manual coding to convert hex into decimal and create SQL. We need a library to do this mundane work for us. The next article in this series will discuss a better approach using Ruby.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/dopeylok/"&gt;dopeylok&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Fri, 03 Oct 2008 14:47:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:fb61a84a-c88d-4ec7-9b0b-642f45092eb4</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>mysql</category>
      <category>ruby</category>
      <category>fingerprint</category>
      <category>substructuresearch</category>
      <category>query</category>
    </item>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 1: Fingerprints and Databases</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/jaded/89717778/"&gt;&lt;img src="http://depth-first.com/demo/20081002/fingerprint.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;For anyone working in a chemistry-related job, chemical databases are ubiquitous. A printed list of IUPAC names, a spreadsheet containing &lt;a href="http://depth-first.com/articles/2008/05/26/simple-cas-number-lookup-and-more-with-chempedia"&gt;CAS numbers&lt;/a&gt;, and a set of hand-drawn structures on index cards are all primitive chemical databases. They aren't nearly as useful as they could be to either the creator or his/her collaborators, but they are databases nevertheless. Anyone who has spent time in industry or academics knows that these low-tech chemical databases are everywhere. And they become more of a problem as more information is moved into electronic format.&lt;/p&gt;

&lt;p&gt;All articles in this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: Fingerprints and Databases&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;Part 2: Fingerprint Screen With SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures"&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;The Problem: Structure Search is Hard&lt;/h4&gt;

&lt;p&gt;Many of the low-tech chemical databases that professional chemists routinely share and work with would become orders of magnitude more useful if they were converted into substructure-searchable databases and published to the Web. Although there has been a &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;great deal of effort toward this end&lt;/a&gt; in the last few years, there's still much, much more that could be done.&lt;/p&gt;

&lt;p&gt;One of the main problems in creating a substructure-searchable chemical database is implementing the substructure search capability itself. This one requirement has done more to stifle the free flow of chemical information than perhaps any other. Solving the problem appears very difficult on first or second glance, and it is very difficult if you don't have the right tools. Many companies offer solutions - but at a price, both in terms of money and time, that is simply out of reach.&lt;/p&gt;

&lt;p&gt;What can you do if you're just getting started with modest requirements and budget?&lt;/p&gt;

&lt;h4&gt;About This Series&lt;/h4&gt;

&lt;p&gt;This article, the first in a series, will describe the creation of a chemical substructure search engine using exclusively well-maintained and robust open source tools: &lt;a href="http://openbabel.org"&gt;Open Babel&lt;/a&gt; for generating fingerprints and performing atom-by-atom searches; &lt;a href="http://mysql.com"&gt;MySQL&lt;/a&gt; as a relational database; and &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt; as a scripting language.&lt;/p&gt;

&lt;p&gt;Each of these three components is a commodity that can be replaced with any one of a number of open-source or proprietary substitutes, maximizing flexibility and minimizing vendor lock-in.&lt;/p&gt;

&lt;h4&gt;Other Resources&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/pch/nh_info.html"&gt;Norbert Haider&lt;/a&gt; of the University of Vienna has written a very useful tutorial on &lt;a href="http://merian.pch.univie.ac.at/~nhaider/cheminf/moldb.html"&gt;creating a structure-searchable database using free tools&lt;/a&gt;, which is part of a &lt;a href="http://depth-first.com/articles/2007/04/13/roll-your-own-chemical-database-with-free-components"&gt;larger series&lt;/a&gt;. That series differs from this one in the technology stack used and the level of detail to be provided. The series of articles to appear here will spell out the low-level series of steps needed to create a working substructure search system. It's hoped that taking this perspective makes clear the steps needed to apply the approach to alternative technology platforms.&lt;/p&gt;

&lt;h4&gt;Binary Fingerprints and Relational Databases&lt;/h4&gt;

&lt;p&gt;At the heart of the system we'll build is the chemical fingerprint, which is a (usually) lossy binary representation of a chemical structure. Creating a binary fingerprint is like putting every chemical structure, known or unknown, into just one bin out of a very large but finite set of bins. Although the same molecule is guaranteed to always go into the same bin, more than one molecule can be placed into each bin. This is a general feature of all &lt;a href="http://en.wikipedia.org/wiki/Hash_function"&gt;hashing&lt;/a&gt; schemes.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.dalkescientific.com/index.html"&gt;Andrew Dalke&lt;/a&gt; has written &lt;a href="http://www.dalkescientific.com/writings/diary/archive/2008/06/26/fingerprint_background.html"&gt;an excellent series of articles&lt;/a&gt; on fingerprints and what can be done with them. Another good overview is &lt;a href="http://www.daylight.com/dayhtml/doc/theory/theory.finger.html"&gt;available from Daylight&lt;/a&gt;. This article will assume you know what fingerprints are and how they can be used to compare chemical structures.&lt;/p&gt;

&lt;p&gt;The problem with binary fingerprints is that they are generally several hundred bits long - too long to be represented in a form that allows direct and rapid query by a relational database system. They need to be broken up - but how?&lt;/p&gt;

&lt;p&gt;A widely-used approach (and the one that will be taken here) involves breaking up the fingerprint into a series of integers that are stored in the database.&lt;/p&gt;

&lt;p&gt;For example, let's say we have a 1024-bit fingerprint. We could represent this as a single number from 0 to 2^1024 - 1, which of course is way too big for most computers to handle natively today. We could, however, represent this fingerprint as a series of sixteen 64-bit integers (which are available on most systems).&lt;/p&gt;

&lt;p&gt;So, the binary fingerprint:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
1111111101111111110110111011011000101000011000011010011100010000
1001100010101101000110100010110011101100100000100100000111010100
0101010000101011001010011001000100011001100000101100111010001110
1001000101001010000001011001100101101011111111011000111100000111
1010101100100101000100001100011001010111001001110101101100010010
0011101011101110110011111010000010111001100101001001101010110001
1100111000010100000100110111101001011100010111010001010101101101
0010001111111010111011110110000000001010111011111001111001111101
0101011100011111110111011110011110100110010110010101011001011111
0110100001111001101111011101001101101001000100010001100101111000
0011111001000100001111111110001100111001101000000100010010010110
0000011101001001011000111110101110010101110001111010100001100100
0100100111101010110101101010110110101010110110111011011001111111
0011100100101101101001000001000111110101011101110101101001101001
0110100100111001111001001111110111111001110100100110010100011110
0010101100101000011110101110111011001110101111100001011010101100
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;could also be represented as this decimal fingerprint (assuming your machine is &lt;a href="http://en.wikipedia.org/wiki/Endianness"&gt;big-endian&lt;/a&gt;):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
18410675377121896208
11001478244984832468
6064987026359504526
10469186440276053767
12332281598675737362
4246559787872197297
14849515287603909997
2592647731284516477
6277980392575817311
7528256967824972152
4486781373924787350
525060695046727780
5326305550703244927
4120129631153511017
7582343227124114718
3109870708788696748
&lt;/pre&gt;
&lt;/div&gt;
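
&lt;p&gt;As a quick check of the conversion, the sketch below reads the first and third rows of the binary fingerprint as base-2 numbers, most significant bit first, and then formats them back. The variable names are illustrative only.&lt;/p&gt;

```ruby
# First and third rows of the binary fingerprint shown above.
rows = %w[
  1111111101111111110110111011011000101000011000011010011100010000
  0101010000101011001010011001000100011001100000101100111010001110
]

# String#to_i(2) reads each 64-character row as a base-2 number.
decimals = rows.map { |row| row.to_i(2) }
puts decimals

# Round trip: format each integer back into a zero-padded 64-bit string.
restored = decimals.map { |n| format('%064b', n) }
puts restored == rows
```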

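&lt;p&gt;We can easily store this set of 16 numbers in a relational database table. (Values of 2^63 or more, such as the first number above, exceed the range of MySQL's default signed bigint and would need an unsigned column; the benzene and phenol fingerprints used in the next article fit within the signed range.) For example, if we had a MySQL database called "compounds", we could create a "fingerprints" table:&lt;/p&gt;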

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; create database compounds;
Query OK, 1 row affected (0.02 sec)

mysql&gt; use compounds;
Database changed
mysql&gt; create table fingerprints(id int not null auto_increment, primary key(id), fp0 bigint(64), fp1 bigint(64), fp2 bigint(64), fp3 bigint(64), fp4 bigint(64), fp5 bigint(64), fp6 bigint(64), fp7 bigint(64), fp8 bigint(64), fp9 bigint(64), fp10 bigint(64), fp11 bigint(64), fp12 bigint(64), fp13 bigint(64), fp14 bigint(64), fp15 bigint(64));
Query OK, 0 rows affected (0.01 sec)

mysql&gt; describe fingerprints;
+-------+------------+------+-----+---------+----------------+
| Field | Type       | Null | Key | Default | Extra          |
+-------+------------+------+-----+---------+----------------+
| id    | int(11)    | NO   | PRI | NULL    | auto_increment | 
| fp0   | bigint(64) | YES  |     | NULL    |                | 
| fp1   | bigint(64) | YES  |     | NULL    |                | 
| fp2   | bigint(64) | YES  |     | NULL    |                | 
| fp3   | bigint(64) | YES  |     | NULL    |                | 
| fp4   | bigint(64) | YES  |     | NULL    |                | 
| fp5   | bigint(64) | YES  |     | NULL    |                | 
| fp6   | bigint(64) | YES  |     | NULL    |                | 
| fp7   | bigint(64) | YES  |     | NULL    |                | 
| fp8   | bigint(64) | YES  |     | NULL    |                | 
| fp9   | bigint(64) | YES  |     | NULL    |                | 
| fp10  | bigint(64) | YES  |     | NULL    |                | 
| fp11  | bigint(64) | YES  |     | NULL    |                | 
| fp12  | bigint(64) | YES  |     | NULL    |                | 
| fp13  | bigint(64) | YES  |     | NULL    |                | 
| fp14  | bigint(64) | YES  |     | NULL    |                | 
| fp15  | bigint(64) | YES  |     | NULL    |                | 
+-------+------------+------+-----+---------+----------------+
17 rows in set (0.01 sec)

&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Although we have neither a substructure search engine nor a populated database, we've laid a solid foundation for those things. The next article in this series will show how to use this humble beginning to model some simple substructure queries in a way that lets MySQL do most of the heavy lifting.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/jaded/"&gt;Mr. Jaded&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Thu, 02 Oct 2008 23:27:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1d0f8fdc-4664-46ac-89cc-7c1de0608edd</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases</link>
      <category>Tools</category>
      <category>mysql</category>
      <category>ruby</category>
      <category>openbabel</category>
      <category>database</category>
      <category>fingerprint</category>
      <category>substructuresearch</category>
      <category>substructure</category>
    </item>
    <item>
      <title>Recombining Compressed PubChem SD Files with Open Babel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.org"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;While testing &lt;a href="http://metamolecular.com/chemphoto"&gt;ChemPhoto&lt;/a&gt;, it became necessary to test the &lt;a href="http://depth-first.com/articles/2008/09/08/smarter-cheminformatics-from-sd-file-to-image-collection-with-chemphoto"&gt;chemical structure imaging application&lt;/a&gt; with SD Files containing several hundred thousand records. Although it's tempting to meet this need by constructing "dummy" files with the same record or small set of records repeated, tests are always far more illuminating when real data is used.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; is an excellent source of large molecular datasets, and the entire database can be &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;downloaded by FTP&lt;/a&gt;. Because of PubChem's massive size, the download is split into files of about 25,000 records each, in gzipped SD File format (*.sdf.gz). Although this is an excellent resource, it creates a problem: how can you conveniently recombine this set of compressed SD Files into a single SD File?&lt;/p&gt;

&lt;p&gt;You might think about writing some "quick" code in your language of choice. Fortunately, &lt;a href="http://openbabel.org"&gt;Open Babel&lt;/a&gt; gets the job done - without any of the coding or debugging.&lt;/p&gt;

&lt;p&gt;The following command will create a single SD File from all of the compressed SD Files in a given directory, while also stripping explicit hydrogens and removing all fields except PUBCHEM_COMPOUND_CID.&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
babel *.sdf.gz pubchem.sdf -d --delete PUBCHEM_COMPOUND_CANONICALIZED,PUBCHEM_CACTVS_COMPLEXITY,PUBCHEM_CACTVS_HBOND_ACCEPTOR,PUBCHEM_CACTVS_HBOND_DONOR,PUBCHEM_CACTVS_ROTATABLE_BOND,PUBCHEM_CACTVS_SUBSKEYS,PUBCHEM_IUPAC_OPENEYE_NAME,PUBCHEM_IUPAC_CAS_NAME,PUBCHEM_IUPAC_NAME,PUBCHEM_IUPAC_SYSTEMATIC_NAME,PUBCHEM_IUPAC_TRADITIONAL_NAME,PUBCHEM_NIST_INCHI,PUBCHEM_EXACT_MASS,PUBCHEM_MOLECULAR_FORMULA,PUBCHEM_MOLECULAR_WEIGHT,PUBCHEM_OPENEYE_CAN_SMILES,PUBCHEM_OPENEYE_ISO_SMILES,PUBCHEM_CACTVS_TPSA,PUBCHEM_MONOISOTOPIC_WEIGHT,PUBCHEM_TOTAL_CHARGE,PUBCHEM_HEAVY_ATOM_COUNT,PUBCHEM_ATOM_DEF_STEREO_COUNT,PUBCHEM_ATOM_UDEF_STEREO_COUNT,PUBCHEM_BOND_DEF_STEREO_COUNT,PUBCHEM_BOND_UDEF_STEREO_COUNT,PUBCHEM_ISOTOPIC_ATOM_COUNT,PUBCHEM_COMPONENT_COUNT,PUBCHEM_CACTVS_TAUTO_COUNT,PUBCHEM_BONDANNOTATIONS,PUBCHEM_CACTVS_XLOGP

865543 molecules converted
7 info messages 15372962 audit log messages 
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Apparently, there is no way to tell babel to &lt;em&gt;keep&lt;/em&gt; just one particular field in an SD File; every unwanted field has to be named for removal individually.&lt;/p&gt;
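
&lt;p&gt;One workaround is to compute the &lt;tt&gt;--delete&lt;/tt&gt; argument rather than type it. The sketch below assumes the field names have already been collected one per line. The helper name &lt;tt&gt;delete_list&lt;/tt&gt; and the file name &lt;tt&gt;fields.txt&lt;/tt&gt; in the usage note are invented for this example. It relies only on grep, sort, and paste.&lt;/p&gt;

```shell
# Hypothetical helper: read SD File field names on stdin (one per line)
# and print a comma-separated list of every name except the one to keep.
delete_list() {
  keep="$1"
  # -x matches whole lines only, -F treats the name as a fixed string,
  # -v drops the matching line; sort -u removes duplicate names.
  grep -v -x -F -- "$keep" | sort -u | paste -s -d, -
}
```

&lt;p&gt;With the field names stored in &lt;tt&gt;fields.txt&lt;/tt&gt;, the command above could then be written as &lt;tt&gt;babel *.sdf.gz pubchem.sdf -d --delete "$(cat fields.txt | delete_list PUBCHEM_COMPOUND_CID)"&lt;/tt&gt;.&lt;/p&gt;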

&lt;p&gt;Still, not bad for a few seconds on the command line.&lt;/p&gt;</description>
      <pubDate>Wed, 01 Oct 2008 01:25:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:725a5f70-77e1-4aee-a79d-e7fb9f7c3401</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/01/recombining-compressed-pubchem-sd-files-with-open-babel</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>sdfile</category>
      <category>pubchem</category>
      <category>sdfgz</category>
      <category>commandline</category>
    </item>
    <item>
      <title>Install Open Babel Into Your Home Directory: You Don't Need Root</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.org/"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Occasionally you may want to install &lt;a href="http://openbabel.org"&gt;Open Babel&lt;/a&gt; into your home directory, or some other non-root directory. This problem most commonly crops up when using a server on which you don't have root privileges. Another reason could be that you're practicing or experimenting, and just want to keep the mess out of your system directories. The following tip shows how to install Open Babel to an arbitrary directory on your filesystem.&lt;/p&gt;

&lt;p&gt;Let's say you've created a directory under your home directory called "local" to host your personal binaries and libraries. Let's also say that this directory contains a subdirectory called, surprisingly enough, "openbabel."&lt;/p&gt;

&lt;p&gt;This series of commands will install Open Babel to "$HOME/local/openbabel":&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ./configure --prefix=$HOME/local/openbabel --exec-prefix=$HOME/local/openbabel
[lots of output]
$ make
[even more output]
$ make install
[look ma, no sudo]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Aside: if you're new to compiling C++ code and are running Ubuntu, this command will install everything you'll need to build Open Babel:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ sudo aptitude install build-essential
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We can then run Open Babel:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ $HOME/local/openbabel/bin/babel -ismi -oinchi
c1ccccc1
InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
1 molecule converted
6 audit log messages
&lt;/pre&gt;
&lt;/div&gt;
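
&lt;p&gt;Typing the full path every time gets tedious. A minimal sketch, assuming the same prefix as in the configure step, is to put the new &lt;tt&gt;bin&lt;/tt&gt; directory at the front of your &lt;tt&gt;PATH&lt;/tt&gt;. The variable name &lt;tt&gt;OB_PREFIX&lt;/tt&gt; is only a convenience chosen for this example.&lt;/p&gt;

```shell
# Assumes the install prefix passed to ./configure above.
OB_PREFIX="$HOME/local/openbabel"

# Prepend the bin directory so a plain "babel" resolves to this build
# before any system-wide copy.
PATH="$OB_PREFIX/bin:$PATH"
export PATH
```

&lt;p&gt;Adding these lines to a shell startup file makes the setting persist across sessions.&lt;/p&gt;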

&lt;p&gt;If you want to do further development work or scripting, the above may not be ideal. But to get Open Babel running in situations in which you can't or prefer not to use root, this approach does the trick.&lt;/p&gt;</description>
      <pubDate>Mon, 29 Sep 2008 15:51:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:ffa6e79d-ccd4-4637-98c2-689f25791a87</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/09/29/install-open-babel-into-your-home-directory-you-dont-need-root</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>root</category>
      <category>build</category>
      <category>ubuntu</category>
      <category>linux</category>
      <category>homedirectory</category>
    </item>
    <item>
      <title>Open Babel 2.2.0</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.org"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://openbabel.org"&gt;Open Babel&lt;/a&gt; 2.2.0 has been &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=40728&amp;amp;package_id=32894"&gt;released&lt;/a&gt;. This version introduces a variety of new features and improvements. It also includes the &lt;a href="http://depth-first.com/articles/2007/04/09/painless-installation-of-ruby-open-babel"&gt;Ruby Open Babel&lt;/a&gt; interface that allows scripting through the popular &lt;a href="http://ruby-lang.org"&gt;Ruby language&lt;/a&gt;; Ruby Open Babel can be &lt;a href="http://depth-first.com/articles/2007/04/09/painless-installation-of-ruby-open-babel"&gt;installed both quickly and easily&lt;/a&gt;. Further details are available from the &lt;a href="http://openbabel.org/wiki/Open_Babel_2.2.0"&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Future articles will highlight some of the new Open Babel features using Ruby.&lt;/p&gt;</description>
      <pubDate>Fri, 04 Jul 2008 11:29:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c4a2fcb5-29a7-496c-b9f0-529b84997d04</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/07/04/open-babel-2-2-0</link>
      <category>Tools</category>
      <category>ruby</category>
      <category>rubyopenbabel</category>
      <category>openbabel</category>
    </item>
    <item>
      <title>Run Babel Anywhere Java Runs with JBabel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A &lt;a href="http://depth-first.com/articles/tag/nestedvm"&gt;recent series of D-F articles&lt;/a&gt; have discussed the use of &lt;a href="http://nestedvm.ibex.org/"&gt;NestedVM&lt;/a&gt; to compile cheminformatics programs written in C/C++ to pure java binaries that can be run on any system with a JVM. More specifically, an attempt to compile &lt;a href="http://openbabel.sf.net"&gt;OpenBabel's&lt;/a&gt; &lt;tt&gt;babel&lt;/tt&gt; program to bytecode was only &lt;a href="http://depth-first.com/articles/2007/11/26/compiling-open-babel-to-pure-java-bytecode-with-nestedvm-building-a-runnable-classfile-that-almost-works"&gt;partially successful&lt;/a&gt;. With the &lt;a href="http://sourceforge.net/mailarchive/forum.php?thread_name=819391.60947.qm%40web34201.mail.mud.yahoo.com&amp;amp;forum_name=openbabel-discuss"&gt;help of Geoff Hutchison&lt;/a&gt;, the problem was resolved. This article introduces JBabel, a platform-independent, pure Java implementation of OpenBabel's &lt;tt&gt;babel&lt;/tt&gt; program.&lt;/p&gt;

&lt;h4&gt;A Little About JBabel&lt;/h4&gt;

&lt;p&gt;JBabel was compiled from the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=40728&amp;amp;package_id=32894&amp;amp;release_id=521581"&gt;Open Babel 2.1.1 source release&lt;/a&gt; and can be &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=144794&amp;amp;package_id=255103"&gt;downloaded from SourceForge&lt;/a&gt;. The same jarfile was successfully tested on Linux, Windows and Mac OS X. You can verify JBabel works on your platform with the following command:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -Hsmi
smi  SMILES format
A linear text format which can describe the connectivity
and chirality of a molecule
Write Options e.g. -xt
  n no molecule name
  t molecule name only
  r radicals lower case eg ethyl is Cc
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This version of JBabel was compiled with support for three formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SMILES (smi). Non-canonical SMILES.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MDL (mol). Molfiles and SD Files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Canonical SMILES (can). Canonical SMILES implementation &lt;a href="http://depth-first.com/articles/2006/11/06/stone-soup"&gt;donated by eMolecules&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll discuss exactly how support for these formats was added in a subsequent post. More formats will be added in the future. For now, let's just try JBabel out.&lt;/p&gt;

&lt;h4&gt;Testing JBabel&lt;/h4&gt;

&lt;p&gt;One way to use JBabel is interactively from the command line - just leave out an input or output file parameter. For example, if you wanted to get the eMolecules canonical SMILES for &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=68617"&gt;sertraline&lt;/a&gt;, you might do something like this (be sure to use two returns to begin processing):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -ismi -ocan
CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl

CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
1 molecule converted
34 audit log messages
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This canonical SMILES can be converted into a molfile with the following:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -ismi -omol
CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12


 OpenBabel12090723182D

 22 24  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0

...
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To convert using input and output files, we could use a medium-sized dataset such as the &lt;a href="http://rubyforge.org/frs/download.php/27768/pubchem_benzodiazepine_20071110.sdf.gz"&gt;PubChem benzodiazepine dataset&lt;/a&gt; prepared for &lt;a href="http://rbtk.rubyforge.org/"&gt;Rubidium&lt;/a&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -imol pubchem_benzodiazepine_20071110.sdf -ocan pubchem_benzodiazepine_20071110.smi
==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problems reading a MDL file
Cannot read title line

2117 molecules converted
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This test, which parses 2117 records, required four minutes forty-five seconds on my system. For comparison, the natively compiled binary did the same thing in about thirteen seconds. The JBabel performance hit is substantial: roughly a 22-fold slowdown.&lt;/p&gt;

&lt;h4&gt;Uses&lt;/h4&gt;

&lt;p&gt;Although it's very unlikely that JBabel will ever be useful in performance-critical situations, its portability makes it attractive for other uses. Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;application development in heterogeneous computing environments;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;use on systems in which native compilation may be difficult, such as those with unusual configurations or operating systems;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cases in which native binaries work poorly or not at all, such as in applets and Java applications;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;situations in which performance is a minor consideration, such as in end-user applications that process only a few molecules at a time, or during application prototyping.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;This article has described JBabel, the first portable binary version of OpenBabel's &lt;tt&gt;babel&lt;/tt&gt; molecular file format interconversion program. The next article in this series will describe in detail the steps that were used to compile it.&lt;/p&gt;</description>
      <pubDate>Mon, 10 Dec 2007 08:50:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:5d98a980-e3d6-4afd-8eb3-25769a28d13b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/12/10/run-babel-anywhere-java-runs-with-jbabel</link>
      <category>Tools</category>
      <category>jbabel</category>
      <category>babel</category>
      <category>openbabel</category>
      <category>nestedvm</category>
      <category>molfile</category>
      <category>canonicalsmiles</category>
      <category>smiles</category>
    </item>
    <item>
      <title>Compiling Open Babel to Pure Java Bytecode with NestedVM: Building A Runnable Classfile that Almost Works</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Previously, I described an &lt;a href="http://depth-first.com/articles/2007/11/19/compiling-open-babel-to-pure-java-bytecode-with-nestedvm-an-unsuccessful-first-attempt#comment-268"&gt;unsuccessful first attempt&lt;/a&gt; to compile the popular cheminformatics C/C++ library &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt; to pure Java bytecode using &lt;a href="http://nestedvm.ibex.org/"&gt;NestedVM&lt;/a&gt;. This article follows that topic one step further, and shows how to obtain a runnable Java classfile. Although major functionality is missing, the principle of compiling arbitrary C/C++ code to both Java source code and Java bytecode is illustrated.&lt;/p&gt;

&lt;h4&gt;Getting Started&lt;/h4&gt;

&lt;p&gt;This article assumes that you've &lt;a href="http://depth-first.com/articles/2007/11/19/compiling-open-babel-to-pure-java-bytecode-with-nestedvm-an-unsuccessful-first-attempt#comment-268"&gt;installed NestedVM and downloaded Open Babel&lt;/a&gt; on your system. You'll then need to set up your environment (from the nestedvm installation directory):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ source env.sh
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Run the Configure Script&lt;/h4&gt;

&lt;p&gt;The configure script we used last time didn't attempt to statically compile the binary utilities in the &lt;strong&gt;tools&lt;/strong&gt; directory. This time, we'll add flags to allow this:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ./configure --disable-dynamic-modules --enable-static=yes --enable-shared=no --enable-inchi --host=mips-unknown-elf
$ make
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Note: leaving out the static compile directives does not produce a fully-functioning classfile either.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Next, we'll attempt to directly create the &lt;tt&gt;babel&lt;/tt&gt; binary in Java classfile format, as we did last time:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd tools
$ java org.ibex.nestedvm.Compiler -outfile Babel.class Babel babel
Exception in thread "main" java.lang.IllegalStateException: unresolved phantom target
        at org.ibex.classgen.MethodGen.resolveTarget(MethodGen.java:555)
        at org.ibex.classgen.MethodGen._generateCode(MethodGen.java:664)
        at org.ibex.classgen.MethodGen.generateCode(MethodGen.java:618)
        at org.ibex.classgen.MethodGen.dump(MethodGen.java:888)
        at org.ibex.classgen.ClassFile._dump(ClassFile.java:193)
        at org.ibex.classgen.ClassFile.dump(ClassFile.java:160)
        at org.ibex.nestedvm.ClassFileCompiler.__go(ClassFileCompiler.java:380)
        at org.ibex.nestedvm.ClassFileCompiler._go(ClassFileCompiler.java:72)
        at org.ibex.nestedvm.Compiler.go(Compiler.java:259)
        at org.ibex.nestedvm.Compiler.main(Compiler.java:183)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We're getting the same error as before. Although an &lt;a href="http://groups.google.com/group/nestedvm/browse_thread/thread/b5d114a20a6b672b"&gt;announcement of a bugfix&lt;/a&gt; was posted to the NestedVM list, in my hands the new version of NestedVM caused the same error.&lt;/p&gt;

&lt;p&gt;As a workaround, we can compile to Java sourcecode first:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java org.ibex.nestedvm.Compiler -outformat java -outfile Babel.java Babel babel
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We now have a Java source file encoding the &lt;strong&gt;babel&lt;/strong&gt; program. Does it compile?&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ javac Babel.java
The system is out of resources.
Consult the following stack trace for details.
java.lang.OutOfMemoryError: Java heap space
        at com.sun.tools.javac.util.Position$LineMapImpl.build(Position.java:139)
        at com.sun.tools.javac.util.Position.makeLineMap(Position.java:63)
        at com.sun.tools.javac.parser.Scanner.getLineMap(Scanner.java:1105)
        at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:512)
        at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:550)
        at com.sun.tools.javac.main.JavaCompiler.parseFiles(JavaCompiler.java:801)
        at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:727)
        at com.sun.tools.javac.main.Main.compile(Main.java:353)
        at com.sun.tools.javac.main.Main.compile(Main.java:279)
        at com.sun.tools.javac.main.Main.compile(Main.java:270)
        at com.sun.tools.javac.Main.compile(Main.java:69)
        at com.sun.tools.javac.Main.main(Main.java:54)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Not exactly. But this is a massive source file, so we'll need to increase the Java compiler's memory allowance:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ javac Babel.java -J-Xms256m -J-Xmx256m
Note: Babel.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This seems to have worked. Can we run the classfile?&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java Babel -H
Open Babel converts chemical structures from one file format to another

Usage: Babel &amp;lt;input spec&amp;gt; &amp;lt;output spec&amp;gt; [Options]

Each spec can be a file whose extension decides the format.
Optionally the format can be specified by preceding the file by
-i&amp;lt;format-type&amp;gt; e.g. -icml, for input and -o&amp;lt;format-type&amp;gt; for output

--truncated--
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Success! But before we get too excited, let's make sure Open Babel's file formats are recognized by testing for "SMILES":&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java Babel -Hsmi
Format type: smi was not recognized
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see, we have successfully converted the &lt;tt&gt;babel&lt;/tt&gt; program to an executable classfile, but this classfile is missing most of the features of the native binary.&lt;/p&gt;

&lt;p&gt;This may seem hopeless, but consider that natively compiling Open Babel using the above &lt;tt&gt;configure&lt;/tt&gt; flags also produces a binary that doesn't know about SMILES or any other format.&lt;/p&gt;

&lt;p&gt;So, it's very likely that if we can produce a native, statically compiled, self contained &lt;tt&gt;babel&lt;/tt&gt; executable, then we will have solved the problem of running Open Babel entirely on a JVM.&lt;/p&gt;

&lt;p&gt;This doesn't seem like a difficult problem, &lt;a href="http://sourceforge.net/mailarchive/forum.php?thread_name=819391.60947.qm%40web34201.mail.mud.yahoo.com&amp;amp;forum_name=openbabel-discuss"&gt;but apparently it is&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Mon, 26 Nov 2007 10:10:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:24f162e9-dafd-4f32-8458-0dc45acb5345</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/26/compiling-open-babel-to-pure-java-bytecode-with-nestedvm-building-a-runnable-classfile-that-almost-works</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>nestedvm</category>
      <category>java</category>
      <category>crosscompile</category>
      <category>bytecode</category>
    </item>
    <item>
      <title>Compiling Open Babel to Pure Java Bytecode with NestedVM: An Unsuccessful First Attempt</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net/"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Wouldn't it be great to be able to compile code written in languages like FORTRAN, C, and C++ to Java bytecode? &lt;a href="http://nestedvm.ibex.org/"&gt;NestedVM&lt;/a&gt; - almost magically - can do just that. This article documents a failed first attempt to compile the popular cheminformatics toolkit &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt;, which is written in C and C++, to pure Java bytecode with NestedVM.&lt;/p&gt;

&lt;p&gt;A previous article described the &lt;a href="http://depth-first.com/articles/2007/10/31/jinchi-run-inchi-anywhere-java-runs"&gt;successful compilation of the InChI toolkit&lt;/a&gt;, a C library, to a platform-independent executable jarfile.&lt;/p&gt;

&lt;h4&gt;The Problem&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt; is one of cheminformatics' most &lt;a href="http://sourceforge.net/project/stats/?group_id=40728&amp;amp;ugn=openbabel"&gt;widely-used&lt;/a&gt; open source packages. It interconverts dozens of molecular languages, performs a host of cheminformatics analyses, and serves as a platform for many programs and Web services.&lt;/p&gt;

&lt;p&gt;As useful as Open Babel is, it doesn't run directly on a Java Virtual Machine (JVM). Although an &lt;a href="http://openbabel.sourceforge.net/wiki/Java"&gt;Open Babel JNI&lt;/a&gt; interface does exist, using it introduces a platform dependency, which in many cases is not acceptable. JNI is a great solution in some cases, but when maintaining a single version of a program is important, or when applets need to be used, or when code needs to work with unusual system configurations, it's a poor choice.&lt;/p&gt;

&lt;p&gt;Our goal is to compile Open Babel's "babel" command-line utility into pure Java bytecode that can be run on any recent JVM without using JNI.&lt;/p&gt;

&lt;h4&gt;Overview of NestedVM&lt;/h4&gt;

&lt;p&gt;In a nutshell, NestedVM converts MIPS binaries to Java class files. In theory, this allows software written in any language that can be compiled to a MIPS binary to be run on a JVM.&lt;/p&gt;

&lt;p&gt;To do this, NestedVM distributes two categories of tools: (1) a complete MIPS cross-compiler toolchain; and (2) a MIPS binary to Java bytecode compiler and accessories.&lt;/p&gt;

&lt;h4&gt;Building NestedVM&lt;/h4&gt;

&lt;p&gt;The preferred method to install NestedVM is to compile it from source found in the project repository. There are a number of prerequisites your system must meet in order to be able to do so. For now, this article assumes your system has all of them. Some of the following steps can be found in &lt;a href="http://wiki.brianweb.net/NestedVM/QuickStartGuide"&gt;these instructions&lt;/a&gt; as well.&lt;/p&gt;

&lt;p&gt;To obtain the source code from the NestedVM darcs repository:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ darcs get --repo-name=nestedvm http://nestedvm.ibex.org
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Then change into the &lt;strong&gt;nestedvm&lt;/strong&gt; directory and build the main code:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd nestedvm
$ make
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;On my machine, this step takes 10-15 minutes.&lt;/p&gt;

&lt;p&gt;To make sure your build works, run the tests:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ make test
...
1.574000e+00
-4.315000e+01l
-43
-4.315000e+01
4.315000e+01
Hello, World
7F
fabs(-2.24) = 2.34
Destructor!
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;NestedVM doesn't build the g++ compiler by default - it's something that needs to be done manually. Fortunately, it's not difficult to do:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ make cxxtest
...
java -cp build tests.CXXTest
Test's constructor
Name: 0x50b40
Name: PKc
Is pointer: 1
Name: 0x50b3c
Name: i
Is pointer: 0
Hello, World from Test
Now throwing an exception
sayhi threw: const char *:Hello, Exception Handling!
Test's destructor
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Finally, with all tools built, we need to set up our environment:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ make env.sh
$ source env.sh
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We're now ready to cross-compile Open Babel.&lt;/p&gt;
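
&lt;p&gt;Before moving on, it can be worth confirming that the environment really was picked up. The sketch below assumes that &lt;tt&gt;env.sh&lt;/tt&gt; exports the &lt;tt&gt;CC&lt;/tt&gt; and &lt;tt&gt;CXX&lt;/tt&gt; variables read by configure scripts. The function name &lt;tt&gt;check_toolchain_env&lt;/tt&gt; is invented for this example.&lt;/p&gt;

```shell
# Hypothetical helper: report whether CC and CXX are set in this shell.
# Returns 0 when both are set and 1 when either is missing.
check_toolchain_env() {
  status=0
  for var in CC CXX; do
    # Indirect lookup: copy the value of the variable named by $var.
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      echo "$var=$value"
    else
      echo "$var is not set"
      status=1
    fi
  done
  return "$status"
}
```

&lt;p&gt;Running &lt;tt&gt;check_toolchain_env&lt;/tt&gt; after &lt;tt&gt;source env.sh&lt;/tt&gt; should list both compilers. If either line reports "is not set", the environment was probably sourced in a different shell session.&lt;/p&gt;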

&lt;h4&gt;Cross-Compiling Open Babel&lt;/h4&gt;

&lt;p&gt;For this tutorial, we'll use the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=40728&amp;amp;package_id=32894&amp;amp;release_id=521581"&gt;Open Babel 2.1.1&lt;/a&gt; source distribution. Unpack the tarball and change into the directory.&lt;/p&gt;

&lt;p&gt;Next, we'll need to set up our cross-compiler environment. Fortunately, NestedVM has made this easy. If you check your environment variables, you'll find that &lt;tt&gt;CXX&lt;/tt&gt; and &lt;tt&gt;CC&lt;/tt&gt; have both been set. All that remains is to notify the configure script that we'll be cross-compiling:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ./configure --host=mips-unknown-elf
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Then we build the MIPS binaries:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ make
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Peeking into the &lt;strong&gt;tools&lt;/strong&gt; directory, we can see all of the Open Babel command line tools have been built, including &lt;tt&gt;babel&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;Unless you're running a MIPS machine, though, this binary won't be executable.&lt;/p&gt;

&lt;p&gt;So far, it looks like everything worked. Although it didn't work the first time I tried it, the NestedVM team &lt;a href="http://groups.google.com/group/nestedvm/browse_thread/thread/7373accf6010d6d7"&gt;were most helpful&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Building the Java Class File&lt;/h4&gt;

&lt;p&gt;We're now ready for the final stage in the process, converting the MIPS binary to a Java class file. Again, NestedVM makes this simple:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd tools
$ java org.ibex.nestedvm.Compiler -outfile Babel.class Babel babel
Exception in thread "main" java.lang.IllegalStateException: unresolved phantom target
        at org.ibex.classgen.MethodGen.resolveTarget(MethodGen.java:555)
        at org.ibex.classgen.MethodGen._generateCode(MethodGen.java:664)
        at org.ibex.classgen.MethodGen.generateCode(MethodGen.java:618)
        at org.ibex.classgen.MethodGen.dump(MethodGen.java:888)
        at org.ibex.classgen.ClassFile._dump(ClassFile.java:193)
        at org.ibex.classgen.ClassFile.dump(ClassFile.java:160)
        at org.ibex.nestedvm.ClassFileCompiler.__go(ClassFileCompiler.java:380)
        at org.ibex.nestedvm.ClassFileCompiler._go(ClassFileCompiler.java:72)
        at org.ibex.nestedvm.Compiler.go(Compiler.java:259)

&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Unfortunately, NestedVM has blown up with an exception. Although our target class file, &lt;strong&gt;Babel.class&lt;/strong&gt;, is now in our working directory, it is not complete and won't run.&lt;/p&gt;

&lt;h4&gt;What Went Wrong?&lt;/h4&gt;

&lt;p&gt;After bringing this problem to the &lt;a href="http://groups.google.com/group/nestedvm"&gt;NestedVM mailing list&lt;/a&gt;, it appears that this is a &lt;a href="http://groups.google.com/group/nestedvm/browse_thread/thread/b5d114a20a6b672b"&gt;NestedVM bug&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, the way &lt;tt&gt;babel&lt;/tt&gt; works is to load its various language modules dynamically. It may be possible to fix the problem by producing a version of &lt;tt&gt;babel&lt;/tt&gt; containing all of its modules in a single binary.&lt;/p&gt;

&lt;p&gt;Although there is a major issue to be resolved, this tutorial illustrates the full process of compiling C++ code to Java bytecode using NestedVM.&lt;/p&gt;</description>
      <pubDate>Mon, 19 Nov 2007 10:42:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:63647f0e-a238-45dd-af26-f57319f21001</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/19/compiling-open-babel-to-pure-java-bytecode-with-nestedvm-an-unsuccessful-first-attempt</link>
      <category>Tools</category>
      <category>nestedvm</category>
      <category>openbabel</category>
      <category>crosscompile</category>
      <category>mips</category>
      <category>bytecode</category>
    </item>
    <item>
      <title>Roll Your Own Chemical Database With Free Components</title>
      <description>&lt;p&gt;Are you thinking of building a &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemical database&lt;/a&gt; but would rather not rent and maintain a bunch of proprietary software components? &lt;a href="http://merian.pch.univie.ac.at/pch/nh_info.html"&gt;Norbert Haider&lt;/a&gt; has thought a lot about this problem and offers some helpful resources to get you started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/moldb.html"&gt;Creating a web-based, searchable molecular structure database using free software&lt;/a&gt; Step-by-step case study&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/moldb.pdf"&gt;How to create a web-based molecular structure database with free software&lt;/a&gt; A presentation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/cmmm.html"&gt;checkmol/matchmol&lt;/a&gt; Open source command-line utility for 2D (sub)structure matching&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/mol2ps.html"&gt;mol2ps&lt;/a&gt; Command-line utility for converting molfiles into PostScript files&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Haider's system can be deployed on commodity hardware running open source operating systems. In other words, the cost of setting up a system like the one he describes is practically zero.&lt;/p&gt;

&lt;p&gt;Creating and open sourcing your own custom components is one way to go. Building on top of existing open source tools like &lt;a href="http://cdk.sf.net"&gt;CDK&lt;/a&gt;, &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt;, &lt;a href="http://depth-first.com/articles/tag/octet"&gt;Octet&lt;/a&gt; and &lt;a href="http://joelib.sf.net"&gt;JOELib&lt;/a&gt; is another.&lt;/p&gt;

&lt;p&gt;Haider's work raises an interesting question. Has anyone assembled a complete, ready-to-install, general-purpose chemical database package built from open source components? If for no other reason, such an exercise would give an excellent idea of what &lt;a href="http://depth-first.com/articles/2007/01/03/open-source-and-open-data-why-we-should-eat-our-own-dogfood"&gt;the dogfood tastes like&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 13 Apr 2007 10:27:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:a017a4e0-d8a0-48c2-87a3-5554b99b7373</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/04/13/roll-your-own-chemical-database-with-free-components</link>
      <category>Tools</category>
      <category>database</category>
      <category>2d</category>
      <category>web</category>
      <category>cdk</category>
      <category>openbabel</category>
      <category>opensource</category>
      <category>joelib</category>
    </item>
  </channel>
</rss>
