<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag fingerprint</title>
    <link>http://depth-first.com/articles/tag/fingerprint</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/leechypics/505513640/"&gt;&lt;img src="http://depth-first.com/demo/20081029/fingerprint.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;We can think of a fingerprint as a bucket into which every molecule in the universe can be reproducibly placed. Each molecule will belong to a single bucket, but each bucket may contain any number of molecules. In other words, there exists a one-to-many relationship between a fingerprint and its associated molecules. The &lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;previous article in this series&lt;/a&gt; discussed how to model this relationship using SQL. This article will take the idea one step further by describing one way to model this relationship in Ruby.&lt;/p&gt;

&lt;p&gt;All Articles in this Series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;Part 1: Fingerprints and Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;Part 2: Fingerprint Screen With SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures"&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;SQL Recap&lt;/h4&gt;

&lt;p&gt;So far, we've set up a &lt;tt&gt;fingerprints&lt;/tt&gt; table:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; describe fingerprints;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | int(11)             | NO   | PRI | NULL    | auto_increment | 
| byte0  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte1  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte2  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte3  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte4  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte5  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte6  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte7  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte8  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte9  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte10 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte11 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte12 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte13 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte14 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte15 | bigint(64) unsigned | YES  |     | 0       |                | 
+--------+---------------------+------+-----+---------+----------------+
17 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This table contains a single (empty) fingerprint:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select * from fingerprints;
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
| id | byte0 | byte1 | byte2 | byte3 | byte4 | byte5 | byte6 | byte7 | byte8 | byte9 | byte10 | byte11 | byte12 | byte13 | byte14 | byte15 |
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
|  1 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |      0 |      0 |      0 |      0 |      0 |      0 | 
+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We've also set up a &lt;tt&gt;compounds&lt;/tt&gt; table containing a foreign key (&lt;tt&gt;fingerprint_id&lt;/tt&gt;) into the &lt;tt&gt;fingerprints&lt;/tt&gt; table:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; describe compounds;
+----------------+---------+------+-----+---------+----------------+
| Field          | Type    | Null | Key | Default | Extra          |
+----------------+---------+------+-----+---------+----------------+
| id             | int(11) | NO   | PRI | NULL    | auto_increment | 
| fingerprint_id | int(11) | YES  |     | NULL    |                | 
| smiles         | text    | YES  |     | NULL    |                | 
+----------------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In this hypothetical example, the &lt;tt&gt;compounds&lt;/tt&gt; table is populated with two molecules, benzene and bromobenzene, both of which share the same fingerprint:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select * from compounds;
+----+----------------+------------+
| id | fingerprint_id | smiles     |
+----+----------------+------------+
|  1 |              1 | c1ccccc1   | 
|  2 |              1 | c1ccccc1Br | 
+----+----------------+------------+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Adding the Ruby Layer&lt;/h4&gt;

&lt;p&gt;In &lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3&lt;/a&gt;, we created a CRUD API for fingerprints in Ruby. We now need to modify the class we created there, &lt;tt&gt;Fingerprint&lt;/tt&gt;, to make it aware of the &lt;tt&gt;Compounds&lt;/tt&gt; it will be associated with.&lt;/p&gt;

&lt;p&gt;For brevity, you can &lt;a href="http://depth-first.com/demo/20081029/fingerprint.rb"&gt;view the updated Fingerprint class here&lt;/a&gt;. The main change has been to add a single line of code that tells &lt;tt&gt;Fingerprint&lt;/tt&gt; that it's now associated with a class called &lt;tt&gt;Compound&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;  &lt;span class="ident"&gt;has_many&lt;/span&gt; &lt;span class="symbol"&gt;:compounds&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All that remains is to bring the &lt;tt&gt;Compound&lt;/tt&gt; class into being:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;active_record&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;ActiveRecord&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Base&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;establish_connection&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;
  &lt;span class="symbol"&gt;:adapter&lt;/span&gt;    &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mysql&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:host&lt;/span&gt;       &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;localhost&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:username&lt;/span&gt;   &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;root&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:password&lt;/span&gt;   &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:database&lt;/span&gt;   &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;compounds&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Compound&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt; &lt;span class="constant"&gt;ActiveRecord&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Base&lt;/span&gt;
  &lt;span class="ident"&gt;belongs_to&lt;/span&gt; &lt;span class="symbol"&gt;:fingerprint&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;belongs_to&lt;/tt&gt; line is the counterpart to the &lt;tt&gt;has_many&lt;/tt&gt; line in &lt;tt&gt;Fingerprint&lt;/tt&gt;. Together, the two declarations create a system in which each &lt;tt&gt;Fingerprint&lt;/tt&gt; can reference multiple &lt;tt&gt;Compounds&lt;/tt&gt; and each &lt;tt&gt;Compound&lt;/tt&gt; references one &lt;tt&gt;Fingerprint&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;Let's test this with interactive Ruby:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'fingerprint'
=&amp;gt; true
irb(main):002:0&amp;gt; f=Fingerprint.find 1
=&amp;gt; #&amp;lt;Fingerprint id: 1, byte0: 0, byte1: 0, byte2: 0, byte3: 0, byte4: 0, byte5: 0, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 0, byte11: 0, byte12: 0, byte13: 0, byte14: 0, byte15: 0&amp;gt;
irb(main):003:0&amp;gt; f.compounds
=&amp;gt; [#&amp;lt;Compound id: 1, fingerprint_id: 1, smiles: "c1ccccc1"&amp;gt;, #&amp;lt;Compound id: 2, fingerprint_id: 1, smiles: "c1ccccc1Br"&amp;gt;]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Looks good. Our code has made the correct association between a &lt;tt&gt;Fingerprint&lt;/tt&gt; and its &lt;tt&gt;Compounds&lt;/tt&gt;. What about the other way around?&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'compound'
=&amp;gt; true
irb(main):002:0&amp;gt; c=Compound.find 1
=&amp;gt; #&amp;lt;Compound id: 1, fingerprint_id: 1, smiles: "c1ccccc1"&amp;gt;
irb(main):003:0&amp;gt; c.fingerprint
=&amp;gt; #&amp;lt;Fingerprint id: 1, byte0: 0, byte1: 0, byte2: 0, byte3: 0, byte4: 0, byte5: 0, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 0, byte11: 0, byte12: 0, byte13: 0, byte14: 0, byte15: 0&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As expected, the first &lt;tt&gt;Compound&lt;/tt&gt; became associated with the correct &lt;tt&gt;Fingerprint&lt;/tt&gt;.&lt;/p&gt;
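
&lt;p&gt;For readers without a database at hand, the two lookups above can be mimicked in plain Ruby. The sketch below only illustrates the one-to-many idea: the &lt;tt&gt;CompoundRow&lt;/tt&gt; struct and both helper methods are hypothetical stand-ins, not part of ActiveRecord or of the classes in this series.&lt;/p&gt;

```ruby
# Hypothetical stand-in for a row of the compounds table.
CompoundRow = Struct.new(:id, :fingerprint_id, :smiles)

# The two rows shown in the SQL recap.
ROWS = [
  CompoundRow.new(1, 1, 'c1ccccc1'),
  CompoundRow.new(2, 1, 'c1ccccc1Br')
].freeze

# "Has many": every compound row pointing at the given fingerprint id.
def compounds_for(fingerprint_id)
  ROWS.select { |row| row.fingerprint_id == fingerprint_id }
end

# "Belongs to": the single fingerprint id a compound row points at.
def fingerprint_id_for(compound_id)
  row = ROWS.find { |r| r.id == compound_id }
  row ? row.fingerprint_id : nil
end

p compounds_for(1).map { |row| row.smiles }  # prints ["c1ccccc1", "c1ccccc1Br"]
p fingerprint_id_for(1)                      # prints 1
```

&lt;p&gt;ActiveRecord generates the equivalent lookups from the &lt;tt&gt;fingerprint_id&lt;/tt&gt; column, which is why the two one-line declarations are enough.&lt;/p&gt;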

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Our system can now store and query molecular fingerprints in a relational database. It also associates multiple compounds with each fingerprint.&lt;/p&gt;

&lt;p&gt;We have a complete fingerprint screening system, but not a substructure search system.&lt;/p&gt;

&lt;p&gt;What's missing? For one thing, we'd need a way to perform atom-by-atom searches (ABAS) of all candidate structures after the fingerprint screening process is complete. Recall that just because a query fingerprint matches a candidate fingerprint doesn't necessarily mean that a substructure match has been found.&lt;/p&gt;

&lt;p&gt;We'd also need a way to conveniently get real compounds with real fingerprints into our database. Only then would we be able to test the chemical validity of substructure queries.&lt;/p&gt;

&lt;p&gt;The remaining articles in this series will discuss approaches to each of these requirements.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/leechypics/"&gt;leeechy&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 29 Oct 2008 17:15:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:442a864c-12bc-4ba5-a6f6-f9ca95180215</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby</link>
      <category>Tools</category>
      <category>ruby</category>
      <category>substructuresearch</category>
      <category>fingerprint</category>
      <category>chemicaldatabase</category>
      <category>sql</category>
      <category>onetomany</category>
      <category>mysql</category>
    </item>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 5: Relating Molecules to Fingerprints with SQL</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/robfon/2174992215/"&gt;&lt;img src="http://depth-first.com/demo/20081020/drive.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A molecular fingerprint is a special kind of &lt;a href="http://en.wikipedia.org/wiki/Hash_function"&gt;hash function&lt;/a&gt; that can reproducibly place any molecule, known or unknown, into one of a large but finite set of groups. Each molecule will be associated with exactly one fingerprint, but each fingerprint can be associated with multiple molecules. In other words, there exists a one-to-many relationship between fingerprints and molecules. This article outlines one of the final steps in creating a substructure-searchable relational chemical database by describing a simple method for associating fingerprints and molecules.&lt;/p&gt;

&lt;p&gt;All Articles in this Series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;Part 1: Fingerprints and Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;Part 2: Fingerprint Screen With SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures"&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Modelling a One-To-Many Relationship&lt;/h4&gt;

&lt;p&gt;The &lt;a href="http://www.onlamp.com/pub/a/onlamp/2001/03/20/aboutSQL.html"&gt;one-to-many relationship&lt;/a&gt; is one of the most fundamental concepts in relational databases. In our case, we'd like to create a new table called &lt;tt&gt;compounds&lt;/tt&gt; and link each of its rows to a row in the &lt;tt&gt;fingerprints&lt;/tt&gt; table. This can be accomplished by adding a column to the &lt;tt&gt;compounds&lt;/tt&gt; table that holds an id from the &lt;tt&gt;fingerprints&lt;/tt&gt; table (a &lt;a href="http://en.wikipedia.org/wiki/Foreign_key"&gt;foreign key&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;This would then give us the ability to gather all of the rows in the &lt;tt&gt;compounds&lt;/tt&gt; table that match a particular fingerprint (or group of fingerprints).&lt;/p&gt;

&lt;h4&gt;Creating the &lt;tt&gt;compounds&lt;/tt&gt; Table&lt;/h4&gt;

&lt;p&gt;The &lt;tt&gt;compounds&lt;/tt&gt; table we'll create will store three pieces of information:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A unique id (something that all of our tables will have).&lt;/li&gt;
&lt;li&gt;An integer column called &lt;tt&gt;fingerprint_id&lt;/tt&gt; that will store the unique id of a row in the &lt;tt&gt;fingerprints&lt;/tt&gt; table.&lt;/li&gt;
&lt;li&gt;A text column called &lt;tt&gt;smiles&lt;/tt&gt; that will hold each compound's structure as a compact SMILES string.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can create the table with the following:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; create table compounds (id int not null auto_increment, primary key(id), fingerprint_id int, smiles text);
Query OK, 0 rows affected (0.01 sec)

mysql&gt; describe compounds;
+----------------+---------+------+-----+---------+----------------+
| Field          | Type    | Null | Key | Default | Extra          |
+----------------+---------+------+-----+---------+----------------+
| id             | int(11) | NO   | PRI | NULL    | auto_increment | 
| fingerprint_id | int(11) | YES  |     | NULL    |                | 
| smiles         | text    | YES  |     | NULL    |                | 
+----------------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Using &lt;tt&gt;compounds&lt;/tt&gt; and &lt;tt&gt;fingerprints&lt;/tt&gt; Together&lt;/h4&gt;

&lt;p&gt;Now let's populate our database with some simple, fake data. If you haven't done so already, delete all rows from your existing &lt;tt&gt;fingerprints&lt;/tt&gt; table:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; delete from fingerprints;
Query OK, 0 rows affected (0.00 sec)

mysql&gt; select * from fingerprints;
Empty set (0.00 sec)

mysql&gt; describe fingerprints;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | int(11)             | NO   | PRI | NULL    | auto_increment | 
| byte0  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte1  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte2  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte3  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte4  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte5  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte6  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte7  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte8  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte9  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte10 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte11 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte12 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte13 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte14 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte15 | bigint(64) unsigned | YES  |     | 0       |                | 
+--------+---------------------+------+-----+---------+----------------+
17 rows in set (0.01 sec)

&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now let's create a dummy fingerprint for the sake of simplicity:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; insert into fingerprints () values();
Query OK, 1 row affected (0.00 sec)

mysql&gt; select * from fingerprints;
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
| id    | byte0 | byte1 | byte2 | byte3 | byte4 | byte5 | byte6 | byte7 | byte8 | byte9 | byte10 | byte11 | byte12 | byte13 | byte14 | byte15 |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
| 16806 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |      0 |      0 |      0 |      0 |      0 |      0 | 
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Let's associate two compounds with this fingerprint:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; insert into compounds (fingerprint_id,smiles) values(16806,'c1ccccc1');
Query OK, 1 row affected (0.01 sec)

mysql&gt; insert into compounds (fingerprint_id,smiles) values(16806,'c1ccccc1Br');
Query OK, 1 row affected (0.00 sec)

mysql&gt; select * from compounds;
+----+----------------+------------+
| id | fingerprint_id | smiles     |
+----+----------------+------------+
| 20 |          16806 | c1ccccc1   | 
| 21 |          16806 | c1ccccc1Br | 
+----+----------------+------------+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We can now find all compounds whose fingerprints have no bits set:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select compounds.* from compounds inner join fingerprints on compounds.fingerprint_id=fingerprints.id where fingerprints.byte0=0 and fingerprints.byte1=0 and fingerprints.byte2=0 and fingerprints.byte3=0 and fingerprints.byte4=0 and fingerprints.byte5=0 and fingerprints.byte6=0 and fingerprints.byte7=0 and fingerprints.byte8=0 and fingerprints.byte9=0 and fingerprints.byte10=0 and fingerprints.byte11=0 and fingerprints.byte12=0 and fingerprints.byte13=0 and fingerprints.byte14=0 and fingerprints.byte15=0;
+----+----------------+------------+
| id | fingerprint_id | smiles     |
+----+----------------+------------+
| 20 |          16806 | c1ccccc1   | 
| 21 |          16806 | c1ccccc1Br | 
+----+----------------+------------+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

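&lt;p&gt;We could just as easily perform substructure fingerprint screens by swapping the plain "=" comparison for one built on the bitwise "&amp;amp;" operator: a candidate passes when ANDing each of its values with the corresponding query value gives back the query value. Although the data we're using is hardly realistic, the same concepts apply regardless of how the fingerprints are constructed.&lt;/p&gt;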
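
&lt;p&gt;To make the screening test concrete, here is a minimal sketch in plain Ruby rather than SQL. It assumes a fingerprint is an array of sixteen integers (one per column) and uses &lt;tt&gt;Integer#allbits?&lt;/tt&gt; (Ruby 2.5 or later), which checks that every bit set in the query value is also set in the candidate value. The &lt;tt&gt;passes_screen?&lt;/tt&gt; helper and the sample values are hypothetical.&lt;/p&gt;

```ruby
# Hypothetical helper: true when every bit set in the query fingerprint
# is also set in the candidate fingerprint, word by word.
def passes_screen?(candidate, query)
  candidate.zip(query).all? { |c, q| c.allbits?(q) }
end

# Made-up 16-word fingerprints, for illustration only.
candidate = [0b1011] + [0] * 15
subset    = [0b0011] + [0] * 15   # bits are a subset of the candidate's
other     = [0b0100] + [0] * 15   # has a bit the candidate lacks
empty     = [0] * 16              # like the dummy fingerprint above

p passes_screen?(candidate, subset)  # prints true
p passes_screen?(candidate, other)   # prints false
p passes_screen?(candidate, empty)   # prints true
```

&lt;p&gt;Note that an all-zero query passes every candidate, so the dummy fingerprint used here says nothing yet about chemical validity.&lt;/p&gt;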

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;We now have a way to associate fingerprints with compounds stored in our database. Although we could continue to populate and query our database using hand-coded SQL statements, what we'd really like to use is an API written in a high-level programming language. The next article in this series will demonstrate how this can be done in Ruby.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/robfon/"&gt;Roberto F.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Tue, 21 Oct 2008 01:28:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:89f24801-86a5-4bca-91af-8e610a6fe539</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql</link>
      <category>Tools</category>
      <category>mysql</category>
      <category>sql</category>
      <category>onetomany</category>
      <category>database</category>
      <category>fingerprint</category>
      <category>compound</category>
    </item>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 4: Creating Fingerprints from Chemical Structures</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/adrenalin/4250667/"&gt;&lt;img src="http://depth-first.com/demo/20081015/falls.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The previous articles in this series have detailed the steps needed to build a working fingerprint screening system using nothing more than the open source tools &lt;a href="http://www.mysql.com/"&gt;MySQL&lt;/a&gt;, &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt;, and &lt;a href="http://ar.rubyonrails.org/"&gt;ActiveRecord&lt;/a&gt;. With this system we can create, read, update, and destroy fingerprints in persistent storage. Although the system meets all of the requirements of a fingerprint screening system, it isn't a substructure search system - yet. For that, we need a way to convert chemical structure representations into fingerprints. This article describes a very simple method for doing so.&lt;/p&gt;

&lt;p&gt;All Articles in this Series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;Part 1: Fingerprints and Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;Part 2: Fingerprint Screen With SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;A Ruby Fingerprinter in Eight Lines&lt;/h4&gt;

&lt;p&gt;Let's create a &lt;tt&gt;Fingerprinter&lt;/tt&gt; class that's capable of converting a SMILES string into a &lt;tt&gt;Fingerprint&lt;/tt&gt; that can be stored and queried. The Ruby code below makes use of Open Babel's &lt;a href="http://openbabel.org/wiki/Babel"&gt;&lt;tt&gt;babel&lt;/tt&gt;&lt;/a&gt; command-line utility:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Fingerprinter&lt;/span&gt;  
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;fingerprint_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="ident"&gt;raw&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;%x[&lt;/span&gt;&lt;span class="string"&gt;echo '&lt;span class="expr"&gt;#{smiles}&lt;/span&gt;' | babel -ismi -ofpt 2&amp;gt;/dev/null&lt;/span&gt;&lt;span class="punct"&gt;]&lt;/span&gt;
    &lt;span class="ident"&gt;bytes&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;raw&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;gsub&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;&amp;gt;.*?&lt;span class="escape"&gt;\n&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;/,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;gsub&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;&lt;span class="escape"&gt;\n&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;/,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;').&lt;/span&gt;&lt;span class="ident"&gt;split&lt;/span&gt;

    &lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;fill_bytes&lt;/span&gt;&lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{bytes[2*i]}#{bytes[2*i+1]}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;.&lt;/span&gt;&lt;span class="ident"&gt;hex&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This class takes advantage of Ruby's ability to interface directly with the command line through the &lt;tt&gt;%x&lt;/tt&gt; operator in a way similar to that previously described for the &lt;a href="http://depth-first.com/articles/2008/05/30/a-simple-and-portable-ruby-interface-to-inchi-part-2-silencing-console-output"&gt;cInChI command line tool&lt;/a&gt;. The &lt;tt&gt;babel&lt;/tt&gt; output is then converted into a form suitable for use with our &lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;previously-defined&lt;/a&gt; &lt;tt&gt;Fingerprint&lt;/tt&gt; class.&lt;/p&gt;
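
&lt;p&gt;The last line of &lt;tt&gt;fingerprint_smiles&lt;/tt&gt; joins neighbouring hexadecimal words and parses each pair as a single integer. A small sketch of that step follows. It uses a made-up string of four 8-digit words in place of real &lt;tt&gt;babel&lt;/tt&gt; output, so the input and the variable names are assumptions for illustration only.&lt;/p&gt;

```ruby
# Hypothetical input: four 8-digit hex words, whitespace-separated,
# standing in for the cleaned-up babel output.
raw = '00000000 00000000 00000000 00000200'
bytes = raw.split

# Join neighbouring words into 16-digit strings and parse each as hex.
words = bytes.each_slice(2).map { |hi, lo| "#{hi}#{lo}".hex }

p words  # prints [0, 512]
```

&lt;p&gt;Each pair yields one value of up to 64 bits, which fits the &lt;tt&gt;bigint&lt;/tt&gt; columns of the &lt;tt&gt;fingerprints&lt;/tt&gt; table.&lt;/p&gt;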

&lt;p&gt;Although quite easy to implement, this approach may not work in every situation. For example, because the &lt;tt&gt;fingerprint_smiles&lt;/tt&gt; method interpolates its argument directly into a shell command, a malicious user could execute arbitrary shell commands by supplying a specially crafted string in place of a valid SMILES. Windows users may need to adapt the code. But for trusted SMILES on Unix machines, this implementation works well and can be used in many different programming environments.&lt;/p&gt;
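
&lt;p&gt;One way to reduce this risk, sketched here under the assumption that Ruby's standard &lt;tt&gt;shellwords&lt;/tt&gt; library is acceptable, is to escape the SMILES string before interpolating it. The &lt;tt&gt;babel_command&lt;/tt&gt; helper is hypothetical and only builds the command string; it does not run &lt;tt&gt;babel&lt;/tt&gt;.&lt;/p&gt;

```ruby
require 'shellwords'

# Hypothetical helper: builds the babel pipeline with the SMILES string
# escaped so the shell treats it as a single literal argument to echo.
def babel_command(smiles)
  "echo #{Shellwords.escape(smiles)} | babel -ismi -ofpt 2>/dev/null"
end

# A well-formed SMILES string passes through unchanged.
puts babel_command('c1ccccc1')

# Quotes, semicolons and spaces are backslash-escaped instead of
# being interpreted by the shell.
puts babel_command("c1ccccc1'; ls #")
```

&lt;p&gt;Another option would be to bypass the shell entirely by passing the command and its arguments as an array to &lt;tt&gt;IO.popen&lt;/tt&gt; and writing the SMILES string to its standard input.&lt;/p&gt;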

&lt;h4&gt;Testing the Fingerprinter&lt;/h4&gt;

&lt;p&gt;We can test the &lt;tt&gt;Fingerprinter&lt;/tt&gt; through interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'lib/fingerprinter'
=&amp;gt; true
irb(main):002:0&amp;gt; fp=Fingerprinter.new
=&amp;gt; #&amp;lt;Fingerprinter:0xb7498038&amp;gt;
irb(main):003:0&amp;gt; f=fp.fingerprint_smiles 'c1ccccc1'
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil&amp;gt;
irb(main):004:0&amp;gt; f.cardinality
=&amp;gt; 6
irb(main):005:0&amp;gt; f.bitstring
=&amp;gt; "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000100000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000"
&lt;/pre&gt;
&lt;/div&gt;
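
&lt;p&gt;The cardinality reported above can be reproduced from the sixteen stored values alone. The following is a minimal sketch in plain Ruby, with no database and no &lt;tt&gt;Fingerprint&lt;/tt&gt; class; the &lt;tt&gt;words&lt;/tt&gt; array simply copies the values printed by irb:&lt;/p&gt;

```ruby
# The sixteen 64-bit values printed for benzene in the irb session.
words = [0, 512, 0, 0, 2112, 32768, 0, 0,
         0, 0, 134217728, 0, 0, 0, 131072, 0]

# Render each value as a zero-padded 64-character binary string and join.
bitstring = words.map { |word| format("%064b", word) }.join

puts bitstring.length      # 1024
puts bitstring.count("1")  # 6
```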

&lt;p&gt;As we previously saw, any &lt;tt&gt;Fingerprint&lt;/tt&gt; we create can be stored in, and later retrieved from, a MySQL database. If we've already stored the fingerprint for benzene, it can be found with the following:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'lib/fingerprinter'
=&amp;gt; true
irb(main):002:0&amp;gt; fp=Fingerprinter.new
=&amp;gt; #&amp;lt;Fingerprinter:0xb74ae284&amp;gt;
irb(main):003:0&amp;gt; f=fp.fingerprint_smiles 'c1ccccc1'
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: nil&amp;gt;
irb(main):004:0&amp;gt; Fingerprint.find_by_fingerprint f
=&amp;gt; #&amp;lt;Fingerprint id: 12687, byte0: 0, byte1: 512, byte2: 0, byte3: 0, byte4: 2112, byte5: 32768, byte6: 0, byte7: 0, byte8: 0, byte9: 0, byte10: 134217728, byte11: 0, byte12: 0, byte13: 0, byte14: 131072, byte15: 0, hex: "000000000000000000000000000002000000000000000000000..."&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;We now have the ability to create, store, and query fingerprints created from arbitrary SMILES strings. If there were a 1:1 relationship between molecules and fingerprints, we'd be nearly done. But things are not quite that simple. The next article in this series will show how to relate molecules to fingerprints.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/adrenalin/"&gt;adrenalin&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 15 Oct 2008 14:42:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:ad16d97d-9183-4e25-8b88-26a28ffdca48</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures</link>
      <category>Tools</category>
      <category>ruby</category>
      <category>activerecord</category>
      <category>openbabel</category>
      <category>commandline</category>
      <category>fingerprint</category>
      <category>database</category>
      <category>substructuresearch</category>
      <category>query</category>
    </item>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 3: A CRUD API for Fingerprints in Ruby</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/58595467@N00/2282011834/"&gt;&lt;img src="http://depth-first.com/demo/20081006/hd.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The previous article in this series showed how to perform fingerprint screens for substructure searches &lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;using nothing more than SQL&lt;/a&gt;. Although this is significant progress, working at the level of SQL queries to perform create, read, update, and delete operations (&lt;a href="http://en.wikipedia.org/wiki/Create,_read,_update_and_delete"&gt;CRUD&lt;/a&gt;) on our fingerprint table is more work than it needs to be. We'd really prefer to use an API written in a high-level programming language. This article describes a simple Ruby API for managing and querying a database of molecular fingerprints.&lt;/p&gt;

&lt;p&gt;All Articles in this Series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;Part 1: Fingerprints and Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;Part 2: Fingerprint Screen With SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures"&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Some Changes to the Database Schema&lt;/h4&gt;

&lt;p&gt;Before we move forward, we must deal with one minor detail. By default, &lt;a href="http://dev.mysql.com/doc/refman/4.1/en/numeric-types.html"&gt;MySQL's &lt;tt&gt;bigint&lt;/tt&gt; columns hold signed 64-bit integers&lt;/a&gt;, giving a range of -9223372036854775808 to 9223372036854775807. A 64-bit fingerprint value with its highest bit set falls outside that range. Ruby, on the other hand, can work with integers of any size through the &lt;tt&gt;Bignum&lt;/tt&gt; class - and we'll be taking full advantage of this feature.&lt;/p&gt;
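
&lt;p&gt;The ranges involved can be checked with a few lines of Ruby. This is only an arithmetic sketch; the constant names are chosen here for illustration:&lt;/p&gt;

```ruby
# Limits of signed and unsigned 64-bit integers (illustrative names).
SIGNED_MIN   = -(2**63)
SIGNED_MAX   = 2**63 - 1
UNSIGNED_MAX = 2**64 - 1

# A 64-bit value with every bit set, written as sixteen hex digits.
all_bits = "ffffffffffffffff".hex

puts SIGNED_MIN                # -9223372036854775808
puts SIGNED_MAX                # 9223372036854775807
puts all_bits == UNSIGNED_MAX  # true
puts all_bits > SIGNED_MAX     # true
```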

&lt;p&gt;If we want to avoid the headache of constantly accounting for the difference, we need to tell our database to use unsigned integers in the fingerprints table. This can be done by first dropping the old table:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; drop table fingerprints;
Query OK, 0 rows affected (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now let's create a new table in which the &lt;tt&gt;fp&lt;sub&gt;n&lt;/sub&gt;&lt;/tt&gt; columns store unsigned integers only. While we're at it, let's change the naming of these columns from &lt;tt&gt;fp&lt;sub&gt;n&lt;/sub&gt;&lt;/tt&gt; to the more descriptive &lt;tt&gt;byte&lt;sub&gt;n&lt;/sub&gt;&lt;/tt&gt; and set a default value of zero.&lt;/p&gt;

&lt;p&gt;The new table can be created with:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&amp;gt; create table fingerprints(id int not null auto_increment, primary key(id), byte0 bigint(64) unsigned default 0, byte1 bigint(64) unsigned default 0, byte2 bigint(64) unsigned default 0, byte3 bigint(64) unsigned default 0, byte4 bigint(64) unsigned default 0, byte5 bigint(64) unsigned default 0, byte6 bigint(64) unsigned default 0, byte7 bigint(64) unsigned default 0, byte8 bigint(64) unsigned default 0, byte9 bigint(64) unsigned default 0, byte10 bigint(64) unsigned default 0, byte11 bigint(64) unsigned default 0, byte12 bigint(64) unsigned default 0, byte13 bigint(64) unsigned default 0, byte14 bigint(64) unsigned default 0, byte15 bigint(64) unsigned default 0);
Query OK, 0 rows affected (0.00 sec)

mysql&gt; describe fingerprints;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | int(11)             | NO   | PRI | NULL    | auto_increment | 
| byte0  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte1  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte2  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte3  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte4  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte5  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte6  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte7  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte8  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte9  | bigint(64) unsigned | YES  |     | 0       |                | 
| byte10 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte11 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte12 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte13 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte14 | bigint(64) unsigned | YES  |     | 0       |                | 
| byte15 | bigint(64) unsigned | YES  |     | 0       |                | 
+--------+---------------------+------+-----+---------+----------------+
17 rows in set (0.01 sec)

&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We're now ready to create the Ruby API.&lt;/p&gt;

&lt;h4&gt;The API&lt;/h4&gt;

&lt;p&gt;The code below is all we need to begin querying and managing our fingerprint database in Ruby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;active_record&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;ActiveRecord&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Base&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;establish_connection&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;
  &lt;span class="symbol"&gt;:adapter&lt;/span&gt;    &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mysql&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:host&lt;/span&gt;       &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;localhost&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:username&lt;/span&gt;   &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;root&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:password&lt;/span&gt;   &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt;
  &lt;span class="symbol"&gt;:database&lt;/span&gt;   &lt;span class="punct"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;compounds&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Fingerprint&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt; &lt;span class="constant"&gt;ActiveRecord&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Base&lt;/span&gt;
  &lt;span class="attribute"&gt;@@bytes_prefix&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;byte&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;each_byte&lt;/span&gt;
    &lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;upto&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;byte_count&lt;/span&gt; &lt;span class="punct"&gt;-&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="keyword"&gt;yield&lt;/span&gt; &lt;span class="ident"&gt;send&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{@@bytes_prefix}#{i}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt;  &lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;each_byte_with_index&lt;/span&gt;
    &lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;upto&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;byte_count&lt;/span&gt; &lt;span class="punct"&gt;-&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="keyword"&gt;yield&lt;/span&gt; &lt;span class="ident"&gt;send&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{@@bytes_prefix}#{i}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;),&lt;/span&gt; &lt;span class="ident"&gt;i&lt;/span&gt;  &lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;fill_bytes&lt;/span&gt;
    &lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;upto&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;byte_count&lt;/span&gt; &lt;span class="punct"&gt;-&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;send&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{@@bytes_prefix}#{i}&lt;/span&gt;=&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="keyword"&gt;yield&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;))}&lt;/span&gt;

    &lt;span class="constant"&gt;self&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;to_byte_array&lt;/span&gt;
    &lt;span class="constant"&gt;Array&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;byte_count&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;fill&lt;/span&gt;&lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;send&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{@@bytes_prefix}#{i}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;byte_count&lt;/span&gt;
    &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt;

    &lt;span class="keyword"&gt;while&lt;/span&gt; &lt;span class="ident"&gt;respond_to?&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{@@bytes_prefix}#{result}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
      &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;+=&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;result&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;bitstring&lt;/span&gt;
    &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="ident"&gt;each_byte&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;byte&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;+=&lt;/span&gt;  &lt;span class="ident"&gt;sprintf&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;%064b&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="ident"&gt;byte&lt;/span&gt;&lt;span class="punct"&gt;)}&lt;/span&gt;

    &lt;span class="ident"&gt;result&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;cardinality&lt;/span&gt;
    &lt;span class="ident"&gt;bitstring&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;count&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;eql?&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;other&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;to_byte_array&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;eql?&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;other&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_byte_array&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;save&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;false&lt;/span&gt; &lt;span class="keyword"&gt;unless&lt;/span&gt; &lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;find_by_fingerprint&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;self&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;empty?&lt;/span&gt;

    &lt;span class="keyword"&gt;super&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.find_by_fingerprint&lt;/span&gt; &lt;span class="ident"&gt;fingerprint&lt;/span&gt;
    &lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;find_by_sql&lt;/span&gt; &lt;span class="ident"&gt;sql_for_find_by_fingerprint&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.find_children_by_fingerprint&lt;/span&gt; &lt;span class="ident"&gt;fingerprint&lt;/span&gt;
    &lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;find_by_sql&lt;/span&gt; &lt;span class="ident"&gt;sql_for_find_children_by_fingerprint&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.sql_for_find_by_fingerprint&lt;/span&gt; &lt;span class="ident"&gt;fingerprint&lt;/span&gt;
    &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;select fingerprints.* from fingerprints where &lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;last&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;byte_count&lt;/span&gt; &lt;span class="punct"&gt;-&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt;

    &lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;each_byte_with_index&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;byte&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;+=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{@@bytes_prefix}#{i}&lt;/span&gt;=&lt;span class="expr"&gt;#{byte}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="punct"&gt;((&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt;&lt;span class="ident"&gt;last&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt; and &lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;result&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;self.sql_for_find_children_by_fingerprint&lt;/span&gt; &lt;span class="ident"&gt;fingerprint&lt;/span&gt;
    &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;select fingerprints.* from fingerprints where &lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
    &lt;span class="ident"&gt;last&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;byte_count&lt;/span&gt; &lt;span class="punct"&gt;-&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt;

    &lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;each_byte_with_index&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;byte&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;i&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;result&lt;/span&gt; &lt;span class="punct"&gt;+=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{@@bytes_prefix}#{i}&lt;/span&gt;&amp;amp;&lt;span class="expr"&gt;#{byte}&lt;/span&gt;=&lt;span class="expr"&gt;#{byte}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="punct"&gt;((&lt;/span&gt;&lt;span class="ident"&gt;i&lt;/span&gt; &lt;span class="punct"&gt;==&lt;/span&gt;&lt;span class="ident"&gt;last&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;?&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;:&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt; and &lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;

    &lt;span class="ident"&gt;result&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
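
&lt;p&gt;To see what kind of query the exact-match finder produces, the string building can be sketched on its own, without ActiveRecord. The &lt;tt&gt;sql_for_exact_match&lt;/tt&gt; function below is a hypothetical standalone variant that uses &lt;tt&gt;map&lt;/tt&gt; and &lt;tt&gt;join&lt;/tt&gt; in place of the last-index check; it assumes the &lt;tt&gt;fingerprints&lt;/tt&gt; table and &lt;tt&gt;byte&lt;/tt&gt; column prefix shown above:&lt;/p&gt;

```ruby
# Hypothetical standalone sketch of the WHERE-clause builder.
# Integer() rejects anything that is not a number, so only numeric
# literals are ever interpolated into the SQL text.
def sql_for_exact_match(words, prefix = "byte")
  conditions = words.each_with_index.map do |word, i|
    "#{prefix}#{i}=#{Integer(word)}"
  end

  "select fingerprints.* from fingerprints where " + conditions.join(" and ")
end

# Three values are used here only to keep the output short.
puts sql_for_exact_match([0, 512, 2112])
```

&lt;p&gt;With the three values shown, this prints a query whose conditions are &lt;tt&gt;byte0=0 and byte1=512 and byte2=2112&lt;/tt&gt;.&lt;/p&gt;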

&lt;h4&gt;Testing the API&lt;/h4&gt;

&lt;p&gt;We can test this library from interactive Ruby (irb). Let's add two fingerprints: the first with every bit set to "1" and the second with alternating "1" and "0" bits:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'fingerprint'
=&amp;gt; true
irb(main):002:0&amp;gt; f1=Fingerprint.new.fill_bytes{"ffffffffffffffff".hex}
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 18446744073709551615, byte1: 18446744073709551615, byte2: 18446744073709551615, byte3: 18446744073709551615, byte4: 18446744073709551615, byte5: 18446744073709551615, byte6: 18446744073709551615, byte7: 18446744073709551615, byte8: 18446744073709551615, byte9: 18446744073709551615, byte10: 18446744073709551615, byte11: 18446744073709551615, byte12: 18446744073709551615, byte13: 18446744073709551615, byte14: 18446744073709551615, byte15: 18446744073709551615&amp;gt;
irb(main):003:0&amp;gt; f1.save
=&amp;gt; true
irb(main):004:0&amp;gt; f2=Fingerprint.new.fill_bytes{"aaaaaaaaaaaaaaaa".hex}
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 12297829382473034410, byte1: 12297829382473034410, byte2: 12297829382473034410, byte3: 12297829382473034410, byte4: 12297829382473034410, byte5: 12297829382473034410, byte6: 12297829382473034410, byte7: 12297829382473034410, byte8: 12297829382473034410, byte9: 12297829382473034410, byte10: 12297829382473034410, byte11: 12297829382473034410, byte12: 12297829382473034410, byte13: 12297829382473034410, byte14: 12297829382473034410, byte15: 12297829382473034410&amp;gt;
irb(main):005:0&amp;gt; f2.save
=&amp;gt; true
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Let's find the fingerprint in which all bits are turned on:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'fingerprint'
=&amp;gt; true
irb(main):002:0&amp;gt; query=Fingerprint.new.fill_bytes{"ffffffffffffffff".hex}
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 18446744073709551615, byte1: 18446744073709551615, byte2: 18446744073709551615, byte3: 18446744073709551615, byte4: 18446744073709551615, byte5: 18446744073709551615, byte6: 18446744073709551615, byte7: 18446744073709551615, byte8: 18446744073709551615, byte9: 18446744073709551615, byte10: 18446744073709551615, byte11: 18446744073709551615, byte12: 18446744073709551615, byte13: 18446744073709551615, byte14: 18446744073709551615, byte15: 18446744073709551615&amp;gt;
irb(main):003:0&amp;gt; Fingerprint.find_by_fingerprint query
=&amp;gt; [#&amp;lt;Fingerprint id: 111, byte0: 18446744073709551615, byte1: 18446744073709551615, byte2: 18446744073709551615, byte3: 18446744073709551615, byte4: 18446744073709551615, byte5: 18446744073709551615, byte6: 18446744073709551615, byte7: 18446744073709551615, byte8: 18446744073709551615, byte9: 18446744073709551615, byte10: 18446744073709551615, byte11: 18446744073709551615, byte12: 18446744073709551615, byte13: 18446744073709551615, byte14: 18446744073709551615, byte15: 18446744073709551615&amp;gt;]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Our query has found an exact match for the query fingerprint in the database at row 111. (This id is not 1 because previous automated tests that I wrote and executed have added and removed rows, advancing the id counter).&lt;/p&gt;

&lt;p&gt;We can also search the database for the children of an arbitrary fingerprint query. A test fingerprint A is a "child" of query Q if all of the set bits in Q are also set in A. Notice that this leaves open the possibility that A has &lt;em&gt;more&lt;/em&gt; bits set than Q. For example:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'fingerprint'
=&amp;gt; true
irb(main):002:0&amp;gt; query=Fingerprint.new.fill_bytes{"aaaaaaaaaaaaaaaa".hex}
=&amp;gt; #&amp;lt;Fingerprint id: nil, byte0: 12297829382473034410, byte1: 12297829382473034410, byte2: 12297829382473034410, byte3: 12297829382473034410, byte4: 12297829382473034410, byte5: 12297829382473034410, byte6: 12297829382473034410, byte7: 12297829382473034410, byte8: 12297829382473034410, byte9: 12297829382473034410, byte10: 12297829382473034410, byte11: 12297829382473034410, byte12: 12297829382473034410, byte13: 12297829382473034410, byte14: 12297829382473034410, byte15: 12297829382473034410&amp;gt;
irb(main):003:0&amp;gt; results = Fingerprint.find_children_by_fingerprint query
=&amp;gt; [#&amp;lt;Fingerprint id: 112, byte0: 12297829382473034410, byte1: 12297829382473034410, byte2: 12297829382473034410, byte3: 12297829382473034410, byte4: 12297829382473034410, byte5: 12297829382473034410, byte6: 12297829382473034410, byte7: 12297829382473034410, byte8: 12297829382473034410, byte9: 12297829382473034410, byte10: 12297829382473034410, byte11: 12297829382473034410, byte12: 12297829382473034410, byte13: 12297829382473034410, byte14: 12297829382473034410, byte15: 12297829382473034410&amp;gt;, #&amp;lt;Fingerprint id: 111, byte0: 18446744073709551615, byte1: 18446744073709551615, byte2: 18446744073709551615, byte3: 18446744073709551615, byte4: 18446744073709551615, byte5: 18446744073709551615, byte6: 18446744073709551615, byte7: 18446744073709551615, byte8: 18446744073709551615, byte9: 18446744073709551615, byte10: 18446744073709551615, byte11: 18446744073709551615, byte12: 18446744073709551615, byte13: 18446744073709551615, byte14: 18446744073709551615, byte15: 18446744073709551615&amp;gt;]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It worked - both fingerprints stored in the database were found.&lt;/p&gt;
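
&lt;p&gt;The child test described above can also be sketched in plain Ruby, without a database. The &lt;tt&gt;child?&lt;/tt&gt; helper is hypothetical and not part of the &lt;tt&gt;Fingerprint&lt;/tt&gt; class. It relies on &lt;tt&gt;Integer#allbits?&lt;/tt&gt;, which needs Ruby 2.5 or later and performs the same comparison as the SQL: a bitwise AND of the candidate with the query must give back the query.&lt;/p&gt;

```ruby
# Hypothetical helper: true when every bit set in the query is also set
# in the candidate, compared value by value.
def child?(candidate_words, query_words)
  candidate_words.zip(query_words).all? do |candidate, query|
    candidate.allbits?(query)
  end
end

all_on      = Array.new(16) { "ffffffffffffffff".hex }
alternating = Array.new(16) { "aaaaaaaaaaaaaaaa".hex }

puts child?(all_on, alternating)       # true
puts child?(alternating, alternating)  # true
puts child?(alternating, all_on)       # false
```

&lt;p&gt;The first two results mirror the two rows returned by the query above; the third shows that the relationship does not hold in the other direction.&lt;/p&gt;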

&lt;p&gt;We can delete a &lt;tt&gt;Fingerprint&lt;/tt&gt; like this:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&amp;gt; require 'fingerprint'
=&amp;gt; true
irb(main):002:0&amp;gt; f=Fingerprint.find 112
=&amp;gt; #&amp;lt;Fingerprint id: 112, byte0: 12297829382473034410, byte1: 12297829382473034410, byte2: 12297829382473034410, byte3: 12297829382473034410, byte4: 12297829382473034410, byte5: 12297829382473034410, byte6: 12297829382473034410, byte7: 12297829382473034410, byte8: 12297829382473034410, byte9: 12297829382473034410, byte10: 12297829382473034410, byte11: 12297829382473034410, byte12: 12297829382473034410, byte13: 12297829382473034410, byte14: 12297829382473034410, byte15: 12297829382473034410&amp;gt;
irb(main):003:0&amp;gt; f.destroy
=&amp;gt; #&amp;lt;Fingerprint id: 112, byte0: 12297829382473034410, byte1: 12297829382473034410, byte2: 12297829382473034410, byte3: 12297829382473034410, byte4: 12297829382473034410, byte5: 12297829382473034410, byte6: 12297829382473034410, byte7: 12297829382473034410, byte8: 12297829382473034410, byte9: 12297829382473034410, byte10: 12297829382473034410, byte11: 12297829382473034410, byte12: 12297829382473034410, byte13: 12297829382473034410, byte14: 12297829382473034410, byte15: 12297829382473034410&amp;gt;
irb(main):004:0&amp;gt; Fingerprint.count
=&amp;gt; 1
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Active Record and the Fingerprint API&lt;/h4&gt;

&lt;p&gt;The &lt;tt&gt;Fingerprint&lt;/tt&gt; class is so concise because it takes advantage of the Ruby library called &lt;a href="http://ar.rubyonrails.com/"&gt;ActiveRecord&lt;/a&gt;. ActiveRecord is the object-relational mapping system used in &lt;a href="http://rubyonrails.com/"&gt;Ruby on Rails&lt;/a&gt;. ActiveRecord can be used outside of Rails, as was done for this library, by including the code at the top of the file beginning with "ActiveRecord::Base.establish_connection...", where you'd use the parameters specific to your database.&lt;/p&gt;

&lt;p&gt;We gain three key advantages with this approach: (1) we have very little SQL to code; (2) we have access to all of ActiveRecord's built-in CRUD operations, such as counting records with &lt;tt&gt;Fingerprint.count&lt;/tt&gt; and deleting a &lt;tt&gt;Fingerprint&lt;/tt&gt; with &lt;tt&gt;destroy&lt;/tt&gt;, without writing anything ourselves; and (3) we can easily integrate the &lt;tt&gt;Fingerprint&lt;/tt&gt; class into any Ruby on Rails application.&lt;/p&gt;

&lt;h4&gt;Variations&lt;/h4&gt;

&lt;p&gt;At least two other object-relational mapping systems could be used from Ruby: &lt;a href="http://datamapper.org/"&gt;DataMapper&lt;/a&gt; and &lt;a href="http://sequel.rubyforge.org/"&gt;Sequel&lt;/a&gt;. The approach described here could be adapted to these other ORMs with minimal effort.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;We now have a working fingerprint screening system built solely from open source components. MySQL houses the data and provides for highly-optimized queries. A concise Ruby API created with ActiveRecord now allows us to deal with our fingerprint database as a collection of objects in a high-level language. We can perform all CRUD operations without writing a line of SQL.&lt;/p&gt;

&lt;p&gt;We've come a long way, but we're still not dealing with molecules. We previously saw how Open Babel can &lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;generate fingerprints&lt;/a&gt; with which we could, in principle, populate and query our database. The next article in this series will use this capability in creating a more chemically-aware system.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/58595467@N00/"&gt;peerlingguy&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 06 Oct 2008 22:13:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:8967374d-cc5e-4c9a-a345-c20e38100fc2</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby</link>
      <category>Tools</category>
      <category>mysql</category>
      <category>sql</category>
      <category>ruby</category>
      <category>rails</category>
      <category>activerecord</category>
      <category>orm</category>
      <category>fingerprint</category>
      <category>substructuresearch</category>
      <category>database</category>
    </item>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 2: Fingerprint Screen With SQL</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/dopeylok/14052025/"&gt;&lt;img src="http://depth-first.com/demo/20081003/skeleton.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The &lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;previous article in this series&lt;/a&gt; discussed the configuration of a MySQL database for fast substructure search with binary fingerprints. This article first shows how to populate this database with real fingerprint data for two molecules. Then it shows how to formulate standard SQL queries to screen the database for substructures.&lt;/p&gt;

&lt;p&gt;All articles in this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases"&gt;Part 1: Fingerprints and Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Fingerprint Screen With SQL&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures"&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Creating the Fingerprints with Open Babel&lt;/h4&gt;

&lt;p&gt;The &lt;tt&gt;babel&lt;/tt&gt; command line utility will, among its many conversions, return a fingerprint when given a valid SMILES string. For example, we can create the fingerprint for benzene like this:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ babel -ismi -ofpt
c1ccccc1
&gt;   6 bits set 
00000000 00000000 00000000 00000200 00000000 00000000 
00000000 00000000 00000000 00000840 00000000 00008000 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 08000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00020000 
00000000 00000000 
1 molecule converted
12 audit log messages
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Similarly, we create the fingerprint for phenol like this:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ babel -ismi -ofpt
c1ccccc1O 
&gt;   12 bits set 
00000000 00000008 20000000 00000200 00000000 00000000 
02000000 00000000 00000000 00000840 00000000 00008000 
00000002 00000000 00000000 00000008 00000000 00000000 
00000000 00020000 00000000 08000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00020000 
00000000 00000000 
1 molecule converted
19 audit log messages
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The exact meaning of these fingerprints is interesting, but not relevant. Without getting into the details of the Open Babel fingerprint formats, which are discussed in detail &lt;a href="http://www.dalkescientific.com/writings/diary/archive/2008/06/27/generating_fingerprints_with_openbabel.html"&gt;elsewhere&lt;/a&gt;, the output contains the binary fingerprint of each molecule as an array of 32-bit hexadecimal numbers.&lt;/p&gt;

&lt;h4&gt;Adding Fingerprints to the Database&lt;/h4&gt;

&lt;p&gt;To use Open Babel's fingerprints with our database, we need to convert the 32-bit hexadecimal output to 64-bit decimal format. This is not difficult, and most programming environments make it very simple. For example, the following Ruby code will convert the third and fourth 32-bit hexadecimal numbers in the benzene fingerprint into a 64-bit decimal number:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; "0000000000000200".hex
=&gt; 512
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Performing this conversion for every pair of 32-bit hex numbers in each fingerprint gives a set of numbers we can place directly into our database:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; # Add benzene decimal fingerprint.
mysql&gt; insert into fingerprints
(fp0,fp1,fp2,fp3,fp4,fp5,fp6,fp7,fp8,fp9,fp10,fp11,fp12,fp13,fp14,fp15)
values
(0, 512, 0, 0, 2112, 32768, 0, 0, 0, 0, 134217728, 0, 0, 0, 131072, 0);
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Similarly,&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; # Add phenol decimal fingerprint.
mysql&gt; insert into fingerprints
(fp0,fp1,fp2,fp3,fp4,fp5,fp6,fp7,fp8,fp9,fp10,fp11,fp12,fp13,fp14,fp15)
values
(8, 2305843009213694464, 0, 144115188075855872, 2112, 32768, 8589934592, 8, 0, 131072, 134217728, 0, 0, 0, 131072, 0);
&lt;/pre&gt;
&lt;/div&gt;
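
&lt;p&gt;The conversion behind both inserts can be sketched in Ruby. This is only a minimal sketch under stated assumptions: the constant &lt;tt&gt;BENZENE_FPT&lt;/tt&gt; and the method &lt;tt&gt;hex_words_to_decimals&lt;/tt&gt; are names made up for illustration, and the hex words are copied by hand from the &lt;tt&gt;babel&lt;/tt&gt; output for benzene shown earlier.&lt;/p&gt;

```ruby
# Hypothetical helper, not part of any library: the 32 hex words below are
# copied by hand from the babel output for benzene shown above.
BENZENE_FPT = %w[
  00000000 00000000 00000000 00000200 00000000 00000000
  00000000 00000000 00000000 00000840 00000000 00008000
  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 08000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00020000
  00000000 00000000
].freeze

# Join each consecutive pair of 32-bit hex words (high word first) and
# parse the 16-digit result as a single 64-bit integer.
def hex_words_to_decimals(words)
  words.each_slice(2).map { |high, low| (high + low).hex }
end

puts hex_words_to_decimals(BENZENE_FPT).join(", ")
```

&lt;p&gt;Run as a script, this should print the same sixteen values used in the benzene insert statement.&lt;/p&gt;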

&lt;p&gt;Our table is now ready to be queried:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select * from fingerprints;
+----+------+---------------------+------+--------------------+------+-------+------------+------+------+--------+-----------+------+------+------+--------+------+
| id | fp0  | fp1                 | fp2  | fp3                | fp4  | fp5   | fp6        | fp7  | fp8  | fp9    | fp10      | fp11 | fp12 | fp13 | fp14   | fp15 |
+----+------+---------------------+------+--------------------+------+-------+------------+------+------+--------+-----------+------+------+------+--------+------+
|  1 |    0 |                 512 |    0 |                  0 | 2112 | 32768 |          0 |    0 |    0 |      0 | 134217728 |    0 |    0 |    0 | 131072 |    0 | 
|  2 |    8 | 2305843009213694464 |    0 | 144115188075855872 | 2112 | 32768 | 8589934592 |    8 |    0 | 131072 | 134217728 |    0 |    0 |    0 | 131072 |    0 | 
+----+------+---------------------+------+--------------------+------+-------+------------+------+------+--------+-----------+------+------+------+--------+------+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Querying the Database&lt;/h4&gt;

&lt;p&gt;With a table of fingerprints in hand, we can begin formulating queries. To do so, we'll use MySQL's built-in support for &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/bit-functions.html"&gt;binary arithmetic&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A molecule with fingerprint A can be a substructure of another molecule with fingerprint B only if all of the bits set in A are also set in B. Mathematically, we'd say that:&lt;/p&gt;

&lt;p&gt;A&lt;sub&gt;i&lt;/sub&gt; &amp;amp; B&lt;sub&gt;i&lt;/sub&gt; = A&lt;sub&gt;i&lt;/sub&gt;&lt;/p&gt;

&lt;p&gt;for all bits i in A and B.&lt;/p&gt;

&lt;p&gt;Let's say we have two two-bit fingerprints, 01 and 11 (binary), in our database. We can use MySQL to test whether the molecule from which the first fingerprint was derived could be a substructure of the molecule from which the second fingerprint was derived with this syntax:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select 1&amp;3;
+-----+
| 1&amp;3 |
+-----+
|   1 | 
+-----+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The answer is yes, there could be a substructure match because 1&amp;amp;3 = 1.&lt;/p&gt;
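
&lt;p&gt;The same bit test can be sketched in Ruby. This is an illustration only, assuming Ruby 2.5 or later for &lt;tt&gt;Integer#allbits?&lt;/tt&gt;, which returns true when every bit set in its argument is also set in the receiver. The method name &lt;tt&gt;could_contain?&lt;/tt&gt; is made up for this example.&lt;/p&gt;

```ruby
# Hypothetical helper: true when every bit set in `query` is also set in
# `target`, i.e. the molecule behind `query` could be a substructure of
# the molecule behind `target`. Requires Ruby 2.5+ for Integer#allbits?.
def could_contain?(target, query)
  target.allbits?(query)
end

puts could_contain?(0b11, 0b01) # 01 against 11
puts could_contain?(0b01, 0b11) # 11 against 01
```

&lt;p&gt;The first call should print true and the second false: the test is not symmetric, so it matters which fingerprint plays the role of the query.&lt;/p&gt;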

&lt;p&gt;We're now ready to perform our first substructure screen using SQL. This consists of selecting all rows for which each of the 16 fingerprint components, when ANDed with the corresponding component of the query fingerprint, gives back the query component.&lt;/p&gt;

&lt;p&gt;To find the molecules of which benzene could be a substructure, we use benzene's fingerprint as the query:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; select id from fingerprints where fp0&amp;0=0 and fp1&amp;512=512 and fp2&amp;0=0 and fp3&amp;0=0 and fp4&amp;2112=2112 and fp5&amp;32768=32768 and fp6&amp;0=0 and fp7&amp;0=0 and fp8&amp;0=0 and fp9&amp;0=0 and fp10&amp;134217728=134217728 and fp11&amp;0=0 and fp12&amp;0=0 and fp13&amp;0=0 and fp14&amp;131072=131072 and fp15&amp;0=0;
+----+
| id |
+----+
|  1 | 
|  2 | 
+----+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Our query results are telling us that benzene could be a substructure of both itself (row 1) and phenol (row 2), as expected.&lt;/p&gt;
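
&lt;p&gt;To make the direction of the screen concrete, here is a minimal in-memory sketch of the same test in Ruby. It does not talk to MySQL. The names &lt;tt&gt;BENZENE&lt;/tt&gt;, &lt;tt&gt;PHENOL&lt;/tt&gt;, &lt;tt&gt;ROWS&lt;/tt&gt; and &lt;tt&gt;screen&lt;/tt&gt; are made up for illustration, the values are the ones inserted into the table above, and &lt;tt&gt;Integer#allbits?&lt;/tt&gt; assumes Ruby 2.5 or later.&lt;/p&gt;

```ruby
# Decimal fingerprints as inserted into the fingerprints table above.
BENZENE = [0, 512, 0, 0, 2112, 32768, 0, 0, 0, 0,
           134217728, 0, 0, 0, 131072, 0].freeze
PHENOL  = [8, 2305843009213694464, 0, 144115188075855872, 2112, 32768,
           8589934592, 8, 0, 131072, 134217728, 0, 0, 0, 131072, 0].freeze

# Row id => stored fingerprint, mirroring the two table rows.
ROWS = { 1 => BENZENE, 2 => PHENOL }.freeze

# Hypothetical helper: ids of all rows whose every component contains all
# bits of the matching query component (the same test as the SQL screen).
def screen(rows, query)
  hits = rows.select do |_id, stored|
    stored.zip(query).all? { |component, wanted| component.allbits?(wanted) }
  end
  hits.keys
end

p screen(ROWS, BENZENE)
p screen(ROWS, PHENOL)
```

&lt;p&gt;With benzene as the query both rows should pass, while with phenol as the query only row 2 should pass, because benzene's fingerprint lacks several of phenol's bits.&lt;/p&gt;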

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;We now have a database populated with two molecules represented as fingerprints. We can even scan the database for possible substructure matches using nothing more than standard SQL queries. Nevertheless, we've had to use a lot of manual coding to convert hex into decimal and create SQL. We need a library to do this mundane work for us. The next article in this series will discuss a better approach using Ruby.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/dopeylok/"&gt;dopeylok&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Fri, 03 Oct 2008 14:47:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:fb61a84a-c88d-4ec7-9b0b-642f45092eb4</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>mysql</category>
      <category>ruby</category>
      <category>fingerprint</category>
      <category>substructuresearch</category>
      <category>query</category>
    </item>
    <item>
      <title>Fast Substructure Search Using Open Source Tools Part 1: Fingerprints and Databases</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/jaded/89717778/"&gt;&lt;img src="http://depth-first.com/demo/20081002/fingerprint.jpg" align="right"&gt;&lt;/img&gt;&lt;/a&gt;For anyone working in a chemistry-related job, chemical databases are ubiquitous. A printed list of IUPAC names, a spreadsheet containing &lt;a href="http://depth-first.com/articles/2008/05/26/simple-cas-number-lookup-and-more-with-chempedia"&gt;CAS numbers&lt;/a&gt;, and a set of hand-drawn structures on index cards are all primitive chemical databases. They aren't nearly as useful as they could be to either the creator or his/her collaborators, but they are databases nevertheless. Anyone who has spent time in industry or academics knows that these low-tech chemical databases are everywhere. And they become more of a problem as more information is moved into electronic format.&lt;/p&gt;

&lt;p&gt;All articles in this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: Fingerprints and Databases&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/03/fast-substructure-search-using-open-source-tools-part-2-fingerprint-screen-with-sql"&gt;Part 2: Fingerprint Screen With SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/06/fast-substructure-search-using-open-source-tools-part-3-a-crud-api-for-fingerprints-in-ruby"&gt;Part 3: A CRUD API for Fingerprints in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/15/fast-substructure-search-using-open-source-tools-part-4-creating-fingerprints-from-chemical-structures"&gt;Part 4: Creating Fingerprints from Chemical Structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/21/fast-substructure-search-using-open-source-tools-part-5-relating-molecules-to-fingerprints-with-sql"&gt;Part 5: Relating Molecules to Fingerprints with SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://depth-first.com/articles/2008/10/29/fast-substructure-search-using-open-source-tools-part-6-modelling-a-one-to-many-relationship-between-fingerprints-and-compounds-in-ruby"&gt;Part 6: Modelling a One-To-Many Relationship Between Fingerprints and Compounds in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;The Problem: Structure Search is Hard&lt;/h4&gt;

&lt;p&gt;Many of the low-tech chemical databases that professional chemists routinely share and work with would become orders of magnitude more useful if they were converted into substructure-searchable databases and published to the Web. Although there has been a &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;great deal of effort toward this end&lt;/a&gt; in the last few years, there's still much, much more that could be done.&lt;/p&gt;

&lt;p&gt;One of the main problems in creating a substructure-searchable chemical database is implementing the substructure search capability itself. This one requirement has done more to stifle the free flow of chemical information than perhaps any other. Solving the problem appears very difficult on first or second glance, and it is very difficult if you don't have the right tools. Many companies offer solutions - but at a price, both in terms of money and time, that is simply out of reach.&lt;/p&gt;

&lt;p&gt;What can you do if you're just getting started with modest requirements and budget?&lt;/p&gt;

&lt;h4&gt;About This Series&lt;/h4&gt;

&lt;p&gt;This article, the first in a series, will describe the creation of a chemical substructure search engine using exclusively well-maintained and robust open source tools: &lt;a href="http://openbabel.org"&gt;Open Babel&lt;/a&gt; for generating fingerprints and performing atom-by-atom searches; &lt;a href="http://mysql.com"&gt;MySQL&lt;/a&gt; as a relational database; and &lt;a href="http://ruby-lang.org"&gt;Ruby&lt;/a&gt; as a scripting language.&lt;/p&gt;

&lt;p&gt;Each of these three components is a commodity that can be replaced with any one of a number of open-source or proprietary substitutes, maximizing flexibility and minimizing vendor lock-in.&lt;/p&gt;

&lt;h4&gt;Other Resources&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/pch/nh_info.html"&gt;Norbert Haider&lt;/a&gt; of the University of Vienna has written a very useful tutorial on &lt;a href="http://merian.pch.univie.ac.at/~nhaider/cheminf/moldb.html"&gt;creating a structure-searchable database using free tools&lt;/a&gt;, which is part of a &lt;a href="http://depth-first.com/articles/2007/04/13/roll-your-own-chemical-database-with-free-components"&gt;larger series&lt;/a&gt;. That series differs from this one in the technology stack used and the level of detail to be provided. The series of articles to appear here will spell out the low-level series of steps needed to create a working substructure search system. It's hoped that taking this perspective makes clear the steps needed to apply the approach to alternative technology platforms.&lt;/p&gt;

&lt;h4&gt;Binary Fingerprints and Relational Databases&lt;/h4&gt;

&lt;p&gt;At the heart of the system we'll build is the chemical fingerprint, which is a (usually) lossy binary representation of a chemical structure. Creating a binary fingerprint is like putting every chemical structure, known or unknown, into just one bin out of a very large but finite set of bins. Although the same molecule is guaranteed to always go into the same bin, more than one molecule can be placed into each bin. This is a general feature of all &lt;a href="http://en.wikipedia.org/wiki/Hash_function"&gt;hashing&lt;/a&gt; schemes.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.dalkescientific.com/index.html"&gt;Andrew Dalke&lt;/a&gt; has written &lt;a href="http://www.dalkescientific.com/writings/diary/archive/2008/06/26/fingerprint_background.html"&gt;an excellent series of articles&lt;/a&gt; on fingerprints and what can be done with them. Another good overview is &lt;a href="http://www.daylight.com/dayhtml/doc/theory/theory.finger.html"&gt;available from Daylight&lt;/a&gt;. This article will assume you know what fingerprints are and how they can be used to compare chemical structures.&lt;/p&gt;

&lt;p&gt;The problem with binary fingerprints is that they are generally several hundred bits long - too long to be represented in a form that allows direct and rapid query by a relational database system. They need to be broken up - but how?&lt;/p&gt;

&lt;p&gt;A widely-used approach (and the one that will be taken here) involves breaking up the fingerprint into a series of integers that are stored in the database.&lt;/p&gt;

&lt;p&gt;For example, let's say we have a 1024-bit fingerprint. We could represent this as a single number from 0 to 2^1024 - 1, which of course is way too big for most computers to handle natively today. We could, however, represent this fingerprint as a series of sixteen 64-bit integers (which are available on most systems).&lt;/p&gt;

&lt;p&gt;So, the binary fingerprint:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
1111111101111111110110111011011000101000011000011010011100010000
1001100010101101000110100010110011101100100000100100000111010100
0101010000101011001010011001000100011001100000101100111010001110
1001000101001010000001011001100101101011111111011000111100000111
1010101100100101000100001100011001010111001001110101101100010010
0011101011101110110011111010000010111001100101001001101010110001
1100111000010100000100110111101001011100010111010001010101101101
0010001111111010111011110110000000001010111011111001111001111101
0101011100011111110111011110011110100110010110010101011001011111
0110100001111001101111011101001101101001000100010001100101111000
0011111001000100001111111110001100111001101000000100010010010110
0000011101001001011000111110101110010101110001111010100001100100
0100100111101010110101101010110110101010110110111011011001111111
0011100100101101101001000001000111110101011101110101101001101001
0110100100111001111001001111110111111001110100100110010100011110
0010101100101000011110101110111011001110101111100001011010101100
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;could also be represented as this decimal fingerprint (assuming your machine is &lt;a href="http://en.wikipedia.org/wiki/Endianness"&gt;big-endian&lt;/a&gt;):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
18410675377121896208
11001478244984832468
6064987026359504526
10469186440276053767
12332281598675737362
4246559787872197297
14849515287603909997
2592647731284516477
6277980392575817311
7528256967824972152
4486781373924787350
525060695046727780
5326305550703244927
4120129631153511017
7582343227124114718
3109870708788696748
&lt;/pre&gt;
&lt;/div&gt;
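
&lt;p&gt;The split shown above can be sketched in a few lines of Ruby. This is a minimal sketch, not the code used to produce the listing: the names &lt;tt&gt;BITS&lt;/tt&gt; and &lt;tt&gt;split_fingerprint&lt;/tt&gt; are made up for illustration, the input is just the first two rows of the binary fingerprint, and the leftmost bit of each chunk is assumed to be the most significant.&lt;/p&gt;

```ruby
# First two 64-bit rows of the binary fingerprint shown above.
BITS = "1111111101111111110110111011011000101000011000011010011100010000" +
       "1001100010101101000110100010110011101100100000100100000111010100"

# Hypothetical helper: cut a string of 0s and 1s into fixed-width chunks
# and read each chunk as an unsigned integer, leftmost bit first.
def split_fingerprint(bits, width = 64)
  unless (bits.length % width).zero?
    raise ArgumentError, "length must be a multiple of #{width}"
  end
  bits.scan(/[01]{#{width}}/).map { |chunk| chunk.to_i(2) }
end

puts split_fingerprint(BITS)
```

&lt;p&gt;This should print 18410675377121896208 and 11001478244984832468, the first two entries of the decimal fingerprint. Applying it to all sixteen rows would give the full list.&lt;/p&gt;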

&lt;p&gt;We can easily store this set of 16 numbers in a relational database table. For example, if we had a MySQL database called "compounds", we could create a "fingerprints" table:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
mysql&gt; create database compounds;
Query OK, 1 row affected (0.02 sec)

mysql&gt; use compounds;

Database changed
mysql&gt; create table fingerprints(id int not null auto_increment, primary key(id), fp0 bigint(64), fp1 bigint(64), fp2 bigint(64), fp3 bigint(64), fp4 bigint(64), fp5 bigint(64), fp6 bigint(64), fp7 bigint(64), fp8 bigint(64), fp9 bigint(64), fp10 bigint(64), fp11 bigint(64), fp12 bigint(64), fp13 bigint(64), fp14 bigint(64), fp15 bigint(64));
Query OK, 0 rows affected (0.01 sec)

mysql&gt; describe fingerprints;
+-------+------------+------+-----+---------+----------------+
| Field | Type       | Null | Key | Default | Extra          |
+-------+------------+------+-----+---------+----------------+
| id    | int(11)    | NO   | PRI | NULL    | auto_increment | 
| fp0   | bigint(64) | YES  |     | NULL    |                | 
| fp1   | bigint(64) | YES  |     | NULL    |                | 
| fp2   | bigint(64) | YES  |     | NULL    |                | 
| fp3   | bigint(64) | YES  |     | NULL    |                | 
| fp4   | bigint(64) | YES  |     | NULL    |                | 
| fp5   | bigint(64) | YES  |     | NULL    |                | 
| fp6   | bigint(64) | YES  |     | NULL    |                | 
| fp7   | bigint(64) | YES  |     | NULL    |                | 
| fp8   | bigint(64) | YES  |     | NULL    |                | 
| fp9   | bigint(64) | YES  |     | NULL    |                | 
| fp10  | bigint(64) | YES  |     | NULL    |                | 
| fp11  | bigint(64) | YES  |     | NULL    |                | 
| fp12  | bigint(64) | YES  |     | NULL    |                | 
| fp13  | bigint(64) | YES  |     | NULL    |                | 
| fp14  | bigint(64) | YES  |     | NULL    |                | 
| fp15  | bigint(64) | YES  |     | NULL    |                | 
+-------+------------+------+-----+---------+----------------+
17 rows in set (0.01 sec)

&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Although we have neither a substructure search engine nor a populated database, we've laid a solid foundation for those things. The next article in this series will show how to use this humble beginning to model some simple substructure queries in a way that lets MySQL do most of the heavy lifting.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Credit: &lt;a href="http://flickr.com/photos/jaded/"&gt;Mr. Jaded&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Thu, 02 Oct 2008 23:27:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1d0f8fdc-4664-46ac-89cc-7c1de0608edd</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases</link>
      <category>Tools</category>
      <category>mysql</category>
      <category>ruby</category>
      <category>openbabel</category>
      <category>database</category>
      <category>fingerprint</category>
      <category>substructuresearch</category>
      <category>substructure</category>
    </item>
    <item>
      <title>Scripting Molecular Fingerprints with Ruby CDK</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;A &lt;a href="http://www.daylight.com/dayhtml/doc/theory/theory.finger.html"&gt;molecular fingerprint&lt;/a&gt; represents a molecule as a series of bits. There are many situations in which this reduced form of molecular representation is useful. For example, fingerprints are frequently used as a fast prescreen for database substructure searches. They can also be used for "fuzzy" comparisons involving molecular similarity, a nice complement to binary queries such as substructure search.&lt;/p&gt;

&lt;p&gt;Fingerprints have their limitations. Being a form of hashing, they are imprecise in that two different molecules can have exactly the same fingerprint. The converse is also true: many molecular fingerprints exaggerate small differences between two molecules that most chemists would say are similar - for example between oxygen and sulfur analogs of the same structure.&lt;/p&gt;

&lt;p&gt;Despite their limitations, the advantages of fingerprints make them useful in many situations. As a result, numerous fingerprinting systems have become popular. This tutorial will focus on creating and manipulating molecular fingerprints from Ruby using the Ruby Chemistry Development Kit (RCDK).&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;For this tutorial, you'll need &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;Ruby CDK&lt;/a&gt; (RCDK). A recent article described the small amount of system configuration required for &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-0"&gt;RCDK on Linux&lt;/a&gt;. Another article showed how to install &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;RCDK on Windows&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;A Small Fingerprint Library&lt;/h4&gt;

&lt;p&gt;Let's build a small Ruby library for working with fingerprints. Place the following code into a file called &lt;strong&gt;fingerprint.rb&lt;/strong&gt; in your working directory:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;jrequire&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.fingerprint.Fingerprinter&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;jrequire&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.similarity.Tanimoto&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="comment"&gt;# Molecule fingerprinting&lt;/span&gt;
&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Fingerprinter&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@fingerprinter&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Org&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Openscience&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Cdk&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Fingerprinter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;

    &lt;span class="ident"&gt;fp&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@fingerprinter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getFingerprint&lt;/span&gt; &lt;span class="ident"&gt;mol&lt;/span&gt;

    &lt;span class="comment"&gt;# Metaprogramming!&lt;/span&gt;
    &lt;span class="ident"&gt;fp&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;extend&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="comment"&gt;# BitSet comparison&lt;/span&gt;
&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Fingerprint&lt;/span&gt;
  &lt;span class="comment"&gt;# Returns true if all of the bits set to true in this fingerprint are also set to true in the specified fingerprint&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;subset?&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;Org&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Openscience&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Cdk&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Fingerprint&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Fingerprinter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;isSubset&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;self&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="comment"&gt;# Tanimoto similarity of this fingerprint and the specified fingerprint&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;tanimoto&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;Org&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Openscience&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Cdk&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Similarity&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Tanimoto&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;calculate&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;self&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Of particular note is the use of Ruby's &lt;tt&gt;Object#extend&lt;/tt&gt; method, which mixes a module into a single object instance at runtime, a form of &lt;a href="http://depth-first.com/articles/2006/10/24/metaprogramming-with-ruby-mapping-java-packages-onto-ruby-modules"&gt;metaprogramming&lt;/a&gt;. In this case, we add the &lt;tt&gt;subset?&lt;/tt&gt; and &lt;tt&gt;tanimoto&lt;/tt&gt; methods, which determine whether all of the bits set in one fingerprint are also set in another and compute the similarity of two fingerprints, respectively. We use this technique because RJB currently doesn't provide the complete interface into Java classes that would be required to create a Ruby class that directly inherits from Java's &lt;tt&gt;BitSet&lt;/tt&gt; class.&lt;/p&gt;
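
&lt;p&gt;To see the per-instance effect of &lt;tt&gt;extend&lt;/tt&gt; in isolation, here is a minimal sketch that needs neither RJB nor CDK. The &lt;tt&gt;Countable&lt;/tt&gt; module and its &lt;tt&gt;true_count&lt;/tt&gt; method are hypothetical stand-ins for the &lt;tt&gt;Fingerprint&lt;/tt&gt; module, and plain Ruby arrays stand in for Java objects:&lt;/p&gt;

```ruby
# Hypothetical stand-in for the Fingerprint module (not part of RCDK).
module Countable
  # Number of entries in the receiver that are true.
  def true_count
    count { |bit| bit }
  end
end

bits  = [true, false, true, true]
other = [true, false]

# Only this one instance gains the module's methods;
# the Array class itself is left untouched.
bits.extend(Countable)

puts bits.true_count                 # prints 3
puts bits.respond_to?(:true_count)   # prints true
puts other.respond_to?(:true_count)  # prints false
```

&lt;p&gt;The second array is unaffected, which is why the technique works for decorating individual objects returned from Java without subclassing.&lt;/p&gt;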

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061122/loratadine.png"&gt;&lt;/img&gt;&lt;img src="http://depth-first.com/demo/20061122/desloratadine.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3957"&gt;Claritin&lt;/a&gt; (loratadine, left) and &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=124087"&gt;Clarinex&lt;/a&gt; (desloratadine, right) are two structurally-related antihistamines. Can we quantitate the degree of similarity between these two structures? Fingerprints provide one way. The following code creates fingerprints for the two structures, determines if one is the subset of another, and assigns a Tanimoto similarity value:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;fingerprint&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;f&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Fingerprinter&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

&lt;span class="ident"&gt;loratadine&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;f&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;fingerprint&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;CCOC(=O)N1CCC(=C2C3=C(CCC4=C2N=CC=C4)C=C(C=C3)Cl)CC1&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;desloratadine&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;f&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;fingerprint&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;C1CC2=C(C=CC(=C2)Cl)C(=C3CCNCC3)C4=C1C=CC=N4&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Loratadine is a subset of desloratadine: &lt;span class="expr"&gt;#{loratadine.subset? desloratadine}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; false&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Desloratadine is a subset of loratadine: &lt;span class="expr"&gt;#{desloratadine.subset? loratadine}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; true&lt;/span&gt;
&lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Tanimoto similarity of desloratadine and loratadine: &lt;span class="expr"&gt;#{loratadine.tanimoto desloratadine}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; 0.895683467388153&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Variations&lt;/h4&gt;

&lt;p&gt;CDK's &lt;tt&gt;&lt;a href="http://cdk.sourceforge.net/api/org/openscience/cdk/fingerprint/Fingerprinter.html"&gt;Fingerprinter&lt;/a&gt;&lt;/tt&gt; class returns an instance of the Java class &lt;tt&gt;&lt;a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/BitSet.html"&gt;BitSet&lt;/a&gt;&lt;/tt&gt;. This &lt;tt&gt;BitSet&lt;/tt&gt; can be further manipulated in Ruby. For example, to find the size (the total number of bits) of the &lt;tt&gt;BitSet&lt;/tt&gt;, we could use:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;loratadine&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;size&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; 1024&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Similarly, to find the number of bits set to true, we would use:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;loratadine&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;cardinality&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; 278&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To print out a list of all bits set to true, we could use the &lt;tt&gt;toString&lt;/tt&gt; method:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;loratadine&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;toString&lt;/span&gt; &lt;span class="comment"&gt;# =&amp;gt; &amp;quot;{2, 8, 11, 16, 18, 22, 32, 37, 38, 41, 42, 46, 47, 51, 57, 64, 65, 66, 69 ... }&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Fingerprints enable many useful and fast comparisons between molecules. The form of fingerprint we've used here is only one of the possibilities offered by CDK. The next article in this series will discuss fingerprints in &lt;a href="http://openbabel.sourceforge.net/wiki/Fingerprint"&gt;Open Babel&lt;/a&gt; using both Ruby and Python.&lt;/p&gt;</description>
      <pubDate>Wed, 22 Nov 2006 15:44:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b5807052-d051-4121-b89f-1d8cc908ef4f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/11/22/scripting-molecular-fingerprints-with-ruby-cdk</link>
      <category>Tools</category>
      <category>fingerprint</category>
      <category>bitset</category>
      <category>similarity</category>
      <category>rcdk</category>
      <category>ruby</category>
    </item>
  </channel>
</rss>
