<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag cdk</title>
    <link>http://depth-first.com/articles/tag/cdk</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>CampDepict: Building a Simple SMILES Depict Web Application With JRuby, Structure CDK, and Camping</title>
      <description>&lt;p&gt;&lt;a href="http://redhanded.hobix.com/bits/campingAMicroframework.html"&gt;&lt;img src="http://depth-first.com/demo/20080423/camping.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Today's tribute to the power of simplicity comes by way of &lt;a href="http://goeslightly.blogspot.com/"&gt;John Jaeger&lt;/a&gt;, who has built one of the simplest cheminformatics Web applications ever written. His creation, &lt;a href="http://goeslightly.blogspot.com/2008/04/campdepict-jruby-cdk-and-camping.html"&gt;CampDepict&lt;/a&gt;, interactively produces a raster image of a 2D chemical structure given a SMILES string, not unlike &lt;a href="http://www.daylight.com/daycgi/depict"&gt;Daylight's Depict application&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;CampDepict uses the Ruby Web microframework &lt;a href="http://redhanded.hobix.com/bits/campingAMicroframework.html"&gt;Camping&lt;/a&gt;. From the &lt;a href="http://camping.rubyforge.org/files/README.html"&gt;README&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;Camping is a web framework which consistently stays at less than 4kb of code. You can probably view the complete source code on a single page. But, you know, it&#8217;s so small that, if you think about it, what can it really do?&lt;/p&gt;
    
    &lt;p&gt;The idea here is to store a complete fledgling web application in a single file like many small CGIs. But to organize it as a Model-View-Controller application like Rails does. You can then easily move it to Rails once you&#8217;ve got it going.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;John's application is loosely based on the &lt;a href="http://depth-first.com/articles/2006/12/04/anatomy-of-a-cheminformatics-web-application-ajaxifying-depict"&gt;Rails Depict&lt;/a&gt; application first described in 2006 here on Depth-First. His code makes use of &lt;a href="http://cdk.sf.net"&gt;CDK&lt;/a&gt; and &lt;a href="http://sf.net/projects/structure"&gt;Structure CDK&lt;/a&gt;, and it runs on &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you've ever been curious about what Ruby has to offer cheminformatics, CampDepict could be just the application to get your feet wet.&lt;/p&gt;</description>
      <pubDate>Wed, 23 Apr 2008 11:16:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:b831ffb0-cb0a-46ed-aaa1-a5cddc2acfcf</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/04/23/campdepict-building-a-simple-smiles-depict-web-application-with-jruby-structure-cdk-and-camping</link>
      <category>Tools</category>
      <category>camping</category>
      <category>ruby</category>
      <category>jruby</category>
      <category>campdepict</category>
      <category>structurecdk</category>
      <category>cdk</category>
      <category>webapplication</category>
    </item>
    <item>
      <title>Five Open Tools for 2D Structure Layout (aka Structure Diagram Generation)</title>
      <description>&lt;p&gt;&lt;a href="http://metamolecular.com/chemwriter"&gt;&lt;img src="http://depth-first.com/demo/20070411/difficult.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Given a molecular representation without 2D coordinates, how would you display a human-readable view?&lt;/p&gt;

&lt;p&gt;This problem can arise in many situations, one of the most common of which is the parsing of &lt;a href="http://depth-first.com/articles/tag/linenotation"&gt;line notations&lt;/a&gt; such as &lt;a href="http://depth-first.com/articles/2007/10/19/easily-convert-iupac-nomenclature-to-smiles-inchi-or-molfile-with-rubidium"&gt;IUPAC nomenclature&lt;/a&gt;, SMILES, or &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;InChI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And then there are the cases when you have 2D coordinates, but they're &lt;a href="http://depth-first.com/articles/2008/02/12/the-art-and-science-of-chemical-structure-diagrams-double-trouble"&gt;not very aesthetically pleasing&lt;/a&gt;. Maybe the coordinates were created by people either in a hurry or working with low quality editors, or maybe they were generated as distorted 2D projections of 3D coordinates. Whatever the reason, simply having 2D coordinates may not be the same as having &lt;em&gt;good&lt;/em&gt; 2D coordinates.&lt;/p&gt;

&lt;p&gt;Last year, a Depth-First article &lt;a href="http://depth-first.com/articles/2007/04/11/structure-diagram-generation"&gt;discussed the Structure Diagram Generation (SDG) problem&lt;/a&gt; and how it can be solved with Open Source software. Given that nearly a year has passed, it seemed appropriate to revisit the topic.&lt;/p&gt;

&lt;p&gt;The good news is that there are at least four independent Open Source implementations of SDG algorithms, and one potential open database approach. They are, in no particular order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://sourceforge.net/projects/mcdl"&gt;MCDL&lt;/a&gt; Written in Java, the emphasis of this software appears to be facilitating the use of &lt;a href="http://depth-first.com/articles/2006/08/19/a-first-look-at-modular-chemical-descriptor-language-mcdl"&gt;Modular Chemical Descriptor Language&lt;/a&gt;. Unfortunately, no new releases of this intriguing software package have been made in the last year.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://sf.net/projects/cdk"&gt;Chemistry Development Kit (CDK)&lt;/a&gt; This useful package handles about 70-80% of a typical assortment of chemical structures well. The large amount of activity on the CDK project in general makes this a particularly good SDG system to contribute to, especially in the areas of refactoring and handling special cases. See also &lt;a href="http://www.steinbeck-molecular.de/steinblog/index.php/2007/08/14/structure-diagram-generation-sdg-2d-layout-in-the-chemistry-development-kit-part-1/"&gt;Christoph Steinbeck's overview of CDK's layout system&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://bkchem.zirael.org/"&gt;BKChem&lt;/a&gt; A 2D structure editor written in Python. Give it an InChI and it will display the structure, courtesy of SDG. The system worked remarkably well with the molecules I tested. BKChem has also been reported to work in &lt;a href="http://bkchem.zirael.org/batch_mode_en.html"&gt;batch mode&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.rdkit.org/"&gt;RDKit&lt;/a&gt; Written in Python and C++, this package is the newest of the bunch. Although &lt;a href="http://sourceforge.net/mailarchive/message.php?msg_id=360844.35824.qm%40web34206.mail.mud.yahoo.com"&gt;I haven't had much luck compiling RDKit&lt;/a&gt;, it still looks quite promising. Any chance of switching to &lt;a href="http://www.gnu.org/software/make/"&gt;make&lt;/a&gt; as a build system?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; PubChem? Maybe. With a database of small molecules now numbering well over ten million, there's a good chance that the molecule for which you need to assign coordinates is already in PubChem. And if it's in PubChem, 2D coordinates have already been assigned. Use an InChI as a hash key, and voila - instant SDG without much software. Given the novelty of large, publicly-available databases of small molecules such as PubChem, this approach may have a great deal of untapped potential.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
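
&lt;p&gt;The PubChem idea amounts to a dictionary lookup. As a rough sketch in Ruby, assuming a local table has already been built from downloaded records (the &lt;tt&gt;COORDINATES&lt;/tt&gt; table, the &lt;tt&gt;lookup_layout&lt;/tt&gt; helper, and the placeholder coordinates are all hypothetical, not PubChem data or a PubChem API):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;
```ruby
# Hypothetical local table keyed by InChI. Each value is a list of
# [element, x, y] entries; the coordinates are placeholders, not
# values taken from PubChem.
COORDINATES = {}
COORDINATES['InChI=1/CH4/h1H4'] = [['C', 0.0, 0.0]]

# Return stored 2D coordinates for an InChI, or nil when the molecule
# is absent and a real SDG tool would be needed instead.
def lookup_layout(inchi)
  COORDINATES[inchi]
end

puts lookup_layout('InChI=1/CH4/h1H4').inspect
puts lookup_layout('InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H').inspect
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A miss returns &lt;tt&gt;nil&lt;/tt&gt;, which is the signal to fall back on one of the SDG implementations listed above.&lt;/p&gt;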

&lt;p&gt;SDG is one of those issues that can stay off the radar for some only to become an instant, nagging problem with no clear way out. The tools cited here offer an excellent place to begin working toward a comprehensive solution.&lt;/p&gt;</description>
      <pubDate>Wed, 26 Mar 2008 09:11:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:5441d3fc-3dc2-4f2d-b740-5cad16dd454b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/03/26/five-open-tools-for-2d-structure-layout-aka-structure-diagram-generation</link>
      <category>Tools</category>
      <category>sdg</category>
      <category>2d</category>
      <category>mcdl</category>
      <category>cdk</category>
      <category>bkchem</category>
      <category>rdkit</category>
      <category>pubchem</category>
      <category>coordinates</category>
      <category>java</category>
      <category>python</category>
      <category>cplusplus</category>
      <category>layout</category>
    </item>
    <item>
      <title>Simple Installation of Rubidium</title>
      <description>&lt;p&gt;&lt;a href="http://rbtk.rubyforge.org/"&gt;&lt;img src="http://depth-first.com/demo/20071015/rubidium.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://rbtk.rubyforge.org/"&gt;Rubidium&lt;/a&gt; is a Ruby cheminformatics scripting environment. Previously, &lt;a href="http://depth-first.com/articles/2007/11/12/parsing-sd-files-with-ruby-and-rubidium"&gt;a problem&lt;/a&gt; was reported with the RubyForge gem repository that prevented the simple installation of the Rubidium gem. After filing a &lt;a href="http://rubyforge.org/tracker/index.php?func=detail&amp;amp;aid=15665&amp;amp;group_id=5&amp;amp;atid=101"&gt;bug report&lt;/a&gt;, the problem was resolved.&lt;/p&gt;

&lt;p&gt;The problem, which led to a 404 being issued when trying to install the gem from the remote RubyGems repository, was a variant of a &lt;a href="http://rubyforge.org/tracker/index.php?func=detail&amp;amp;aid=15417&amp;amp;group_id=5&amp;amp;atid=102"&gt;known RubyForge issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can now install Rubidium like this:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jruby -S gem install rbtk
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Installation takes a few minutes due to the large size of the included &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; jarfile.&lt;/p&gt;</description>
      <pubDate>Wed, 21 Nov 2007 09:26:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:f0a52354-e6aa-4a02-b329-4b5271486940</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/21/simple-installation-of-rubidium</link>
      <category>Tools</category>
      <category>rubidium</category>
      <category>ruby</category>
      <category>jruby</category>
      <category>cdk</category>
    </item>
    <item>
      <title>An Introduction to the Rubidium Cheminformatics Toolkit: Interconvert SMILES, InChI, and Molfile with an Open Babel-Like Interface</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20071015/rubidium.png" align="right"&gt;&lt;/img&gt;Interconverting molecular languages is a very common operation in cheminformatics, so convenient conversion tools are desirable. Recent articles have discussed JRuby as a &lt;a href="http://depth-first.com/articles/tag/ruby"&gt;functional cheminformatics scripting environment&lt;/a&gt;. In this article, we'll see how this functionality can be combined with convenience for molecular language conversions.&lt;/p&gt;

&lt;p&gt;In addition to illustrating a technique, this article is the first in a series aimed at documenting a new cheminformatics toolkit for Ruby called "Rubidium". Rubidium will provide a unified set of Ruby APIs for working with diverse Open Source cheminformatics tools.&lt;/p&gt;

&lt;p&gt;Rubidium will be distributed under the highly permissive &lt;a href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT License&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;This Rubidium library requires &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt; and the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK). Copying the &lt;a href="http://downloads.sourceforge.net/cdk/cdk-1.0.1.jar?modtime=1182877138&amp;amp;big_mirror=0"&gt;CDK jarfile&lt;/a&gt; into your JRuby &lt;tt&gt;lib&lt;/tt&gt; directory is all that's needed.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;The goal of this library is to provide a simple, yet flexible way to interconvert SMILES, InChI, and molfile formats. It was inspired by the &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt; library, in which an &lt;tt&gt;OBConversion&lt;/tt&gt; object is configured with input and output formats prior to performing one or more conversions. In today's library, a similar Ruby interface is created for the CDK. Because of its length, it won't be presented in its entirety. Instead, it can be &lt;a href="http://depth-first.com/demo/20071015/cdk.rb"&gt;downloaded here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;The library can be tested by saving it as a file called &lt;strong&gt;cdk.rb&lt;/strong&gt; and invoking &lt;tt&gt;jirb&lt;/tt&gt;. We can then convert a SMILES for benzene into the InChI for benzene:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'cdk'
=&gt; true
irb(main):002:0&gt; c=CDK::Conversion.new
=&gt; #&amp;lt;CDK::Conversion:0x4c6320 ... &amp;gt;
irb(main):003:0&gt; c.set_formats 'smi', 'inchi'
=&gt; "inchi"
irb(main):004:0&gt; c.convert 'c1ccccc1'
=&gt; "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Upcoming articles will show more examples of interconversions using this library, and discuss some of its limitations.&lt;/p&gt;

&lt;h4&gt;An Aside&lt;/h4&gt;

&lt;p&gt;It might be useful for Rubidium to support multiple &lt;tt&gt;Conversions&lt;/tt&gt;, each using its own cheminformatics toolkit. For example, a recent article discussed &lt;a href="http://depth-first.com/articles/2007/06/25/interconvert-almost-any-smiles-and-inchi-with-ruby-open-babel"&gt;SMILES and InChI interconversion with Ruby Open Babel&lt;/a&gt;. With a little tweaking, the Ruby Open Babel &lt;tt&gt;OBConversion&lt;/tt&gt; interface could be made identical to the Ruby interface used in today's tutorial. We could also configure &lt;a href="http://joelib.sf.net"&gt;JOELib&lt;/a&gt; and &lt;a href="http://sf.net/projects/rosetta"&gt;Rosetta&lt;/a&gt; &lt;tt&gt;Conversions&lt;/tt&gt; in an analogous fashion.&lt;/p&gt;

&lt;p&gt;Rubidium would then offer a family of molecular language converters, each of which used exactly the same API. We could then pick the best converter based on the situation at hand.&lt;/p&gt;
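
&lt;p&gt;As a rough sketch of this idea, two converters can share one interface through duck typing. Nearly everything here is hypothetical: the &lt;tt&gt;TableConversion&lt;/tt&gt; class, the &lt;tt&gt;handles?&lt;/tt&gt; method, and the lookup tables standing in for real toolkits. Only the &lt;tt&gt;set_formats&lt;/tt&gt; and &lt;tt&gt;convert&lt;/tt&gt; method names come from the session above:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;
```ruby
# Stand-in for a toolkit-backed converter. A real implementation would
# delegate to CDK, Open Babel, and so on; this one looks answers up in
# a small table so the example stays self-contained.
class TableConversion
  def initialize(table)
    @table = table
  end

  # Remember the formats and return the output format, mirroring the
  # jirb session shown earlier.
  def set_formats(input, output)
    @input = input
    @output = output
  end

  # True when this converter knows the given input string.
  def handles?(text)
    @table.key?([@input, @output, text])
  end

  # Return the converted string, raising KeyError on unknown input.
  def convert(text)
    @table.fetch([@input, @output, text])
  end
end

# One converter knows nothing; the other knows benzene.
benzene = {}
benzene[['smi', 'inchi', 'c1ccccc1']] = 'InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H'

converters = [TableConversion.new({}), TableConversion.new(benzene)]
converters.each { |c| c.set_formats('smi', 'inchi') }

# Pick the first converter that can handle the input.
chosen = converters.find { |c| c.handles?('c1ccccc1') }
puts chosen.convert('c1ccccc1')
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because every converter answers the same messages, the calling code never needs to know which toolkit did the work.&lt;/p&gt;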

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;With just a little Ruby code, we've created a convenient Ruby interface for interconverting SMILES, InChI, and molfile formats. JRuby supports even more interconversions through the CDK as well as other Java and Java Native Interface libraries. Future articles will discuss some of the possibilities.&lt;/p&gt;</description>
      <pubDate>Mon, 15 Oct 2007 10:59:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:fbde8b22-25ba-498c-8ade-b9a74738d560</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface</link>
      <category>Tools</category>
      <category>rubidium</category>
      <category>jruby</category>
      <category>java</category>
      <category>cdk</category>
    </item>
    <item>
      <title>JRuby for Cheminformatics: Reading and Writing InChIs Via the Java Native Interface</title>
      <description>&lt;p&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The increased use of the &lt;a href="http://depth-first.com/articles/2007/09/27/inchi-for-newbies"&gt;InChI identifier&lt;/a&gt; is making the reading and writing of InChIs a standard cheminformatics capability. Recent articles have discussed the &lt;a href="http://depth-first.com/articles/tag/jruby"&gt;advantages of JRuby for cheminformatics&lt;/a&gt;. One disadvantage of JRuby is that code written in C can't be directly used. The presents a potential problem for libraries, such as the InChI toolkit, that are written in C. Fortunately, the solution is simple. Today's tutorial will demonstrate how InChIs can be both read and written using the C-InChI toolkit via JRuby and the excellent &lt;a href="http://jni-inchi.sourceforge.net/"&gt;JNI-InChI library&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;About JNI-InChI&lt;/h4&gt;

&lt;p&gt;The &lt;a href="http://jni-inchi.sourceforge.net/"&gt;JNI-InChI&lt;/a&gt; library, written by Jim Downing and Sam Adams, wraps the &lt;a href="http://www.iupac.org/inchi/"&gt;C InChI toolkit&lt;/a&gt; in a Java Native Interface. This low-level toolkit is suitable for building more complex software, but lacks many features present in the C InChI toolkit. For example, JNI-InChI doesn't directly interconvert SMILES or molfile with InChI. For that you'd need to build a support library. If you're building a toolkit from scratch, this lightweight approach can be a significant advantage.&lt;/p&gt;

&lt;p&gt;The JNI-InChI binary distribution jarfile includes the compiled native InChI library. In this sense it's virtually indistinguishable from any other Java library. This simplified packaging makes it exceptionally easy to use JNI-InChI from JRuby, as we'll see below.&lt;/p&gt;

&lt;h4&gt;Installation&lt;/h4&gt;

&lt;p&gt;JRuby &lt;a href="http://depth-first.com/articles/2007/10/09/jruby-for-cheminformatics-parsing-smiles-simply"&gt;can be installed&lt;/a&gt; as described previously. To install the JNI-InChI library for JRuby, simply copy the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=173262"&gt;current release jarfile&lt;/a&gt; into the &lt;tt&gt;lib&lt;/tt&gt; directory of your JRuby installation. That's all there is to it.&lt;/p&gt;

&lt;h4&gt;A Simple Library&lt;/h4&gt;

&lt;p&gt;We can now write a simple library to read InChIs via JRuby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;include_class&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net.sf.jniinchi.JniInchiInput&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include_class&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net.sf.jniinchi.JniInchiInputInchi&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include_class&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net.sf.jniinchi.JniInchiWrapper&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;IUPAC&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read_inchi&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;
    &lt;span class="ident"&gt;input&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;JniInchiInputInchi&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt; &lt;span class="ident"&gt;inchi&lt;/span&gt;

    &lt;span class="constant"&gt;JniInchiWrapper&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getStructureFromInchi&lt;/span&gt; &lt;span class="ident"&gt;input&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;By saving the above library to a file called &lt;strong&gt;iupac.rb&lt;/strong&gt;, we can parse InChIs via JRuby:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'iupac'
=&gt; true
irb(main):002:0&gt; include IUPAC
=&gt; Object
irb(main):003:0&gt; output = read_inchi 'InChI=1/C14H10/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13/h1-10H'
=&gt; #&lt;Java::NetSfJniinchi::JniInchiOutputStructure:0x1ed5459 @java_object=net.sf.jniinchi.JniInchiOutputStructure@313170&gt;
irb(main):004:0&gt; output.num_atoms
=&gt; 14
irb(main):005:0&gt; output.num_bonds
=&gt; 16
&lt;/pre&gt;
&lt;/div&gt;
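
&lt;p&gt;The atom count of 14 matches the carbon count in the formula layer of the InChI (&lt;tt&gt;C14H10&lt;/tt&gt;), which suggests that hydrogens are being treated as implicit. As a quick sanity check, the formula layer can be pulled out with plain Ruby string handling. The &lt;tt&gt;heavy_atom_count&lt;/tt&gt; helper is hypothetical, is not part of JNI-InChI, and assumes a simple single-component formula:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;
```ruby
# Count non-hydrogen atoms in the formula layer of an InChI string.
# Assumes a single-component formula such as "C14H10" (no dots or
# multipliers), which is enough for the example above.
def heavy_atom_count(inchi)
  formula = inchi.split('/')[1]
  total = 0
  formula.scan(/([A-Z][a-z]?)(\d*)/) do |element, count|
    next if element == 'H'
    total += count.empty? ? 1 : count.to_i
  end
  total
end

puts heavy_atom_count('InChI=1/C14H10/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13/h1-10H')
```
&lt;/pre&gt;&lt;/div&gt;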

&lt;h4&gt;Writing InChIs&lt;/h4&gt;

&lt;p&gt;Because JNI-InChI is a low-level toolkit, writing InChIs is feasible, but not trivial. We must first construct a representation, and then get the InChI for it. For example, we could get the InChI for methane as follows:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'java'
=&gt; true
irb(main):002:0&gt; include_class 'net.sf.jniinchi.JniInchiInput'
=&gt; ["net.sf.jniinchi.JniInchiInput"]
irb(main):003:0&gt; include_class 'net.sf.jniinchi.JniInchiAtom'
=&gt; ["net.sf.jniinchi.JniInchiAtom"]
irb(main):004:0&gt; include_class 'net.sf.jniinchi.JniInchiWrapper'
=&gt; ["net.sf.jniinchi.JniInchiWrapper"]
irb(main):005:0&gt; input = JniInchiInput.new
=&gt; #&lt;Java::NetSfJniinchi::JniInchiInput:0x2f2295 @java_object=net.sf.jniinchi.JniInchiInput@15b0333&gt;
irb(main):006:0&gt; a1 = input.add_atom JniInchiAtom.new(0,0,0, "C")
=&gt; #&lt;Java::NetSfJniinchi::JniInchiAtom:0x1b22920 @java_object=net.sf.jniinchi.JniInchiAtom@2f356f&gt;
irb(main):007:0&gt; a1.set_implicit_h(4)
=&gt; nil
irb(main):008:0&gt; output = JniInchiWrapper.get_inchi input
=&gt; #&lt;Java::NetSfJniinchi::JniInchiOutput:0xf894ce @java_object=net.sf.jniinchi.JniInchiOutput@132ae7&gt;
irb(main):009:0&gt; output.get_inchi
=&gt; "InChI=1/CH4/h1H4"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Fortunately, we don't have to work that hard. The &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt;, through JNI-InChI, supports reading and writing of InChIs via a variety of molecular languages, including SMILES and molfile. More on that later, though.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Provided that a Java Native Interface exists for a C library, it can be used from JRuby. Future articles will discuss the use of other cheminformatics libraries written in either C or C++ from JRuby, and their integration with pure Java and Ruby libraries.&lt;/p&gt;</description>
      <pubDate>Wed, 10 Oct 2007 08:21:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:0348fa93-7376-488d-9afc-789590ac9fcb</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/10/jruby-for-cheminformatics-reading-and-writing-inchis-via-the-java-native-interface</link>
      <category>Tools</category>
      <category>rubidium</category>
      <category>jruby</category>
      <category>ruby</category>
      <category>java</category>
      <category>jni</category>
      <category>inchi</category>
      <category>cdk</category>
    </item>
    <item>
      <title>JRuby for Cheminformatics: Parsing SMILES Simply</title>
      <description>&lt;p&gt;&lt;a href="http://cdk.sf.net"&gt;&lt;img src="http://depth-first.com/files/cdk_logo.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The previous article in this series outlined some &lt;a href="http://depth-first.com/articles/2007/10/08/five-reasons-to-start-using-jruby-now"&gt;reasons to consider JRuby for cheminformatics&lt;/a&gt;. Now I'll show how easy it is to get started by describing how to parse SMILES strings with the help of the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK).&lt;/p&gt;

&lt;h4&gt;What About Ruby CDK?&lt;/h4&gt;

&lt;p&gt;A number of Depth-First articles have discussed &lt;a href="http://depth-first.com/articles/2007/10/04/ruby-cdk-for-newbies"&gt;Ruby CDK&lt;/a&gt;. This library runs on top of C-Ruby, otherwise known as Matz's Ruby Interpreter (MRI). &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt; connects MRI to a Java Virtual Machine under Ruby CDK.&lt;/p&gt;

&lt;p&gt;This article, and the others to follow, will instead discuss the use of the CDK and other Java libraries from JRuby. In contrast to MRI, JRuby is a pure Java implementation of the Ruby language. This approach offers some important advantages which will be highlighted along the way.&lt;/p&gt;

&lt;h4&gt;Installing JRuby&lt;/h4&gt;

&lt;p&gt;JRuby is not difficult to install. On Linux, the steps are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Install &lt;a href="http://java.sun.com"&gt;JDK Version 1.4 or higher&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download and unpack the most recent JRuby release - at the time of this writing, &lt;a href="http://dist.codehaus.org/jruby"&gt;version 1.0.1&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the JRuby &lt;tt&gt;bin&lt;/tt&gt; directory to your path.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is no Step 4. ;-)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Installing CDK for JRuby&lt;/h4&gt;

&lt;p&gt;Installing CDK so that it works on JRuby is similarly quite simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Download the most recent CDK jarfile - at the time of this writing, &lt;a href="http://downloads.sourceforge.net/cdk/cdk-1.0.1.jar?modtime=1182877138&amp;amp;big_mirror=0"&gt;version 1.0.1&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move the CDK jarfile to your JRuby &lt;tt&gt;lib&lt;/tt&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Testing CDK for JRuby&lt;/h4&gt;

&lt;p&gt;You can verify that your new CDK for JRuby installation works with &lt;tt&gt;jirb&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'java'
=&gt; true
irb(main):002:0&gt; include_class 'org.openscience.cdk.smiles.SmilesParser'
=&gt; ["org.openscience.cdk.smiles.SmilesParser"]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You should notice that &lt;tt&gt;jirb&lt;/tt&gt; takes a few seconds to initialize the JVM, whereas &lt;tt&gt;irb&lt;/tt&gt; starts almost instantly.&lt;/p&gt;

&lt;h4&gt;A Library to Read SMILES&lt;/h4&gt;

&lt;p&gt;We can write a short library to read SMILES strings using the CDK:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;java&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include_class&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.smiles.SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Daylight&lt;/span&gt;
  &lt;span class="attribute"&gt;@@smiles_parser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
    &lt;span class="attribute"&gt;@@smiles_parser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parse_smiles&lt;/span&gt; &lt;span class="ident"&gt;smiles&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Notice the use of the Rubyesque method name &lt;tt&gt;parse_smiles&lt;/tt&gt; rather than &lt;tt&gt;parseSmiles&lt;/tt&gt;. This is just one of the built-in conveniences offered by JRuby.&lt;/p&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;Saving the library as a file called &lt;strong&gt;daylight.rb&lt;/strong&gt; lets us test it using interactive JRuby:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'daylight'
=&gt; true
irb(main):002:0&gt; include Daylight
=&gt; Object
irb(main):003:0&gt; mol = read_smiles 'c1ccccc1'
=&gt; #&lt;Java::OrgOpenscienceCdk:: [truncated] ...&gt;
irb(main):004:0&gt; mol.atom_count
=&gt; 6
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see, the benzene SMILES has been parsed correctly. Again, notice the use of the Rubyesque method name &lt;tt&gt;atom_count&lt;/tt&gt;, rather than the CDK Java bean convention method name &lt;tt&gt;getAtomCount&lt;/tt&gt;. This feature makes it easy to ignore the fact you're using a Java library and get on with writing your Ruby code. Brilliant!&lt;/p&gt;
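
&lt;p&gt;The naming convenience can be sketched in plain Ruby. The helpers below (&lt;tt&gt;snake_case&lt;/tt&gt; and &lt;tt&gt;ruby_aliases&lt;/tt&gt;) are hypothetical and only approximate the aliasing rule for simple method names. They are not JRuby's actual implementation:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;
```ruby
# Convert a Java camelCase name to Ruby snake_case.
def snake_case(java_name)
  java_name.gsub(/([a-z0-9])([A-Z])/, '\1_\2').downcase
end

# List Ruby-style names for a Java method: the snake_case form and,
# for bean-style getters, the property name with "get" dropped.
def ruby_aliases(java_name)
  names = [snake_case(java_name)]
  getter = java_name.match(/\Aget([A-Z].*)\z/)
  names.push(snake_case(getter[1])) if getter
  names
end

puts ruby_aliases('parseSmiles').inspect
puts ruby_aliases('getAtomCount').inspect
```
&lt;/pre&gt;&lt;/div&gt;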

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;This article has shown how to install JRuby and begin to write some simple cheminformatics programs with a distinctive Ruby flavor. Although the focus was on SMILES parsing, there's much more functionality to be found within the CDK and other cheminformatics libraries written in Java. Future articles will outline some of the possibilities.&lt;/p&gt;</description>
      <pubDate>Tue, 09 Oct 2007 08:40:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:9007f034-5aa0-458c-b4e1-f9dc182d19be</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/10/09/jruby-for-cheminformatics-parsing-smiles-simply</link>
      <category>Tools</category>
      <category>jruby</category>
      <category>java</category>
      <category>ruby</category>
      <category>rubidium</category>
      <category>cdk</category>
      <category>smiles</category>
    </item>
    <item>
      <title>Roll Your Own Chemical Database With Free Components</title>
      <description>&lt;p&gt;Are you thinking of building a &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemical database&lt;/a&gt; but would rather not rent and maintain a bunch of proprietary software components? &lt;a href="http://merian.pch.univie.ac.at/pch/nh_info.html"&gt;Norbert Haider&lt;/a&gt; has thought a lot about this problem and offers some helpful resources to get you started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/moldb.html"&gt;Creating a web-based, searchable molecular structure database using free software&lt;/a&gt; Step-by step case study&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/moldb.pdf"&gt;How to create a web-based molecular structure database with free software&lt;/a&gt; A presentation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/cmmm.html"&gt;checkmol/matchmol&lt;/a&gt; Open source command-line utility for 2D (sub)structure matching&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://merian.pch.univie.ac.at/%7Enhaider/cheminf/mol2ps.html"&gt;mol2ps&lt;/a&gt; Command-line utility for converting molfiles into Postscript files&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Haider's system can be deployed on commodity hardware running open source operating systems. In other words, the cost of setting up a system like the one he describes is practically zero.&lt;/p&gt;

&lt;p&gt;Creating and open sourcing your own custom components is one way to go. Building on top of existing open source tools like &lt;a href="http://cdk.sf.net"&gt;CDK&lt;/a&gt;, &lt;a href="http://openbabel.sf.net"&gt;Open Babel&lt;/a&gt;, &lt;a href="http://depth-first.com/articles/tag/octet"&gt;Octet&lt;/a&gt; and &lt;a href="http://joelib.sf.net"&gt;JOELib&lt;/a&gt; is another.&lt;/p&gt;

&lt;p&gt;Haider's work raises an interesting question. Has anyone assembled a complete, ready-to-install, general-purpose chemical database package built from open source components? If for no other reason, such an exercise would give an excellent idea of what &lt;a href="http://depth-first.com/articles/2007/01/03/open-source-and-open-data-why-we-should-eat-our-own-dogfood"&gt;the dogfood tastes like&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Fri, 13 Apr 2007 10:27:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:a017a4e0-d8a0-48c2-87a3-5554b99b7373</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/04/13/roll-your-own-chemical-database-with-free-components</link>
      <category>Tools</category>
      <category>database</category>
      <category>2d</category>
      <category>web</category>
      <category>cdk</category>
      <category>openbabel</category>
      <category>opensource</category>
      <category>joelib</category>
    </item>
    <item>
      <title>Making the Case: Flux-2</title>
      <description>&lt;blockquote&gt;
    &lt;p&gt;&lt;a href="http://cdk.sf.net"&gt;&lt;img src="http://depth-first.com/files/cdk_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;... The Flux software makes
    extensive use of the Chemistry Development Toolkit [sic] (CDK) as a cheminformatics library. Every descriptor that implements the Java interface for CDK descriptors can be utilized for the chemical similarity calculations; this includes the 69 descriptors available in CDK at the time of writing. ...&lt;/p&gt;
    
    &lt;p&gt;...&lt;/p&gt;
    
    &lt;p&gt;... CDK is employed as a cheminformatics toolkit (&lt;a href="http://cdk.sf.net"&gt;http://cdk.sf.net&lt;/a&gt;). The input and output of chemical structures in SDF format and the basic functionality for all structure manipulation are provided by CDK. Compounds are tested for uniqueness by means of canonical SMILES generated by CDK. Our ligand-based fitness function and molecule filter are based on descriptors that implement the Java interface for descriptors defined by CDK.&lt;/p&gt;
    
    &lt;p&gt;-&lt;cite&gt;Uli Fechner and Gisbert Schneider, &lt;a href="http://dx.doi.org/10.1021/ci6005307"&gt;J. Chem. Inf. Model. ASAP Articles&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The number of peer-reviewed publications using the Open Source Java library &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK) just keeps growing. The latest addition comes by way of a paper by Fechner and Schneider on ligand-based de novo design. In many areas, Open Source software is used not because it's free, but because it's perceived as superior to alternatives. How long will it be before this is true in cheminformatics?&lt;/p&gt;</description>
      <pubDate>Mon, 26 Feb 2007 09:41:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c5e39684-33cd-49e4-b622-aecf31b842ea</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/26/making-the-case-flux-2</link>
      <category>Open X</category>
      <category>cdk</category>
      <category>liganddesign</category>
      <category>opensource</category>
      <category>java</category>
    </item>
    <item>
      <title>Making the Case: Personal Chemistry Client</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070119/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Good software designed with chemists in mind is still quite rare, and when that software is Open Source it's rarer still. Two very popular titles are &lt;a href="http://jmol.sourceforge.net/"&gt;Jmol&lt;/a&gt; and &lt;a href="http://pymol.sourceforge.net/"&gt;PyMol&lt;/a&gt;. A third, &lt;a href="http://www.bioclipse.net/"&gt;Bioclipse&lt;/a&gt;, is gaining in popularity. So it was with great interest that I came upon a company called &lt;a href="http://www.akosgmbh.de/"&gt;AKos Consulting &amp;amp; Solutions&lt;/a&gt;, and their Open Source application &lt;a href="http://www.akosgmbh.de/pcc/"&gt;Personal Chemistry Client&lt;/a&gt; (PCC).&lt;/p&gt;

&lt;p&gt;PCC, as best as I can tell, is designed to be a personal chemical database. The good news is that PCC is licensed under the &lt;a href="http://opensource.org/licenses/gpl-license.php"&gt;GNU General Public License&lt;/a&gt;. The bad news is that PCC also requires two important things from your computer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A system capable of running the Microsoft .NET 2.0 Framework. The framework itself is included with the download. Unfortunately, this requirement rules out running PCC on Linux or Mac OS X.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The free (&lt;a href="http://depth-first.com/articles/2006/09/27/hacking-pubchem-free-speech-or-free-beer"&gt;as in beer&lt;/a&gt;) ActiveX plugin &lt;a href="http://www.hyleos.net/?s=applications&amp;amp;p=ChemView"&gt;ChemViewX&lt;/a&gt; from &lt;a href="http://www.hyleos.net/"&gt;Hyleos&lt;/a&gt;. This plugin is also included with the download.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I was able to download, install, and use PCC on my system (Windows XP Home) without a problem. Aside from some confusing behaviors of its template components, the application seems to work as described.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://cdk.sf.net"&gt;&lt;img src="http://depth-first.com/files/cdk_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Behind the scenes, PCC uses the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK) for structure searching. It's not clear what rendering engine is used by the ChemViewX plugin. Just looking at the output, though, it may well also be CDK.&lt;/p&gt;

&lt;p&gt;The emergence of PCC, an Open Source program developed by a for-profit vendor, is an exciting development. Business models may take some time to solidify, but chemistry clearly offers numerous &lt;a href="http://depth-first.com/articles/2006/09/03/peculiarities-of-chemical-information"&gt;peculiarities&lt;/a&gt; to take advantage of. And the folks behind PCC are well ahead of the curve.&lt;/p&gt;</description>
      <pubDate>Fri, 19 Jan 2007 14:49:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:36fcaeb3-8e24-496b-8ab0-55dd1b102a3b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/01/19/making-the-case-personal-chemistry-client</link>
      <category>Meta</category>
      <category>opensource</category>
      <category>cdk</category>
      <category>akos</category>
      <category>applications</category>
      <category>jmol</category>
      <category>pymol</category>
      <category>bioclipse</category>
    </item>
    <item>
      <title>Diversity-Oriented Chemical Informatics</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/files/cdk_logo.png" align="right"&gt;&lt;/img&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right"&gt;&lt;/img&gt;How would you enumerate all of the molecules represented by a molecular formula? This question was recently posed to members of the &lt;a href="http://hardly.cubic.uni-koeln.de/pipermail/blue-obelisk/2006-November/000970.html"&gt;Blue Obelisk mailing list&lt;/a&gt;. Formula-based exhaustive structure enumeration may seem on the surface to be just another esoteric problem. Nevertheless, playing with open, interactive software that can perform such enumerations can be a great source of new ideas for applications and unit tests.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; offers a fully-functional exhaustive structure enumerator through its &lt;tt&gt;GENMDeterministicGenerator&lt;/tt&gt; class. This article will use &lt;tt&gt;GENMDeterministicGenerator&lt;/tt&gt; through the &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;Ruby CDK&lt;/a&gt; interface to generate color 2-D images for all molecules of a given molecular formula.&lt;/p&gt;

&lt;h4&gt;A Solution&lt;/h4&gt;

&lt;p&gt;The software described in this article will generate a collection of 2-D molecular PNG images based on a user-supplied molecular formula. When viewed in a file browser such as Windows Explorer or &lt;a href="http://www.konqueror.org/"&gt;Konqueror&lt;/a&gt;, the output is visible as a matrix of images. The filename of each image is given by the SMILES string of the corresponding molecule. All molecules are enumerated, whether they look "reasonable" or not. As an example, consider a section of the output for 'C4H8ClNO', which looks like this on my system:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061115/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Enumerator: A Small Ruby Library&lt;/h4&gt;

&lt;p&gt;We'll create a small Ruby class to do most of the work. Save the following in a file called &lt;strong&gt;enum.rb&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require_gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;jrequire&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.structgen.deterministic.GENMDeterministicGenerator&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;jrequire&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net.sf.structure.cdk.util.ImageKit&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Enumerator&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;formula&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@generator&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Org&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Openscience&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Cdk&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Structgen&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Deterministic&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;GENMDeterministicGenerator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;formula&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
    &lt;span class="attribute"&gt;@width&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;150&lt;/span&gt;
    &lt;span class="attribute"&gt;@height&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;150&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;set_size&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@width&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;
    &lt;span class="attribute"&gt;@height&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;write_images&lt;/span&gt;
    &lt;span class="ident"&gt;mols&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="attribute"&gt;@generator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getStructures&lt;/span&gt;
    &lt;span class="ident"&gt;iterator&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;mols&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;iterator&lt;/span&gt;

    &lt;span class="keyword"&gt;while&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iterator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;hasNext&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="ident"&gt;mol&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;XY&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;coordinate_molecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;iterator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;next&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
      &lt;span class="ident"&gt;smiles&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get_smiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

      &lt;span class="constant"&gt;Net&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Sf&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Structure&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Cdk&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;ImageKit&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;writePNG&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;mol&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="attribute"&gt;@width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="attribute"&gt;@height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&lt;span class="expr"&gt;#{smiles}&lt;/span&gt;.png&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As you can see, this class is nothing more than a thin wrapper around a large amount of CDK functionality. Most of the action happens in the &lt;tt&gt;write_images&lt;/tt&gt; method, where three things take place:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;We retrieve from the &lt;tt&gt;GENMDeterministicGenerator&lt;/tt&gt; instance the list of molecules that satisfy the molecular formula passed to &lt;tt&gt;Enumerator's&lt;/tt&gt; constructor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We iterate over these molecules, generating 2-D coordinates and a SMILES string for each one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each molecule, an image is written with the filename given by its SMILES string.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Testing the Library&lt;/h4&gt;

&lt;p&gt;To test the library, the following code can either be entered interactively via Interactive Ruby (irb) or saved to a file and run with the Ruby interpreter (ruby):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;enum&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;e&lt;/span&gt;&lt;span class="punct"&gt;=&lt;/span&gt;&lt;span class="constant"&gt;Enumerator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;C4H8ClNO&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;e&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;write_images&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running this code will produce a collection of PNG images in your working directory. By changing the argument passed to the &lt;tt&gt;Enumerator&lt;/tt&gt; constructor, you can change the makeup of the image set.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;For this tutorial, you'll need &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;Ruby CDK&lt;/a&gt; (RCDK). A recent article described the small amount of system configuration required for &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-0"&gt;RCDK on Linux&lt;/a&gt;. Another article showed how to install &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;RCDK on Windows&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Unexpected Behavior&lt;/h4&gt;

&lt;p&gt;After testing the Enumerator library, you may notice a new file in your working directory called &lt;strong&gt;structuredata.txt&lt;/strong&gt;. This file is written automatically by &lt;tt&gt;GENMDeterministicGenerator&lt;/tt&gt; on instantiation, providing information on each structure that is generated. The &lt;a href="http://cdk.sourceforge.net/api/org/openscience/cdk/structgen/deterministic/GENMDeterministicGenerator.html"&gt;CDK API&lt;/a&gt; does not mention the creation of this file, and it would be preferable for this file to be created only on request. I'll be submitting a &lt;a href="http://sourceforge.net/tracker/?group_id=20024&amp;amp;atid=370024"&gt;feature request&lt;/a&gt; to this effect shortly.&lt;/p&gt;

&lt;h4&gt;Food for Thought&lt;/h4&gt;

&lt;p&gt;If you plan to explore larger areas of chemical space with the Enumerator library, be prepared to wait. The generation of molecules, determination of 2-D coordinates, and rendering can take some time. Of course, the number of molecules increases dramatically with the number of atoms in the molecular formula - a concrete demonstration of what makes organic chemistry the fascinating discipline that it is.&lt;/p&gt;

&lt;p&gt;An interesting variation on the ideas presented here would be to filter out molecules based on some criteria. One approach would be to remove molecules containing reactive functionality such as nitrogen substituted with chlorine. A SMARTS pattern search could easily form the basis for this filter. By applying this and similar filters, larger areas of interesting chemical space could be sampled in a reasonable amount of time.&lt;/p&gt;
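
&lt;p&gt;As a rough illustration of the filtering idea, the pure-Ruby sketch below drops enumerated SMILES strings that appear to contain an N-Cl bond. It rests on stated assumptions: the names &lt;tt&gt;n_chloro?&lt;/tt&gt; and &lt;tt&gt;keep_unreactive&lt;/tt&gt; are hypothetical and belong to neither CDK nor RCDK, and the regular expressions are only a crude textual stand-in for a real SMARTS substructure search. They will miss N-Cl bonds written with ring-closure digits or with other branches between the two atoms.&lt;/p&gt;

```ruby
# Hypothetical helper names; nothing here is part of CDK or RCDK.

# Crude textual stand-in for a SMARTS query such as [#7]Cl.
# It only recognizes N-Cl bonds written as "NCl", "N(Cl" or "ClN".
N_CHLORO_PATTERNS = [/N\(?Cl/, /ClN/].freeze

# True if the SMILES text appears to contain a nitrogen-chlorine bond.
def n_chloro?(smiles)
  N_CHLORO_PATTERNS.any? { |pattern| smiles.match?(pattern) }
end

# Return only the SMILES strings that pass the filter.
def keep_unreactive(smiles_list)
  smiles_list.reject { |smiles| n_chloro?(smiles) }
end

# Four C4H8ClNO isomers; only the last one has no N-Cl bond.
candidates = ['CCCC(=O)NCl', 'ClNC(=O)CCC', 'CCCN(Cl)C=O', 'ClCCCC(N)=O']
puts keep_unreactive(candidates)
```

&lt;p&gt;Of the four sample strings, only &lt;tt&gt;ClCCCC(N)=O&lt;/tt&gt; survives. In a real application the text check would be replaced by a proper substructure match, and the filter would run before coordinate generation and rendering so that no time is spent drawing rejected molecules.&lt;/p&gt;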

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;CDK's &lt;tt&gt;GENMDeterministicGenerator&lt;/tt&gt; class, when combined with 2-D structure layout and 2-D rendering, provides the foundation of an intriguing tool for exploring chemical diversity. Further combining this capability with that offered by other freely-available tools offers some thought-provoking possibilities.&lt;/p&gt;</description>
      <pubDate>Wed, 15 Nov 2006 15:03:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:16ee911f-73ea-4056-9f9d-dcad5a698a91</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/11/15/diversity-oriented-chemical-informatics</link>
      <category>Tools</category>
      <category>diversity</category>
      <category>cdk</category>
      <category>ruby</category>
      <category>rcdk</category>
      <category>enumeration</category>
      <category>integration</category>
    </item>
  </channel>
</rss>
