<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag bdd</title>
    <link>http://depth-first.com/articles/tag/bdd</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Parsing SD Files with Ruby and Rubidium</title>
      <description>&lt;p&gt;&lt;a href="http://rbtk.rubyforge.org"&gt;&lt;img src="http://depth-first.com/demo/20071015/rubidium.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Reading SD files is a bread-and-butter cheminformatics operation. At a minimum, a cheminformatics toolkit needs to parse the individual entries of an SD file, and provide access to the embedded molfile and data hash for each.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://depth-first.com/articles/tag/rubidium"&gt;Recent articles&lt;/a&gt; have introduced &lt;a href="http://rbtk.rubyforge.org"&gt;Rubidium&lt;/a&gt;, a Ruby cheminformatics scripting environment. The Rubidium team now announces the release of &lt;a href="http://rubyforge.org/frs/?group_id=4671"&gt;Rubidium-0.1.1&lt;/a&gt;, which, among other features, introduces the ability to parse SD files.&lt;/p&gt;

&lt;h4&gt;Prerequisites&lt;/h4&gt;

&lt;p&gt;Rubidium is designed to run on &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt;. Installing JRuby is straightforward on Unix-like systems. First, download the &lt;a href="http://dist.codehaus.org/jruby/jruby-bin-1.1b1.tar.gz"&gt;JRuby-1.1b1 binary release&lt;/a&gt;. Then, unpack the archive to your directory of choice. Set &lt;tt&gt;$JRUBY_HOME&lt;/tt&gt; to that directory and &lt;tt&gt;$JAVA_HOME&lt;/tt&gt; to your Java installation. Finally, add &lt;tt&gt;$JRUBY_HOME/bin&lt;/tt&gt; to your path.&lt;/p&gt;

&lt;h4&gt;Installing Rubidium-0.1.1&lt;/h4&gt;

&lt;p&gt;Generally speaking, it should be possible to install Rubidium with a one-line command to RubyGems:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jruby -S gem install rbtk
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Unfortunately, at the time of this writing, I was receiving the mysterious &lt;a href="http://www.google.com/search?q=rubygems+%22ERROR:++While+executing+gem+...+OpenURI::HTTPError%22&amp;amp;hl=en&amp;amp;pwst=1&amp;amp;start=0&amp;amp;sa=N"&gt;RubyGems 404 error&lt;/a&gt; with the RubyForge remote repository:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jruby -S gem install rbtk
Select which gem to install for your platform (java)
 1. rbtk 0.1.1 (java)
 2. rbtk 0.1.0 (java)
 3. Skip this gem
 4. Cancel installation
&gt; 1
ERROR:  While executing gem ... (OpenURI::HTTPError)
    404 Not Found
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This appears to affect only certain RubyGems on RubyForge, possibly only those with multiple versions. It seems to be an error on the RubyForge server that occasionally appears and then disappears.&lt;/p&gt;

&lt;p&gt;As a workaround, you can &lt;a href="http://rubyforge.org/frs/download.php/27819/rbtk-0.1.1-jruby.gem"&gt;download the Rubidium gem&lt;/a&gt; and install it manually:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Because Rubidium-0.1.1 introduces an &lt;a href="http://rubyforge.org/projects/activesupport/"&gt;Active Support&lt;/a&gt; dependency, you will need to install that library before installing Rubidium:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
ERROR:  While executing gem ... (RuntimeError)
    Error instaling tmp/rbtk-0.1.1-jruby.gem:
        rbtk requires activesupport &gt;= 1.4.2
$ jruby -S gem install activesupport
Successfully installed activesupport-1.4.4
Installing ri documentation for activesupport-1.4.4...
Installing RDoc documentation for activesupport-1.4.4...
$ jruby -S gem install tmp/rbtk-0.1.1-jruby.gem
Successfully installed rbtk, version 0.1.1
Installing ri documentation for rbtk-0.1.1-jruby...
Installing RDoc documentation for rbtk-0.1.1-jruby...
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It's possible that the RubyForge 404 issue will be resolved by the time you read this article, so &lt;tt&gt;jruby -S gem install rbtk&lt;/tt&gt; should be tried first.&lt;/p&gt;

&lt;h4&gt;Parsing an SD File&lt;/h4&gt;

&lt;p&gt;Let's say we'd like to extract all InChIs from a PubChem dataset. If you don't have one handy, a compilation of about 2000 PubChem benzodiazepines has been &lt;a href="http://rubyforge.org/frs/download.php/27768/pubchem_benzodiazepine_20071110.sdf.gz"&gt;deposited on RubyForge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With our unzipped datafile in our working directory, we can now test the SD File parser by saving the following library to a file called &lt;strong&gt;parse.rb&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rbtk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubidium/sdf&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;parse_sd&lt;/span&gt; &lt;span class="ident"&gt;filename&lt;/span&gt;
  &lt;span class="ident"&gt;p&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rubidium&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;SDF&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Parser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt; &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;filename&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

  &lt;span class="ident"&gt;p&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;each&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;entry&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
    &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;InChI: &lt;span class="expr"&gt;#{entry['PUBCHEM_NIST_INCHI']}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The library can then be tested with &lt;tt&gt;jirb&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ jirb
irb(main):001:0&gt; require 'parse'
=&gt; true
irb(main):002:0&gt; parse_sd 'pubchem_benzodiazepine_20071110.sdf'
InChI: InChI=1/C16H12Cl2N2O/c1-20-14-7-6-12(18)8-13(14)16(19-9-15(20)21)10-2-4-11(17)5-3-10/h2-8H,9H2,1H3

[truncated]
&lt;/pre&gt;
&lt;/div&gt;
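
&lt;p&gt;For readers curious about what such a parser has to do, here is a minimal sketch in plain Ruby. It is not Rubidium's implementation, and the method name &lt;tt&gt;each_sd_record&lt;/tt&gt; is made up for illustration. It assumes well-formed input: each record is a molfile ending in &lt;tt&gt;M  END&lt;/tt&gt;, followed by data items (a header line that starts with a greater-than sign and carries the field name in angle brackets, then one or more value lines, then a blank line), and closed by a &lt;tt&gt;$$$$&lt;/tt&gt; line.&lt;/p&gt;

```ruby
# Hypothetical helper, not part of Rubidium: reads SD records from any
# IO-like object and yields the molfile text and a Hash of data items
# for each record. Assumes well-formed input.
def each_sd_record(io)
  molfile = []
  data = {}
  field = nil      # name of the data item currently being read
  in_data = false  # true once the molfile's "M  END" line has been seen

  io.each_line do |raw|
    line = raw.chomp
    if line.start_with?('$$$$')
      # Record delimiter: hand over the finished record, then reset.
      items = data.transform_values { |values| values.join("\n") }
      yield molfile.join("\n"), items
      molfile = []
      data = {}
      field = nil
      in_data = false
    elsif !in_data
      molfile.push(line)
      in_data = true if line.start_with?('M  END')
    elsif line.start_with?('>')
      # Data header: the field name sits between the angle brackets
      # that follow the leading greater-than sign.
      rest = line[1..-1].strip
      field = rest[1...rest.index('>')]
      data[field] = []
    elsif field and !line.empty?
      data[field].push(line)
    else
      field = nil # a blank line ends the current data item
    end
  end
end
```

&lt;p&gt;Under those assumptions, &lt;tt&gt;each_sd_record(File.new(filename)) { |molfile, data| puts data['PUBCHEM_NIST_INCHI'] }&lt;/tt&gt; should print that field's value for each record in the file.&lt;/p&gt;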

&lt;h4&gt;RSpec and Behavior-Driven Development&lt;/h4&gt;

&lt;p&gt;If you &lt;a href="http://rubyforge.org/frs/download.php/27820/rbtk-0.1.1.tar.gz"&gt;check out the Rubidium source distribution&lt;/a&gt;, you'll notice that the SD parser library is tested with &lt;a href="http://rspec.rubyforge.org/"&gt;RSpec&lt;/a&gt;, the &lt;a href="http://en.wikipedia.org/wiki/Behavior_driven_development"&gt;BDD&lt;/a&gt; framework for Ruby. Ultimately, all components of Rubidium will be tested and documented this way.&lt;/p&gt;

&lt;h4&gt;Acknowledgments&lt;/h4&gt;

&lt;p&gt;Rubidium's new SD file parser was written by &lt;a href="http://www.moseshohman.com/"&gt;Moses Hohman&lt;/a&gt;. It was kindly donated by &lt;a href="http://www.collaborativedrug.com/"&gt;Collaborative Drug Discovery&lt;/a&gt;, who have built their drug discovery application using &lt;a href="http://rubyonrails.com"&gt;Ruby on Rails&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Future Directions&lt;/h4&gt;

&lt;p&gt;One problem in working with SD files is pinpointing encoding errors. A parser should not only raise an exception, but point to a line number and identify offending text to aid debugging. Rubidium's SD parser will eventually incorporate these enhancements.&lt;/p&gt;
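
&lt;p&gt;As a rough illustration of that kind of error reporting, the sketch below checks one thing only: that the fourth line of every record, the molfile counts line, begins with two three-character integer fields. The &lt;tt&gt;SDParseError&lt;/tt&gt; class and the &lt;tt&gt;check_counts_lines&lt;/tt&gt; method are hypothetical names and are not part of Rubidium.&lt;/p&gt;

```ruby
# Hypothetical error type (not part of Rubidium): carries the line
# number and the offending text of a malformed line.
SDParseError = Class.new(StandardError) do
  attr_reader :line_number, :text

  def initialize(message, line_number, text)
    super("#{message} at line #{line_number}: #{text.inspect}")
    @line_number = line_number
    @text = text
  end
end

# The first six characters of a counts line hold the atom and bond
# counts as two right-justified, three-character integer fields.
COUNTS_LINE = /\A[ \d]{2}\d[ \d]{2}\d/

# Walks an IO line by line and checks the counts line of every record,
# raising SDParseError on the first one that is malformed.
def check_counts_lines(io)
  offset = 0 # index of the current line within its record
  io.each_line.with_index(1) do |raw, line_number|
    line = raw.chomp
    if line.start_with?('$$$$')
      offset = 0 # record delimiter: the next line starts a new record
      next
    end
    if offset == 3 and line !~ COUNTS_LINE
      raise SDParseError.new('Malformed counts line', line_number, line)
    end
    offset += 1
  end
  true
end
```

&lt;p&gt;A caller can rescue &lt;tt&gt;SDParseError&lt;/tt&gt; and read &lt;tt&gt;line_number&lt;/tt&gt; and &lt;tt&gt;text&lt;/tt&gt; to jump straight to the problem in the file.&lt;/p&gt;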

&lt;p&gt;Because Rubidium runs on JRuby, performance gains may be achievable by re-writing select portions in Java.&lt;/p&gt;

&lt;p&gt;Parsing SD files is only the beginning of the story. Many cheminformatics applications need a convenient, fast, and robust method for &lt;em&gt;writing&lt;/em&gt; molfiles. This is also something Rubidium will attempt to provide.&lt;/p&gt;

&lt;p&gt;If your company or organization is curious about Ruby and cheminformatics, give Rubidium a try. Rubidium is licensed under the permissive &lt;a href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT License&lt;/a&gt; to make collaboration as simple as possible.&lt;/p&gt;</description>
      <pubDate>Mon, 12 Nov 2007 11:27:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:8e195fb8-22d0-4ea3-a2bd-40f44281fc8f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/12/parsing-sd-files-with-ruby-and-rubidium</link>
      <category>Tools</category>
      <category>rubidium</category>
      <category>ruby</category>
      <category>cdd</category>
      <category>sdfile</category>
      <category>sdf</category>
      <category>bdd</category>
      <category>rspec</category>
      <category>jruby</category>
    </item>
  </channel>
</rss>
