<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag molfile</title>
    <link>http://depth-first.com/articles/tag/molfile</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Extending InChI Stereochemistry</title>
      <description>&lt;p&gt;&lt;a href="http://www.reuters.com/article/pressRelease/idUS195509+17-Jun-2008+BW20080617"&gt;As covered by Reuters&lt;/a&gt; and many other wire services, &lt;a href="http://artuslabs.com"&gt;ArtusLabs&lt;/a&gt; and Boston University's &lt;a href="http://depth-first.com/articles/2007/06/18/yet-another-free-chemical-database-reaction-searching-with-cmld-bu"&gt;CMLD&lt;/a&gt; have teamed up to extend InChI's stereochemistry support:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;DURHAM, N.C.--(Business Wire)--
    ArtusLabs, Inc., a leading provider of life science software tools
    and data management solutions, has entered into a partnership with
    Boston University's Center for Chemical Methodology and Library
    Development (CMLD) to develop a way to standardize and expand the way
    in which stereochemistry, and ultimately a three-dimensional
    structures, are represented in the International Chemical Identifier
    (InChI(TM)).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With the increasing use of molecules containing &lt;a href="http://depth-first.com/articles/2007/01/08/the-axial-chirality-problem"&gt;axial chirality&lt;/a&gt;, &lt;a href="http://depth-first.com/articles/2007/01/22/a-molecular-language-for-modern-chemistry-flexmol-and-planar-chiral-metacyclophanes"&gt;planar chirality&lt;/a&gt;, and other forms of non-tetrahedral stereogenicity in chemistry, the move by ArtusLabs and CMLD could be significant.&lt;/p&gt;

&lt;p&gt;Put simply, the ability of cheminformatics to represent certain kinds of compounds has fallen far behind the ability of chemistry to make them. While molecules considered mere oddities 30 years ago continue to pour into corporate compound collections, laboratory notebooks, and product catalogs, cheminformatics has been stuck with a form of molecular representation that hasn't changed significantly in several decades.&lt;/p&gt;

&lt;p&gt;InChI isn't alone. The three most widely used molecular representation systems (Molfile, SMILES, and CML) also suffer from fundamental limitations in representing axial chirality, planar chirality, and &lt;a href="http://depth-first.com/articles/2006/12/19/ferrocene-and-beyond-a-solution-to-the-molecular-representation-problem"&gt;multicenter bonding&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The kind of work being undertaken by ArtusLabs and CMLD is essential if cheminformatics is to continue to keep pace with new developments in chemistry.&lt;/p&gt;</description>
      <pubDate>Wed, 09 Jul 2008 10:18:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:4c49e622-4ad4-47de-85b6-d62bb88773dc</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/07/09/extending-inchi-stereochemistry</link>
      <category>Tools</category>
      <category>flexmol</category>
      <category>axialchirality</category>
      <category>planarchirality</category>
      <category>artuslabs</category>
      <category>inchi</category>
      <category>molfile</category>
      <category>smiles</category>
      <category>cmld</category>
    </item>
    <item>
      <title>Run Babel Anywhere Java Runs with JBabel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.sf.net"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A &lt;a href="http://depth-first.com/articles/tag/nestedvm"&gt;recent series of D-F articles&lt;/a&gt; have discussed the use of &lt;a href="http://nestedvm.ibex.org/"&gt;NestedVM&lt;/a&gt; to compile cheminformatics programs written in C/C++ to pure java binaries that can be run on any system with a JVM. More specifically, an attempt to compile &lt;a href="http://openbabel.sf.net"&gt;OpenBabel's&lt;/a&gt; &lt;tt&gt;babel&lt;/tt&gt; program to bytecode was only &lt;a href="http://depth-first.com/articles/2007/11/26/compiling-open-babel-to-pure-java-bytecode-with-nestedvm-building-a-runnable-classfile-that-almost-works"&gt;partially successful&lt;/a&gt;. With the &lt;a href="http://sourceforge.net/mailarchive/forum.php?thread_name=819391.60947.qm%40web34201.mail.mud.yahoo.com&amp;amp;forum_name=openbabel-discuss"&gt;help of Geoff Hutchison&lt;/a&gt;, the problem was resolved. This article introduces JBabel, a platform-independent, pure Java implementation of OpenBabel's &lt;tt&gt;babel&lt;/tt&gt; program.&lt;/p&gt;

&lt;h4&gt;A Little About JBabel&lt;/h4&gt;

&lt;p&gt;JBabel was compiled from the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=40728&amp;amp;package_id=32894&amp;amp;release_id=521581"&gt;Open Babel 2.1.1 source release&lt;/a&gt; and can be &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=144794&amp;amp;package_id=255103"&gt;downloaded from SourceForge&lt;/a&gt;. The same jarfile was successfully tested on Linux, Windows and Mac OS X. You can verify JBabel works on your platform with the following command:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -Hsmi
smi  SMILES format
A linear text format which can describe the connectivity
and chirality of a molecule
Write Options e.g. -xt
  n no molecule name
  t molecule name only
  r radicals lower case eg ethyl is Cc
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This version of JBabel was compiled with support for three formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SMILES (smi). Non-canonical SMILES.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MDL (mol). Molfiles and SD Files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Canonical SMILES (can). Canonical SMILES implementation &lt;a href="http://depth-first.com/articles/2006/11/06/stone-soup"&gt;donated by eMolecules&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll discuss exactly how support for these formats was added in a subsequent post. More formats will be added in the future. For now, let's just try JBabel out.&lt;/p&gt;

&lt;h4&gt;Testing JBabel&lt;/h4&gt;

&lt;p&gt;One way to use JBabel is interactively from the command line - just leave out an input or output file parameter. For example, if you wanted to get the eMolecules canonical SMILES for &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=68617"&gt;sertraline&lt;/a&gt;, you might do something like this (be sure to use two returns to begin processing):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -ismi -ocan
CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl

CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12
1 molecule converted
34 audit log messages
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This canonical SMILES can be converted into a molfile with the following:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -ismi -omol
CN[C@H]1CC[C@H](c2ccc(Cl)c(Cl)c2)c2ccccc12


 OpenBabel12090723182D

 22 24  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0

...
&lt;/pre&gt;
&lt;/div&gt;
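
&lt;p&gt;The interactive conversions above can also be scripted. What follows is only a minimal sketch in Ruby, assuming &lt;tt&gt;java&lt;/tt&gt; is on the path and the jarfile sits in the working directory. The helper names &lt;tt&gt;jbabel_command&lt;/tt&gt; and &lt;tt&gt;convert_smiles&lt;/tt&gt; are made up for illustration and are not part of JBabel; only the &lt;tt&gt;-i&lt;/tt&gt; and &lt;tt&gt;-o&lt;/tt&gt; options shown above are used.&lt;/p&gt;

```ruby
require 'open3'

# Jar name taken from the examples above; adjust the path for your system.
JBABEL_JAR = 'jbabel-20071209.jar'

# Build the argument list for one conversion. Formats are babel codes
# such as 'smi', 'can', or 'mol'. (Hypothetical helper, not part of JBabel.)
def jbabel_command(in_format, out_format, jar: JBABEL_JAR)
  ['java', '-jar', jar, "-i#{in_format}", "-o#{out_format}"]
end

# Pipe one SMILES string through JBabel and return whatever it writes
# to standard output. (Hypothetical helper, not part of JBabel.)
def convert_smiles(smiles, out_format, jar: JBABEL_JAR)
  command = jbabel_command('smi', out_format, jar: jar)
  # The trailing blank line mirrors the "two returns" used interactively.
  output, _status = Open3.capture2(*command, stdin_data: "#{smiles}\n\n")
  output
end
```

&lt;p&gt;A call such as &lt;tt&gt;convert_smiles('CN[C@H]1CC[C@H](C2=CC=CC=C12)C3=CC(=C(C=C3)Cl)Cl', 'can')&lt;/tt&gt; would then stand in for the first interactive session. Passing the command as an argument list rather than a single string avoids shell quoting problems with characters such as parentheses in SMILES.&lt;/p&gt;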

&lt;p&gt;To convert using input and output files, we could use a medium-sized dataset such as the &lt;a href="http://rubyforge.org/frs/download.php/27768/pubchem_benzodiazepine_20071110.sdf.gz"&gt;PubChem benzodiazepine dataset&lt;/a&gt; prepared for &lt;a href="http://rbtk.rubyforge.org/"&gt;Rubidium&lt;/a&gt;:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ java -jar jbabel-20071209.jar -imol pubchem_benzodiazepine_20071110.sdf -ocan pubchem_benzodiazepine_20071110.smi
==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problems reading a MDL file
Cannot read title line

2117 molecules converted
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This test, which parses 2117 records, required four minutes forty-five seconds on my system. For comparison, the natively compiled binary did the same thing in about thirteen seconds. Clearly, the JBabel performance hit is substantial.&lt;/p&gt;
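
&lt;p&gt;To put a number on that hit, the timings quoted above can be turned into a slowdown factor and a per-record cost. This is back-of-the-envelope arithmetic on a single run, with the native time taken as exactly thirteen seconds, so treat the results as rough:&lt;/p&gt;

```ruby
jbabel_seconds = 4 * 60 + 45 # four minutes forty-five seconds
native_seconds = 13.0        # "about thirteen seconds"
records        = 2117

slowdown      = jbabel_seconds / native_seconds
jbabel_per_ms = jbabel_seconds * 1000.0 / records
native_per_ms = native_seconds * 1000.0 / records

puts "slowdown: about #{slowdown.round}x"
puts "per record: #{jbabel_per_ms.round} ms (JBabel) vs #{native_per_ms.round} ms (native)"
```

&lt;p&gt;That works out to a slowdown of roughly 22x, or about 135 ms per record for JBabel against about 6 ms for the native binary.&lt;/p&gt;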

&lt;h4&gt;Uses&lt;/h4&gt;

&lt;p&gt;Although it's very unlikely that JBabel will ever be useful in performance-critical situations, its portability makes it attractive for other uses. Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;application development in heterogeneous computing environments;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;use on systems in which native compilation may be difficult, such as those with unusual configurations or operating systems;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cases in which native binaries work poorly or not at all, such as in applets and Java applications;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;situations in which performance is a minor consideration, such as in end-user applications that process only a few molecules at a time, or during application prototyping.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;This article has described JBabel, the first portable binary version of OpenBabel's &lt;tt&gt;babel&lt;/tt&gt; molecular file format interconversion program. The next article in this series will describe in detail the steps that were used to compile it.&lt;/p&gt;</description>
      <pubDate>Mon, 10 Dec 2007 08:50:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:5d98a980-e3d6-4afd-8eb3-25769a28d13b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/12/10/run-babel-anywhere-java-runs-with-jbabel</link>
      <category>Tools</category>
      <category>jbabel</category>
      <category>babel</category>
      <category>openbabel</category>
      <category>nestedvm</category>
      <category>molfile</category>
      <category>canonicalsmiles</category>
      <category>smiles</category>
    </item>
    <item>
      <title>Ruby CDK One-Liners: Create a Molfile With 2D Atom Coordinates From Arbitrary SMILES Strings</title>
      <description>&lt;p&gt;&lt;a href="http://ruby-lang.org"&gt;&lt;img src="http://depth-first.com/files/ruby_logo_new.gif" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;A very common operation in cheminformatics is the interconversion of molfiles and SMILES strings. Usually, converting from SMILES gives a molfile in which all atoms have coordinates of (0,0,0). Sometimes you just need more than that. The following &lt;a href="http://depth-first.com/articles/tag/rcdk"&gt;Ruby CDK&lt;/a&gt; code will accept an arbitrary SMILES string and return a molfile with fully-assigned 2D atom coordinates:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rcdk/util&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;include&lt;/span&gt; &lt;span class="constant"&gt;RCDK&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Util&lt;/span&gt;

&lt;span class="constant"&gt;XY&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;coordinate_molfile&lt;/span&gt; &lt;span class="constant"&gt;Lang&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;smiles_to_molfile&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;c1ccccc1&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Looking at it this way, those four lines of require/include statements seem pretty darn verbose.&lt;/p&gt;</description>
      <pubDate>Thu, 20 Sep 2007 14:18:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:8c347f16-8a0c-4d35-a02c-a2560fdc5f79</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/20/ruby-cdk-one-liners-create-a-molfile-with-2d-atom-coordinates-from-arbitrary-smiles-strings</link>
      <category>Tools</category>
      <category>rubycdk</category>
      <category>rcdk</category>
      <category>smiles</category>
      <category>molfile</category>
      <category>interconversion</category>
      <category>sdg</category>
      <category>coordinates</category>
    </item>
    <item>
      <title>Never Draw the Same Molecule Twice: Viewing Image Metadata</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070801/rosiglitazone.png" align="right"&gt;&lt;/img&gt;Chemists are accustomed to embedding live molecular objects in their documents with Microsoft Word/ChemDraw. These objects can then be reprocessed and embedded into other documents, such as PowerPoint presentations, saving enormous amounts of time. What if the same feature were available with Web documents?&lt;/p&gt;

&lt;p&gt;A &lt;a href="http://depth-first.com/articles/2007/08/01/never-draw-the-same-molecule-twice-image-metadata-for-cheminformatics"&gt;recent D-F article&lt;/a&gt; proposed a method to encode molecular structure data within commonly-used Web image formats such as PNG. That article contained an embedded image of GlaxoSmithKline's diabetes treatment rosiglitazone (Avandia) encoded by a rendering toolkit built with &lt;a href="http://depth-first.com/articles/tag/firefly"&gt;Firefly&lt;/a&gt;. I claimed that this image contained the complete connection table and atom coordinates as embedded metadata. In this article, I'll show a simple method to read this metadata.&lt;/p&gt;

&lt;p&gt;Metadata is a standard part of the PNG specification; reading it requires nothing more than software capable of recognizing it. I recently found a Web-based, cross-platform method for doing so. The &lt;a href="http://www.fileformat.info/convert/image/metadata.htm"&gt;Image Metadata Viewer&lt;/a&gt; by &lt;a href="http://www.fileformat.info/index.htm"&gt;FileFormat.info&lt;/a&gt; accepts an uploaded image file and returns that image's metadata. Let's try it with the image of rosiglitazone.&lt;/p&gt;

&lt;p&gt;After saving the image to my hard drive, uploading it to FileFormat.info and pressing start, I can see that the image contains metadata:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070808-2/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The metadata can be viewed either as XML or as plain text. Choosing plain text (second option) gives me the complete molfile, stored as a key/value hash (molfile=[molfile]).&lt;/p&gt;
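
&lt;p&gt;The same key/value pairs can be read without a Web service. The sketch below walks the chunks of a PNG file in plain Ruby and collects every &lt;tt&gt;tEXt&lt;/tt&gt; chunk. It assumes the molfile is stored in an uncompressed &lt;tt&gt;tEXt&lt;/tt&gt; chunk, which the viewer output suggests but does not prove; compressed or international text chunks are ignored, and the CRC is not checked. The method name &lt;tt&gt;png_text_chunks&lt;/tt&gt; is made up for illustration.&lt;/p&gt;

```ruby
require 'stringio'

# The fixed eight-byte signature that opens every PNG file.
PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# Return the tEXt chunks of a PNG byte string as a keyword/text hash.
# Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
def png_text_chunks(bytes)
  io = StringIO.new(bytes.b)
  raise ArgumentError, 'not a PNG stream' unless io.read(8) == PNG_SIGNATURE

  texts = {}
  until io.eof?
    length, type = io.read(8).unpack('Na4')
    data = io.read(length)
    io.read(4) # skip the CRC; this sketch does not verify it
    if type == 'tEXt'
      # tEXt data is a Latin-1 keyword, a NUL byte, then the Latin-1 text.
      keyword, text = data.force_encoding(Encoding::ISO_8859_1).split("\0", 2)
      texts[keyword] = text
    end
    break if type == 'IEND'
  end
  texts
end
```

&lt;p&gt;With the image saved locally, &lt;tt&gt;png_text_chunks(File.binread('rosiglitazone.png'))['molfile']&lt;/tt&gt; would return the embedded molfile if the assumption about the chunk type holds.&lt;/p&gt;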

&lt;p&gt;Clearly, reading metadata is not a problem given the right software. But this leaves the question of how metadata is encoded in the first place - especially in a programming language such as Java. Like everything else, it's not difficult when you know how. Stay tuned for the answer.&lt;/p&gt;</description>
      <pubDate>Wed, 08 Aug 2007 07:40:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:980141bf-863c-4bec-be42-9e5445b88f42</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/08/08/never-draw-the-same-molecule-twice-viewing-image-metadata</link>
      <category>Tools</category>
      <category>firefly</category>
      <category>2d</category>
      <category>metadata</category>
      <category>molfile</category>
      <category>png</category>
      <category>rosiglitazone</category>
      <category>chemdraw</category>
    </item>
    <item>
      <title>Never Draw the Same Molecule Twice: Image Metadata for Cheminformatics</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070801/rosiglitazone.png" align="right" border="0"&gt;&lt;/img&gt;The graphical language of 2D structures has served chemistry well for the last 100 years. Ironically, this language which is so useful for human communication is extraordinarily difficult for machines to understand. Heroic efforts at digital raster image recognition such as &lt;a href="http://cactus.nci.nih.gov/osra/"&gt;OSRA&lt;/a&gt; and those &lt;a href="http://chem-bla-ics.blogspot.com/2007/07/optical-chemical-structure-recognition.html"&gt;recently summarized by Egon Willighagen&lt;/a&gt;, in addition to a &lt;a href="http://depth-first.com/articles/2006/08/25/computational-perception-and-recognition-of-digitized-molecular-structures"&gt;handful of others&lt;/a&gt;, have tried to tackle this problem with varying degrees of success.&lt;/p&gt;

&lt;p&gt;The problem remains unsolved, and continues to be one of the most difficult technical challenges in cheminformatics. But the pace at which non-machine readable images are generated has accelerated dramatically in the last two years with the emergence of &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;numerous free chemical databases&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What if 2D structure images simply contained all of the information needed for machine processing in the first place?&lt;/p&gt;

&lt;p&gt;This idea isn't as far-fetched as it may sound initially. As discussed in a &lt;a href="http://depth-first.com/articles/2007/07/30/editable-and-searchable-2d-molecular-images"&gt;recent D-F article&lt;/a&gt;, both &lt;a href="http://www.nongnu.org/gchempaint/"&gt;GChemPaint&lt;/a&gt; and &lt;a href="http://www.acdlabs.com/download/chemsk.html"&gt;ACD ChemSketch&lt;/a&gt; have been claimed to be capable of encoding machine-readable structure information.&lt;/p&gt;

&lt;p&gt;Previous D-F articles have described &lt;a href="http://depth-first.com/articles/tag/firefly"&gt;"Firefly"&lt;/a&gt;, the codename for a new lightweight 2D structure editor designed specifically for the Web. With major work on the editor's user interface complete, more recent efforts have focused on implementing a 2D rendering toolkit, and with it a mechanism to encode structural information within 2D molecular images.&lt;/p&gt;

&lt;p&gt;As a demonstration of what is now possible, consider the structure of GlaxoSmithKline's diabetes treatment rosiglitazone (Avandia), depicted as a PNG image at the beginning of this article. At first glance, the image appears to be just like any other image of a 2D molecular structure. But it is not, for embedded within it are the connection table and 2D atom coordinates of rosiglitazone encoded as an industry-standard molfile.&lt;/p&gt;

&lt;p&gt;Given the right software, a computer can interpret the structural information encoded in the rosiglitazone image and precisely re-create the original molecular representation. A graphical diagnostic tool bundled with Firefly was equipped with code for precisely this purpose.&lt;/p&gt;

&lt;p&gt;This tool can work with molfile-encoded PNG images just as easily as it can with molfiles; they can be opened and the resulting molecule can be further edited, saved in another format, or re-written as an embedded-molfile PNG image.&lt;/p&gt;

&lt;p&gt;The first step is to select the PNG image from a local hard drive:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070801/file_open2.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Opening this image produces a fully-editable version of the original molecule:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070801/firefly.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Obviously, nothing limits this technique to molfiles. InChI, SMILES, CML, or any other molecular encoding scheme would work just as well.&lt;/p&gt;
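
&lt;p&gt;This article doesn't say how Firefly writes its metadata, so the following is only one way an embedding of this kind could be done: a plain-Ruby sketch that adds a PNG &lt;tt&gt;tEXt&lt;/tt&gt; chunk just before the closing &lt;tt&gt;IEND&lt;/tt&gt; chunk. The method names &lt;tt&gt;png_chunk&lt;/tt&gt; and &lt;tt&gt;embed_text&lt;/tt&gt; are invented for illustration, and the keyword is arbitrary, so the same call works for a molfile, an InChI, or a SMILES string as long as the text is plain ASCII.&lt;/p&gt;

```ruby
require 'zlib'

# Serialize one PNG chunk: length, type, data, then a CRC-32 over type + data.
def png_chunk(type, data)
  body = type.b + data.b
  [data.bytesize].pack('N') + body + [Zlib.crc32(body)].pack('N')
end

# Return a copy of png_bytes with a tEXt chunk (keyword, NUL, text) placed
# just before the closing IEND chunk. (Hypothetical helper for illustration.)
def embed_text(png_bytes, keyword, text)
  png = png_bytes.b
  type_offset = png.rindex('IEND'.b)
  raise ArgumentError, 'no IEND chunk found' if type_offset.nil?

  iend_start = type_offset - 4 # step back over IEND's length field
  png[0, iend_start] + png_chunk('tEXt', "#{keyword}\0#{text}") + png[iend_start..-1]
end
```

&lt;p&gt;For example, &lt;tt&gt;File.binwrite('out.png', embed_text(File.binread('in.png'), 'molfile', molfile))&lt;/tt&gt; would write a copy of an image carrying its own connection table. The file names and the &lt;tt&gt;molfile&lt;/tt&gt; variable here are placeholders.&lt;/p&gt;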

&lt;p&gt;Using molecular-encoded PNG images as a Web-ready replacement for the &lt;a href="http://depth-first.com/articles/2007/07/30/editable-and-searchable-2d-molecular-images"&gt;Word/Chemdraw OLE&lt;/a&gt; technology may be one application of this approach. With a large corpus of these images, chemical Web spidering and data mining would be possible on a scale unimaginable today. As always, these possibilities reinforce the desperate need for high quality tools that chemists actually want to use, and which simultaneously yield machine-readable output.&lt;/p&gt;</description>
      <pubDate>Wed, 01 Aug 2007 06:17:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:bc2292ae-4f21-47ec-b8e5-41a17a52e8c9</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/08/01/never-draw-the-same-molecule-twice-image-metadata-for-cheminformatics</link>
      <category>Tools</category>
      <category>firefly</category>
      <category>2d</category>
      <category>molfile</category>
      <category>encoded</category>
      <category>png</category>
      <category>rosiglitazone</category>
      <category>metadata</category>
    </item>
    <item>
      <title>An Object-Oriented Framework for Molecular Representation: Getting Started with Octet</title>
      <description>&lt;p&gt;&lt;a href="http://www.amazon.com/gp/product/0201633612?ie=UTF8&amp;amp;tag=depthfirst-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0201633612"&gt;&lt;img border="0" src="http://depth-first.com/files/design_patterns.jpg" align="right"&gt;&lt;/a&gt;&lt;img src="http://www.assoc-amazon.com/e/ir?t=depthfirst-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0201633612" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /&gt;&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;If applications are hard to design and toolkits are harder, then frameworks are hardest of all. A framework designer gambles that one architecture will work for all applications in the domain. Any substantive change to the framework's design would reduce its benefits considerably, since the framework's main contribution to an application is the architecture it defines. Therefore it's imperative to design the framework to be as flexible and extensible as possible.&lt;/p&gt;

    &lt;p&gt;-&lt;cite&gt;Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides- &lt;em&gt;&lt;a href="http://www.amazon.com/gp/product/0201633612?ie=UTF8&amp;amp;tag=depthfirst-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0201633612"&gt;Design Patterns&lt;/a&gt;&lt;img src="http://www.assoc-amazon.com/e/ir?t=depthfirst-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0201633612" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /&gt;&lt;/em&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the most important considerations when building an application is the choice of framework. As the quote from the &lt;a href="http://en.wikipedia.org/wiki/Design_patterns"&gt;Gang of Four&lt;/a&gt; implies, there's much more to frameworks than just a collection of re-usable code. At their best, frameworks provide a foundation for thinking about a problem domain and a language for communicating with other developers about it. In this article, I'll introduce &lt;a href="http://sf.net/projects/octet"&gt;Octet&lt;/a&gt;, an object-oriented framework for molecular representation.&lt;/p&gt;

&lt;h4&gt;The Molecular Representation Problem&lt;/h4&gt;

&lt;p&gt;Isn't molecular representation a solved problem? After all, don't SMILES, Molfile, InChI, and CML adequately represent any molecule the average software developer is likely to see?&lt;/p&gt;

&lt;p&gt;As &lt;a href="http://depth-first.com/articles/2006/12/19/ferrocene-and-beyond-a-solution-to-the-molecular-representation-problem"&gt;previously discussed&lt;/a&gt;, molecular representation technologies have stagnated while the molecules chemists themselves now routinely make and use have continued to become more and more "exotic." Developers are now faced with the thorny problem that a variety of common structural motifs in chemistry can't be adequately represented with industry-standard cheminformatics tools.&lt;/p&gt;

&lt;p&gt;This point is so important, I'll repeat it: cheminformatics has fallen behind chemistry in the kinds of molecules it can work with. Quick fixes only allow the problem to fester; what's needed is a comprehensive solution. This is Octet's problem domain.&lt;/p&gt;

&lt;p&gt;Every framework is bounded by a specific problem domain. Although the size of the domain can vary, a framework provides a comprehensive solution within it. For complex and poorly standardized domains (such as molecular representation), a good framework can greatly accelerate application development.&lt;/p&gt;

&lt;p&gt;A good framework stays within its problem domain. One of the most important reasons is to prevent &lt;a href="http://headrush.typepad.com/creating_passionate_users/2005/06/featuritis_vs_t.html"&gt;featuritis&lt;/a&gt;, the root of much software evil. Keeping a framework focused on its core mission makes it much more likely that it can remain documented, tested, extensible, and efficient.&lt;/p&gt;

&lt;p&gt;By intention, a variety of features fall outside Octet's problem domain and so will never be directly supported. For example, rendering 2-D structure diagrams is a common problem in cheminformatics that has nothing to do with solving the molecular representation problem. Similarly, reading and writing SMILES strings and Molfiles are supported by many toolkits, but not by Octet directly. After all, it's the inherent limitations of these languages that Octet is trying to overcome.&lt;/p&gt;

&lt;p&gt;Higher-level functionality such as legacy language support and 2-D rendering, although not part of Octet itself, can be developed with Octet as a foundation. For example, two Octet add-on frameworks specifically address these problems. They are called &lt;a href="http://sf.net/projects/rxf/"&gt;Rosetta&lt;/a&gt; and &lt;a href="http://sf.net/projects/structure/"&gt;Structure&lt;/a&gt;, respectively.&lt;/p&gt;

&lt;h4&gt;About This Series&lt;/h4&gt;

&lt;p&gt;This article is the first in a series discussing Octet. Future articles will describe in detail Octet's design, implementation, and use. Although Octet has come a long way, it's far from finished. My motivation for writing these articles is to hear what you have to say about Octet, so please feel free to &lt;a href="http://sourceforge.net/users/r_apodaca/"&gt;contact me&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although Octet is written in Java, code examples discussed here will be written in Ruby. I've taken the same approach in discussing the &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK) and &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;Structure-CDK&lt;/a&gt;. Ruby's brevity and comfortable syntax make it ideal for both writing and discussing code.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubyforge.org/projects/rjb/"&gt;Ruby Java Bridge&lt;/a&gt; (RJB) is the magic technology that makes this possible. Previous articles have discussed the installation and use of RJB on &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Windows&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;Linux&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;A Simple Test&lt;/h4&gt;

&lt;p&gt;Assuming you've installed Ruby, RubyGems and Ruby Java Bridge, you can perform a simple demonstration of Octet in Ruby:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;BasicMoleculeBuilder&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;net.sf.octet.builder.BasicMoleculeBuilder&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;span class="constant"&gt;RepresentationKit&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;net.sf.octet.util.RepresentationKit&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;span class="constant"&gt;MoleculeKit&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;net.sf.octet.util.MoleculeKit&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;span class="constant"&gt;System&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;java.lang.System&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;

&lt;span class="ident"&gt;builder&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;BasicMoleculeBuilder&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

&lt;span class="constant"&gt;RepresentationKit&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;buildHexane&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;builder&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;

&lt;span class="ident"&gt;molecule&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;builder&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;releaseMolecule&lt;/span&gt;

&lt;span class="constant"&gt;MoleculeKit&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;printMolecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;molecule&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;System&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;out&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above code generates an Octet representation for n-hexane, and prints the representation to the console. To run this example, save the above code to a file called &lt;strong&gt;test.rb&lt;/strong&gt; in your working directory. Then add &lt;strong&gt;octet-0.8.2.jar&lt;/strong&gt;, which can be found in the &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=96108&amp;amp;package_id=102647&amp;amp;release_id=378955"&gt;Octet-0.8.2 source distribution&lt;/a&gt;, to the same directory. The test can then be run with the following sequence of commands:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ export CLASSPATH=./octet-0.8.2.jar
$ ruby test.rb
**Molecule Properties**

Atom Count: 6, Bonding System Count: 5

Atoms:
atom: C[0] (2nu 0e, 0or, 0.0fc, 1bs, 1n, 4.0val, 3ih )
atom: C[1] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[2] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[3] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[4] (2nu 0e, 0or, 0.0fc, 2bs, 2n, 4.0val, 2ih )
atom: C[5] (2nu 0e, 0or, 0.0fc, 1bs, 1n, 4.0val, 3ih )

No non-natural isotopic distributions specified.

No Orbitals specified.

Bonding Systems:
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (0, 1) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (1, 2) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (2, 3) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (3, 4) ]
bonding system:  ( 2be, 0abe, 2a, 1ap ) [ (4, 5) ]

Atom Pairs:
atom pair: (0, 1) (1.0 bo)
atom pair: (1, 2) (1.0 bo)
atom pair: (2, 3) (1.0 bo)
atom pair: (3, 4) (1.0 bo)
atom pair: (4, 5) (1.0 bo)

No Atomic Configurations specified.
No Conformation specified.
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see, Octet shares its concepts and vocabulary with &lt;a href="http://depth-first.com/articles/tag/flexmol"&gt;FlexMol&lt;/a&gt;. We'll drill down into the meaning of the output in later articles. The important thing to remember is that we can print out a report like the one above for any &lt;tt&gt;Molecule&lt;/tt&gt;, no matter how complex.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Octet is an object-oriented framework designed to solve the molecular representation problem and serve as a solid foundation for a variety of cheminformatics applications. Of course, there's much more to Octet than the simple example shown here. Future articles will describe in greater detail the design and use of Octet through illustrative examples.&lt;/p&gt;</description>
      <pubDate>Tue, 30 Jan 2007 14:45:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:a53f05f2-aafb-4d26-a345-fdbfc6e9d724</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/01/30/an-object-oriented-framework-for-molecular-representation-getting-started-with-octet</link>
      <category>Tools</category>
      <category>octet</category>
      <category>flexmol</category>
      <category>representation</category>
      <category>java</category>
      <category>ruby</category>
      <category>inchi</category>
      <category>cml</category>
      <category>molfile</category>
      <category>smiles</category>
      <category>framework</category>
    </item>
    <item>
      <title>Hacking Molbank: Downloading a Complete Chemistry Journal</title>
      <description>&lt;p&gt;&lt;a href="http://www.mdpi.org/"&gt;&lt;img src="http://depth-first.com/files/mdpi-small.gif" border="0" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The previous article in this series highlighted Molbank as a tool for studying the &lt;a href="http://depth-first.com/articles/2006/11/30/molbank-and-the-convergence-of-open-access-open-data-and-open-source-in-chemistry"&gt;convergence of Open Access, Open Data, and Open Source in chemistry&lt;/a&gt;. This article will outline some of the technical and legal aspects of downloading and using Molbank content.&lt;/p&gt;

&lt;h4&gt;Mirror, Mirror&lt;/h4&gt;

&lt;p&gt;MDPI themselves &lt;a href="http://mdpi.net/MIRRORING/mirroring.html"&gt;actively encourage&lt;/a&gt; the copying of their journal content by a process known as mirroring:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;We encourage two types of mirroring :&lt;/p&gt;

    &lt;ul&gt;
    &lt;li&gt;Institutional Mirroring : Institutions may help not only their own members, but neighbouring scientists, to have a faster and reliable access to MDPI journals. For institutions, this is a tradeoff : they save bandwidth on outgoing traffic, while having more inbound traffic. One positive aspect is that sites supporting mirrors become more visited and better known. We are going to maintain a list of supporting institutional mirror sites which is going to be presented in an extremely visible fashion, on the welcome pages of each journal, so that all MDPI readers can access the nearest site.&lt;/li&gt;
    &lt;li&gt;Personnal Mirroring : With hard disks becoming larger and cheaper, it becomes not unreasonnable to set up his/her own personnal mirror, with all the information at your fingertips !. An automated procedure, running at night, keeps your personnal mirror always updated. This is extremely convenient. You may keep this mirror to yourself, or openned to your colleagues, you may do what you wish !&lt;/li&gt;
    &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The text then goes on to give explicit instructions on how to create a mirror of the entire MDPI site and all of its journal content using Linux. So not only does MDPI explicitly allow the non-commercial copying of their content, but that copy can then be hosted on the Web, transmitted through other media, or simply used locally. It's the last of these uses, working with a local copy, that this article will address.&lt;/p&gt;

&lt;h4&gt;Create a Molbank Archive&lt;/h4&gt;

&lt;p&gt;The Unix command &lt;tt&gt;wget&lt;/tt&gt; can be used to copy the content of a website. Before using &lt;tt&gt;wget&lt;/tt&gt;, or any similar tool, you should &lt;a href="http://depth-first.com/articles/2006/09/22/hacking-pubchem-why-the-open-access-fight-is-just-the-beginning"&gt;check the &lt;tt&gt;robots.txt&lt;/tt&gt; file&lt;/a&gt; for the site of interest. I have so far been unable to find a &lt;tt&gt;robots.txt&lt;/tt&gt; file on the MDPI site, so I assume there is no problem with running either &lt;tt&gt;wget&lt;/tt&gt; or other robotic agents. For the purposes of this tutorial, it is more convenient to work from a local copy than to fetch pages from the live site each time.&lt;/p&gt;

&lt;p&gt;To create a local copy of all 2005 articles in Molbank, for example, use &lt;tt&gt;wget&lt;/tt&gt; with the appropriate arguments:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ wget -r -l2 http://www.mdpi.net/molbank/molbank2005.htm
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;tt&gt;-r&lt;/tt&gt; flag turns on recursive retrieval, in which &lt;tt&gt;wget&lt;/tt&gt; follows the links it finds in each downloaded page, and the &lt;tt&gt;-l2&lt;/tt&gt; flag limits the recursion depth to two levels.&lt;/p&gt;

&lt;p&gt;When the process is complete, you should have a directory called &lt;strong&gt;www.mdpi.net&lt;/strong&gt; in your working directory. This directory will contain a subdirectory called &lt;strong&gt;molbank&lt;/strong&gt; which in turn contains two directories: &lt;strong&gt;2005&lt;/strong&gt; and &lt;strong&gt;2006&lt;/strong&gt;. Under the &lt;strong&gt;2005&lt;/strong&gt; directory, you'll find all of Molbank's articles in HTML format, all images, and all molfiles. It's not clear to me yet why the &lt;strong&gt;2006&lt;/strong&gt; directory is created and why it only contains one article.&lt;/p&gt;
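
&lt;p&gt;Before going further, it can help to take a quick inventory of what was downloaded. The following is a minimal Ruby sketch, not part of &lt;tt&gt;wget&lt;/tt&gt; or MDPI's instructions, that tallies the files under a mirror directory by extension. The method name &lt;tt&gt;tally_by_extension&lt;/tt&gt; is hypothetical, and the path assumes &lt;tt&gt;wget&lt;/tt&gt; was run from the current directory as shown above:&lt;/p&gt;

```ruby
# Tally the files under a directory tree by (lowercased) extension.
# tally_by_extension is a hypothetical helper name used for illustration.
def tally_by_extension(root)
  counts = Hash.new(0)
  Dir.glob(File.join(root, "**", "*")).each do |path|
    next unless File.file?(path)
    ext = File.extname(path).downcase
    counts[ext.empty? ? "(none)" : ext] += 1
  end
  counts
end

if __FILE__ == $0
  # Assumed location of the 2005 articles inside the local mirror.
  tally_by_extension("www.mdpi.net/molbank/2005").sort.each do |ext, n|
    puts "#{ext}\t#{n}"
  end
end
```

&lt;p&gt;If the mirror lives somewhere else, pass that path instead. A directory that doesn't exist simply produces an empty tally.&lt;/p&gt;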

&lt;h4&gt;Checking the Archive&lt;/h4&gt;

&lt;p&gt;A large number of Molbank's molfiles appear to be corrupted. This isn't related to &lt;tt&gt;wget&lt;/tt&gt;, because these files are also corrupted when viewed through a browser directly from &lt;a href="http://www.mdpi.org"&gt;http://www.mdpi.org&lt;/a&gt;. For example, the molfile for Molbank article #393 appears corrupted (as do all of the other molfiles for July 2005):&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.mdpi.org/molbank/molbank2005/m393.mol"&gt;http://www.mdpi.org/molbank/molbank2005/m393.mol&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You'll also find several instances of bogus molfiles containing only one or two atoms, such as for Molbank article #431:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.mdpi.org/molbank/molbank2005/m431.mol"&gt;http://www.mdpi.org/molbank/molbank2005/m431.mol&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some molfiles are missing altogether, such as the one for Molbank article #405:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.mdpi.org/molbank/molbank2005/m405.mol"&gt;http://www.mdpi.org/molbank/molbank2005/m405.mol&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clearly, the integrity of Molbank's molfiles cannot be assumed. Software designed to work with this dataset will therefore need to be capable of gracefully handling corrupted, nonexistent, and bogus molfiles.&lt;/p&gt;
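
&lt;p&gt;One way to approach this is to screen each file before handing it to a full parser. The following Ruby sketch assumes the V2000 molfile layout: three header lines followed by a counts line whose first two three-character fields hold the atom and bond counts. The method name &lt;tt&gt;classify_molfile&lt;/tt&gt; and the three-atom cutoff for "bogus" files are assumptions made for illustration, not anything defined by Molbank:&lt;/p&gt;

```ruby
# Screen the text of a V2000 molfile without fully parsing it.
# Returns :missing, :corrupted, :bogus, or :ok.

# Assumed cutoff: files with fewer atoms than this are treated as bogus.
MIN_ATOMS = 3

def classify_molfile(text)
  return :missing if text.nil? || text.strip.empty?

  lines = text.split(/\r?\n/)

  # Line 4 is the counts line; columns 1-3 and 4-6 hold atom and bond counts.
  counts_line = lines[3].to_s
  fields = [counts_line[0, 3].to_s.strip, counts_line[3, 3].to_s.strip]
  return :corrupted unless fields.all? { |field| field =~ /\A\d+\z/ }

  atoms, bonds = fields.map { |field| field.to_i }

  # The header, atom block, and bond block must all be present.
  return :corrupted if 4 + atoms + bonds > lines.length

  return :bogus if MIN_ATOMS > atoms

  :ok
end

# A well-formed three-atom example with made-up coordinates.
propane = [
  "",
  "  example",
  "",
  "  3  2  0  0  0  0  0  0  0  0999 V2000",
  "    0.0000    0.0000    0.0000 C   0  0",
  "    1.0000    0.0000    0.0000 C   0  0",
  "    2.0000    0.0000    0.0000 C   0  0",
  "  1  2  1  0",
  "  2  3  1  0",
  "M  END"
].join("\n")

puts classify_molfile(propane)
```

&lt;p&gt;A mirror-wide check could call this on the contents of each &lt;strong&gt;.mol&lt;/strong&gt; file and tally the results, passing &lt;tt&gt;nil&lt;/tt&gt; for files that were never downloaded. Files that pass this screen may still fail in a real parser, so this is a first filter rather than a validator.&lt;/p&gt;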

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Molbank permits the non-profit copying of its entire article collection. With some simple command-line tools, it's possible to quickly and easily create your own personal Molbank mirror. A cursory examination of the molfiles contained in Molbank showed several problems that need to be taken into consideration. The remaining articles in this series will describe some ways that Molbank's content can be put to use with Open Source software, and mashed up with Open Data.&lt;/p&gt;</description>
      <pubDate>Fri, 01 Dec 2006 15:13:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c3ddc1b1-2497-414b-89d4-afbfc6fa38e6</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/12/01/hacking-molbank-downloading-a-complete-chemistry-journal</link>
      <category>Tools</category>
      <category>molbank</category>
      <category>wget</category>
      <category>molfile</category>
      <category>mirror</category>
    </item>
    <item>
      <title>Molbank and the Convergence of Open Access, Open Data, and Open Source in Chemistry</title>
      <description>&lt;p&gt;&lt;a href="http://www.mdpi.org/"&gt;&lt;img src="http://depth-first.com/files/mdpi-small.gif" border="0" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.mdpi.org/molbank/"&gt;Molbank&lt;/a&gt;, published by &lt;a href="http://www.mdpi.org/"&gt;Molecuar Diversity Preservation International&lt;/a&gt;, is one of the oldest of a handful of &lt;a href="http://depth-first.com/articles/2006/10/18/disruptive-innovation-in-scientific-publishing-directory-of-open-access-journals"&gt;Open Access journals in chemistry&lt;/a&gt;. Although its longevity is a remarkable accomplishment in itself, there is much more to Molbank than meets eye. Just below the surface is a feature so revolutionary, yet simple, that chemistry publishers years from now will wonder why &lt;em&gt;they&lt;/em&gt; didn't implement it sooner.&lt;/p&gt;

&lt;p&gt;A Molbank article consists of a short monograph on a single compound, or possibly two. This may strike some scientists as a strange way to publish results, and it is unusual. On the other hand, this system offers vast potential to capture useful, but "unpublishable" findings that would otherwise be lost. Back when scientists actually read hardcopy journals, such a system would never have been feasible. Today, with hard drive space measured in terabytes, fiber-optic cables crisscrossing the planet, Internet connectivity for almost everyone, and servers that can be had for virtually nothing, this system not only looks perfectly feasible, but preferable in many ways to the status quo.&lt;/p&gt;

&lt;p&gt;Here's the revolutionary part: each article that Molbank publishes is accompanied by a publicly available, machine-readable file encoding the structure of the article's subject molecule. That's it. There's nothing tricky or high-tech about it. In fact, the practice is about as low-tech as you could imagine. The file format in which structures are encoded, molfile, dates back at least fifteen years, and nearly every piece of chemistry software - both end-user and developer tools - can handle it. What makes Molbank's practice revolutionary is that not a single other chemistry journal, Open Access or subscription-based, currently does this.&lt;/p&gt;

&lt;p&gt;Why does the simple inclusion of a publicly available molfile encoding the molecular structures in a paper matter so much? This is where the other two members of the trinity named in this article's title come into play: Open Source and Open Data. By providing a mechanism for a computer to decipher the chemistry in a paper, Molbank has opened the door to a host of highly productive integration activities that nobody outside of &lt;a href="http://www.cas.org/"&gt;Chemical Abstracts Service&lt;/a&gt; has even been able to contemplate, let alone prepare for.&lt;/p&gt;

&lt;p&gt;This article is the first in a series aimed at exploring the wide-open space that Molbank has created. Rather than arguing my point with words, I'll actually build working demonstrations of what is now easily within reach. At the same time, I'll document my work on this blog. I'm not sure where all of this will end up, but I do hope to shine some light on a vital, although currently obscure, component of the Open Access debate.&lt;/p&gt;</description>
      <pubDate>Thu, 30 Nov 2006 15:01:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:0ec69fe1-07ac-46d0-9112-95afd038e81f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/11/30/molbank-and-the-convergence-of-open-access-open-data-and-open-source-in-chemistry</link>
      <category>Open X</category>
      <category>opensource</category>
      <category>opendata</category>
      <category>openaccess</category>
      <category>mdpi</category>
      <category>molbank</category>
      <category>integration</category>
      <category>molfile</category>
    </item>
    <item>
      <title>Drawing 2-D Structures with Structure-CDK</title>
      <description>&lt;p&gt;Rendering 2-D molecular structures is a fundamental part of chemical informatics. It's used in building end user systems, and more immediately, it can be critical for creating and debugging developer tools.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://cdk.sf.net"&gt;Chemistry Development Kit&lt;/a&gt; (CDK) is a highly functional chemical informatics library written in Java. Although it provides built-in 2-D rendering capabilities through the &lt;tt&gt;org.openscience.cdk.renderer&lt;/tt&gt; package, I wanted something a little easier for me to customize. The result is &lt;a href="http://structure.sf.net"&gt;Structure-CDK&lt;/a&gt;, a 37K add-on library for the CDK. This article discusses the main features of Structure-CDK with some screenshots and code.&lt;/p&gt;

&lt;p&gt;To begin using Structure-CDK, &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=103744&amp;amp;package_id=202103&amp;amp;release_id=443008"&gt;download&lt;/a&gt; the current release. This package contains a complete copy of the most recent CDK release, so there is nothing else to install or download. Structure-CDK was developed with JDK-1.5.0. Because it contains no 1.5-specific features, it may work on earlier Java versions. &lt;a href="http://ant.apache.org/"&gt;Ant&lt;/a&gt; is useful, but not essential.&lt;/p&gt;

&lt;p&gt;The package contains an interactive viewing application, which can be invoked with the "vis" Ant task:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ ant vis
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Two types of molecules can be viewed. The first consists of those defined in &lt;tt&gt;org.openscience.cdk.templates.MoleculeFactory&lt;/tt&gt;, which can be found under the &lt;strong&gt;Structure&lt;/strong&gt; menu. 2-D coordinates are provided by CDK's &lt;tt&gt;StructureDiagramGenerator&lt;/tt&gt;. Additionally, molecules can be opened as molfiles (&lt;strong&gt;File-&gt;Open&lt;/strong&gt;), several samples of which are contained in the distribution's &lt;strong&gt;molfiles&lt;/strong&gt; directory. Let's take a look at oseltamivir (Tamiflu).&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/oseltamivir_window.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;This view can be changed in a couple of ways. Resizing the window automatically resizes and centers the image, while maintaining proportionality of all measurements. This feature, when used with antialiasing, results in the image staying readable regardless of its size. Additionally, &lt;strong&gt;Edit-&gt;Preferences&lt;/strong&gt; produces a dialog for changing the rendering settings.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/oseltamivir_window_prefs.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Now let's see some code that will read a molecule from a molfile and write a 2-D PNG image to disk. This can be done via the static convenience methods found in &lt;tt&gt;ImageKit&lt;/tt&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_java "&gt;import java.io.FileReader;

import org.openscience.cdk.io.MDLReader;
import org.openscience.cdk.interfaces.IMolecule;
import org.openscience.cdk.Molecule;

import net.sf.structure.cdk.util.ImageKit;
...

// Reads the molfile at pathToMolfile and writes a 300x300 PNG to pathToPNG.
public void writePNG(String pathToMolfile, String pathToPNG) throws Exception
{
  FileReader fileReader = new FileReader(pathToMolfile);

  try
  {
    MDLReader mdlReader = new MDLReader(fileReader);
    IMolecule mol = (IMolecule) mdlReader.read(new Molecule());

    // Render the molecule as a 300x300 image and save it.
    ImageKit.writePNG(mol, 300, 300, pathToPNG);
  }
  finally
  {
    // Release the file handle even if reading or rendering fails.
    fileReader.close();
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above code fragment creates a 300x300 PNG image from the contents of the molfile specified by &lt;tt&gt;pathToMolfile&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;Although several rendering features, both aesthetic and functional, are supported, some are missing. Most importantly, atom labels are rendered without their hydrogen atoms, and there is no stereochemistry support. Performance has not been optimized at all. Future versions of Structure-CDK will aim to address these issues.&lt;/p&gt;

&lt;p&gt;Given the central nature of 2-D structure rendering, it's nice to have options. Structure-CDK provides a convenient, interactive solution. Future articles will discuss the integration of Structure-CDK into more complex chemical informatics systems.&lt;/p&gt;</description>
      <pubDate>Mon, 28 Aug 2006 14:03:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:f1d06b2d-cf86-407c-bf5e-0e3b74e150fa</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk</link>
      <category>Graphics</category>
      <category>cdk</category>
      <category>2d</category>
      <category>render</category>
      <category>molfile</category>
      <category>java</category>
    </item>
  </channel>
</rss>
