<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag pubchem</title>
    <link>http://depth-first.com/articles/tag/pubchem</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>PubChem WTF #1</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=21022536"&gt;&lt;img src="http://depth-first.com/demo/20081010/21022536.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;In preparation for the first beta release of &lt;a href="http://metamolecular.com/chemphoto"&gt;ChemPhoto&lt;/a&gt;, the &lt;a href="http://depth-first.com/articles/2008/09/08/smarter-cheminformatics-from-sd-file-to-image-collection-with-chemphoto"&gt;chemical structure imaging application&lt;/a&gt;, I've been performing a lot of tests with &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;PubChem SD files&lt;/a&gt;. A tool that can quickly browse through tens of thousands of PubChem molecules turns up some very strange beasts, including &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=21022536"&gt;the one depicted above&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're still curious as to what this PubChem record is actually referring to, &lt;a href="http://www.daylight.com/daycgi/depict?433143324333433443354343364335374334384333394332324331433143323243393343383443373543364336433535433434433333433232433143314332324333334334344335354336433643353543343443333343323243314331433232433333433434433535433643364335354334344333334332324331433143323243333343343443353543364336433535433434433333433232433143314332324333334334344335354336433643353543343443333343323243314343324333433443354336"&gt;this tool&lt;/a&gt; is quite useful.&lt;/p&gt;</description>
      <pubDate>Sat, 11 Oct 2008 03:35:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:97c12357-2edc-4353-867f-509714551267</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/11/pubchem-wtf-1</link>
      <category>Meta</category>
      <category>pubchem</category>
      <category>pubchemwtf</category>
      <category>chemphoto</category>
      <category>sdfile</category>
    </item>
    <item>
      <title>Recombining Compressed PubChem SD Files with Open Babel</title>
      <description>&lt;p&gt;&lt;a href="http://openbabel.org"&gt;&lt;img src="http://depth-first.com/files/Babel256.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;While testing &lt;a href="http://metamolecular.com/chemphoto"&gt;ChemPhoto&lt;/a&gt;, it became necessary to test the &lt;a href="http://depth-first.com/articles/2008/09/08/smarter-cheminformatics-from-sd-file-to-image-collection-with-chemphoto"&gt;chemical structure imaging application&lt;/a&gt; with SD Files containing several hundred thousand records. Although it's tempting to meet this need by constructing "dummy" files with the same record or small set of records repeated, tests are always far more illuminating when real data is used.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; is an excellent source of large molecular datasets, and the entire database can be &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;downloaded by FTP&lt;/a&gt;. Because of PubChem's massive size, what's downloadable consists of files broken up into groups of about 25,000 in gzipped SD File format (*.sdf.gz). Although this is an excellent resource, it creates a problem: how can you conveniently recombine this set of compressed SD Files into a single SD File?&lt;/p&gt;

&lt;p&gt;You might think about writing some "quick" code in your language of choice. Fortunately, &lt;a href="http://openbabel.org"&gt;Open Babel&lt;/a&gt; gets the job done - without any of the coding or debugging.&lt;/p&gt;

&lt;p&gt;The following command will create a single SD File from all of the compressed SD Files in a given directory, while also stripping explicit hydrogens and removing all fields except PUBCHEM_COMPOUND_CID.&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
babel *.sdf.gz pubchem.sdf -d --delete PUBCHEM_COMPOUND_CANONICALIZED,PUBCHEM_CACTVS_COMPLEXITY,PUBCHEM_CACTVS_HBOND_ACCEPTOR,PUBCHEM_CACTVS_HBOND_DONOR,PUBCHEM_CACTVS_ROTATABLE_BOND,PUBCHEM_CACTVS_SUBSKEYS,PUBCHEM_IUPAC_OPENEYE_NAME,PUBCHEM_IUPAC_CAS_NAME,PUBCHEM_IUPAC_NAME,PUBCHEM_IUPAC_SYSTEMATIC_NAME,PUBCHEM_IUPAC_TRADITIONAL_NAME,PUBCHEM_NIST_INCHI,PUBCHEM_EXACT_MASS,PUBCHEM_MOLECULAR_FORMULA,PUBCHEM_MOLECULAR_WEIGHT,PUBCHEM_OPENEYE_CAN_SMILES,PUBCHEM_OPENEYE_ISO_SMILES,PUBCHEM_CACTVS_TPSA,PUBCHEM_MONOISOTOPIC_WEIGHT,PUBCHEM_TOTAL_CHARGE,PUBCHEM_HEAVY_ATOM_COUNT,PUBCHEM_ATOM_DEF_STEREO_COUNT,PUBCHEM_ATOM_UDEF_STEREO_COUNT,PUBCHEM_BOND_DEF_STEREO_COUNT,PUBCHEM_BOND_UDEF_STEREO_COUNT,PUBCHEM_ISOTOPIC_ATOM_COUNT,PUBCHEM_COMPONENT_COUNT,PUBCHEM_CACTVS_TAUTO_COUNT,PUBCHEM_BONDANNOTATIONS,PUBCHEM_CACTVS_XLOGP

865543 molecules converted
7 info messages 15372962 audit log messages 
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Apparently, there is no way to tell babel to &lt;em&gt;keep&lt;/em&gt; just a particular field in an SD File; each unwanted field has to be named and removed individually.&lt;/p&gt;
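
&lt;p&gt;If keeping a single field matters more than staying on the command line, a short script is one alternative. The Python sketch below is not part of Open Babel, and the function names (iter_records, keep_fields, combine) are made up for illustration. It assumes every record ends with a $$$$ line and that field names consist of plain word characters, as PubChem's do. Unlike the babel command above, it leaves explicit hydrogens in place.&lt;/p&gt;

```python
import gzip
import re
from pathlib import Path


def iter_records(lines):
    """Yield each SD File record as a list of lines, without the $$$$ terminator."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "$$$$":
            yield record
            record = []
        else:
            record.append(line)


def keep_fields(record, keep):
    """Return the record's lines with every data field not named in `keep` dropped."""
    out = []
    in_data = False  # becomes True once the molfile block has ended
    keeping = True
    for line in record:
        if not in_data:
            out.append(line)
            in_data = line.startswith("M  END")
            continue
        if line.startswith(">"):
            # Field header line: pull the field name out of its word tokens.
            keeping = any(name in keep for name in re.findall(r"\w+", line))
        if keeping:
            out.append(line)
    return out


def combine(directory, output, keep=("PUBCHEM_COMPOUND_CID",)):
    """Write all records from the *.sdf.gz files in `directory` to one SD File."""
    count = 0
    with open(output, "w") as out:
        for path in sorted(Path(directory).glob("*.sdf.gz")):
            with gzip.open(path, "rt") as src:
                for record in iter_records(src):
                    out.write("\n".join(keep_fields(record, keep)) + "\n$$$$\n")
                    count += 1
    return count
```

&lt;p&gt;Calling combine(".", "pubchem.sdf") would write every record from the *.sdf.gz files in the current directory to pubchem.sdf and return the number of records written.&lt;/p&gt;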

&lt;p&gt;Still, not bad for a few seconds on the command line.&lt;/p&gt;</description>
      <pubDate>Wed, 01 Oct 2008 01:25:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:725a5f70-77e1-4aee-a79d-e7fb9f7c3401</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/10/01/recombining-compressed-pubchem-sd-files-with-open-babel</link>
      <category>Tools</category>
      <category>openbabel</category>
      <category>sdfile</category>
      <category>pubchem</category>
      <category>sdfgz</category>
      <category>commandline</category>
    </item>
    <item>
      <title>Imaging Chemical Structures with ChemPhoto: WYSIWYG Drawing Settings</title>
      <description>&lt;p&gt;&lt;a href="http://metamolecular.com/chemphoto"&gt;&lt;img src="http://depth-first.com/demo/20080908/chemphoto.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Depending on the audience and medium, chemical structures can be presented in a variety of styles. &lt;a href="http://depth-first.com/articles/2008/09/08/smarter-cheminformatics-from-sd-file-to-image-collection-with-chemphoto"&gt;Chemical structure imaging applications&lt;/a&gt; should make it easy to visually and/or numerically arrive at the best appearance. &lt;a href="http://metamolecular.com/chemphoto"&gt;ChemPhoto&lt;/a&gt; makes it easy to get exactly the right look for your structures through what-you-see-is-what-you-get (WYSIWYG) drawing settings.&lt;/p&gt;

&lt;p&gt;The screenshots below illustrate the three main categories of drawing settings in ChemPhoto: Atoms; Bonds; and Images. As each setting is manipulated, the entire view is updated in real-time to reflect the changes. A set of changes can be rolled back by pressing the "Cancel" button, making it easy to undo unwanted modifications.&lt;/p&gt;

&lt;h4&gt;Turquoise Theme with Atoms Tab&lt;/h4&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20080911/atoms_tab_large.png"&gt;&lt;img src="http://depth-first.com/demo/20080911/atoms_tab_small.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Console Theme with Bonds Tab&lt;/h4&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20080911/bonds_tab_large.png"&gt;&lt;img src="http://depth-first.com/demo/20080911/bonds_tab_small.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Blueprint Theme with Images Tab&lt;/h4&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20080911/images_tab_large.png"&gt;&lt;img src="http://depth-first.com/demo/20080911/images_tab_small.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;To try an alpha version of ChemPhoto for yourself,  &lt;a href="http://mailhide.recaptcha.net/d?k=01R9bxyP6XNdc0duoUCzBBHA==&amp;amp;c=vZ7R0VDctRzIRzbSs1-LZwDzjTjAnfCS4KONqGHxY9I=" onclick="window.open('http://mailhide.recaptcha.net/d?k=01R9bxyP6XNdc0duoUCzBBHA==&amp;amp;c=vZ7R0VDctRzIRzbSs1-LZwDzjTjAnfCS4KONqGHxY9I=', '', 'toolbar=0,scrollbars=0,location=0,statusbar=0,menubar=0,resizable=0,width=500,height=300'); return false;" title="Reveal this e-mail address"&gt;drop me a line&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Thu, 11 Sep 2008 15:44:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1a4c8370-26ae-4930-9154-daf76b0db004</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/09/11/imaging-chemical-structures-with-chemphoto-wysiwyg-drawing-settings</link>
      <category>Tools</category>
      <category>chemphoto</category>
      <category>chemicalstructureimaging</category>
      <category>imaging</category>
      <category>java</category>
      <category>pubchem</category>
      <category>swing</category>
      <category>screenshots</category>
      <category>wysiwyg</category>
    </item>
    <item>
      <title>ChemPhoto Screenshots: Appearance of Structures and Browsing Large Collections</title>
      <description>&lt;p&gt;Chemical structure imaging software solves the problem of how to easily create large numbers of readable chemical structures in a variety of formats automatically. &lt;a href="http://metamolecular.com/chemphoto"&gt;ChemPhoto&lt;/a&gt; was recently &lt;a href="http://depth-first.com/articles/2008/09/08/smarter-cheminformatics-from-sd-file-to-image-collection-with-chemphoto"&gt;introduced&lt;/a&gt; as what appears to be the first chemical structure imaging application. With development of the ChemPhoto user interface now in full swing, it's possible to show some screenshots.&lt;/p&gt;

&lt;p&gt;Below are two screenshots in which around 25,000 structures from &lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;PubChem&lt;/a&gt; have been loaded.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20080910/chemphoto_large_black.png"&gt;&lt;img src="http://depth-first.com/demo/20080910/chemphoto_small_black.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20080910/chemphoto_large_white.png"&gt;&lt;img src="http://depth-first.com/demo/20080910/chemphoto_small_white.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;If you're interested in trying an alpha version of ChemPhoto, feel free to &lt;a href="http://mailhide.recaptcha.net/d?k=01R9bxyP6XNdc0duoUCzBBHA==&amp;amp;c=vZ7R0VDctRzIRzbSs1-LZwDzjTjAnfCS4KONqGHxY9I=" onclick="window.open('http://mailhide.recaptcha.net/d?k=01R9bxyP6XNdc0duoUCzBBHA==&amp;amp;c=vZ7R0VDctRzIRzbSs1-LZwDzjTjAnfCS4KONqGHxY9I=', '', 'toolbar=0,scrollbars=0,location=0,statusbar=0,menubar=0,resizable=0,width=500,height=300'); return false;" title="Reveal this e-mail address"&gt;drop me a line&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Wed, 10 Sep 2008 14:55:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:8de96f1e-bfbc-4574-8b26-7dffd06f71f3</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/09/10/chemphoto-screenshots-appearance-of-structures-and-browsing-large-collections</link>
      <category>Tools</category>
      <category>chemphoto</category>
      <category>imaging</category>
      <category>chemicalstructureimaging</category>
      <category>java</category>
      <category>swing</category>
      <category>pubchem</category>
    </item>
    <item>
      <title>Small Molecule 3D Coordinates From PubChem</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;The PubChem team has quietly introduced a new feature: 3D coordinates for many of the small molecules in its compound collection. To my knowledge, these coordinates are currently &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/"&gt;available only via FTP&lt;/a&gt;. From the &lt;a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/README"&gt;README&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;The data contained here consists of a theoretical 3D description of PubChem Compound records computed using the MMFF94s force field without coulombic terms, including MMFF charges.  Each provided theoretical 3D conformer is not a stationary point on the hyper-potential surface (i.e., is not at a minimum energy).  Rather, the theoretical 3D description is a low energy conformer selected from a conformer model (a theoretical description of the conformational flexibility of a chemical structure consisting of multiple 3D representations or poses sampled using an RMSD {root mean squared distance} threshold) describing energetically-accessible and (potentially) biologically relevant conformations of a chemical structure.&lt;/p&gt;
    
    &lt;p&gt;Not every PubChem Compound record will have a theoretical 3D description. Structures considered too large (containing more than 50 non-hydrogen atoms) or too flexible (containing more than 15 rotatable bonds) are excluded.  Furthermore, chemical structures containing elements other than H, C, N, O, F, P, S, Cl, Br, and I are also excluded.&lt;/p&gt;
    
    &lt;p&gt;Generation of theoretical 3D descriptions of small molecules is computationally intensive.  As such, some PubChem Compound records may be added at a later time.&lt;/p&gt;
&lt;/blockquote&gt;
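
&lt;p&gt;The exclusion rules quoted above amount to a simple filter. As a rough Python sketch (the function name and its arguments are hypothetical, and PubChem's own pipeline may apply criteria beyond those in the README), the check could look like this:&lt;/p&gt;

```python
# Elements the README lists as allowed in structures with a 3D description.
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def may_have_3d_description(heavy_atom_count, rotatable_bond_count, elements):
    """Apply the README's size, flexibility, and element rules to one compound."""
    if heavy_atom_count > 50:  # too large
        return False
    if rotatable_bond_count > 15:  # too flexible
        return False
    # Every element symbol in the structure must be on the allowed list.
    return set(elements).issubset(ALLOWED_ELEMENTS)
```

&lt;p&gt;A compound that passes this check may still lack coordinates, since the README notes that some records will only be added later.&lt;/p&gt;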

&lt;p&gt;(A few open source packages for &lt;a href="http://depth-first.com/articles/2007/12/12/simple-3d-conformer-generation-with-smi23d"&gt;generating 3D conformers&lt;/a&gt; are also available.)&lt;/p&gt;

&lt;p&gt;Recently, &lt;a href="http://hutchison.chem.pitt.edu/"&gt;Geoff Hutchison&lt;/a&gt; wrote in &lt;a href="http://depth-first.com/articles/2008/05/14/the-daily-molecule-the-wonders-of-chemistry-one-molecule-at-a-time#comment-556"&gt;to suggest&lt;/a&gt; that a potentially useful new feature of &lt;a href="http://chempedia.com"&gt;Chempedia&lt;/a&gt; could be the ability to directly obtain 3D coordinates for a molecule of interest.&lt;/p&gt;

&lt;p&gt;One very economical way to do that would be to use PubChem's 3D dataset. It would also be trivial to display these coordinates as a resizable &lt;a href="http://jmol.sourceforge.net/"&gt;Jmol applet&lt;/a&gt;, in analogy to &lt;a href="http://depth-first.com/articles/2008/05/19/building-chempedia-resizable-structures-with-chemwriter"&gt;Chempedia's recently-added 2D molecule resizing feature&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Of course, there are many other potential uses for the PubChem conformer dataset, especially when applied to Web applications.&lt;/p&gt;</description>
      <pubDate>Fri, 23 May 2008 10:53:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1fdc7fbf-3af8-4928-9770-668ad24d8df2</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/05/23/small-molecule-3d-coordinates-from-pubchem</link>
      <category>Tools</category>
      <category>chempedia</category>
      <category>3d</category>
      <category>conformer</category>
      <category>pubchem</category>
      <category>smi23d</category>
      <category>ftp</category>
    </item>
    <item>
      <title>Chempedia.net: Mashing Up PubChem and Wikipedia</title>
      <description>&lt;p&gt;&lt;a href="http://chempedia.com"&gt;&lt;img src="http://chempedia.net/images/global/logo.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; and &lt;a href="http://wikipedia.net"&gt;Wikipedia&lt;/a&gt; represent two of the largest open repositories of chemical information in the world. And they complement each other very nicely. PubChem contains mainly low-level chemical structure information whereas Wikipedia contains free-text descriptions of chemical compounds in the form of &lt;a href="http://depth-first.com/articles/2008/04/02/wikipedia-for-cheminformatics-a-simple-web-api-for-finding-cas-numbers-in-compound-monographs"&gt;compound monographs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Both services offer permission and access to copy and reuse their contents. But neither service is, by itself, nearly as useful as it could be.&lt;/p&gt;

&lt;p&gt;Why not mash them up?&lt;/p&gt;

&lt;p&gt;To explore that question, my company, &lt;a href="http://metamolecular.com"&gt;Metamolecular, LLC&lt;/a&gt;, has launched &lt;a href="http://chempedia.com"&gt;Chempedia&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To my knowledge, Chempedia represents the first publicly-facing database of compounds to incorporate Wikipedia's collection of organic compound monographs. And it's one of the few cheminformatics services to make use of free-text descriptions generated by individual chemists.&lt;/p&gt;

&lt;p&gt;Chempedia has been somewhat selective about the compounds it includes. To date, it has spidered over 2,500 monographs, combining them with over 300,000 of the most interesting compounds from PubChem. Not every Chempedia.net molecule has a monograph, but now there's a tool that can actually make that absence apparent.&lt;/p&gt;

&lt;p&gt;Chempedia is both an experiment and a service. It's immediately useful for anyone in the business of making or doing things with organic molecules. It's created several unexpected moments of "Oh, that's actually a useful molecule!" It also will serve as a platform to test some of the ideas discussed in Depth-First over the last year or so on the advantages of the Web for collaboration in chemistry.&lt;/p&gt;

&lt;p&gt;Stay tuned for more details about how Chempedia was created and some of its applications in chemistry.&lt;/p&gt;</description>
      <pubDate>Fri, 04 Apr 2008 10:06:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:168432fb-c064-43c2-a60d-728c7c29c406</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/04/04/chempedia-net-mashing-up-pubchem-and-wikipedia</link>
      <category>Tools</category>
      <category>chempedia</category>
      <category>wikipedia</category>
      <category>pubchem</category>
      <category>rails</category>
      <category>ruby</category>
      <category>chemwriter</category>
      <category>applet</category>
      <category>java</category>
      <category>jruby</category>
    </item>
    <item>
      <title>Five Open Tools for 2D Structure Layout (aka Structure Diagram Generation)</title>
      <description>&lt;p&gt;&lt;a href="http://metamolecular.com/chemwriter"&gt;&lt;img src="http://depth-first.com/demo/20070411/difficult.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Given a molecular representation without 2D coordinates, how would you display a human-readable view?&lt;/p&gt;

&lt;p&gt;This problem can arise in many situations, one of the most common of which is the parsing of &lt;a href="http://depth-first.com/articles/tag/linenotation"&gt;line notations&lt;/a&gt; such as &lt;a href="http://depth-first.com/articles/2007/10/19/easily-convert-iupac-nomenclature-to-smiles-inchi-or-molfile-with-rubidium"&gt;IUPAC nomenclature&lt;/a&gt;, SMILES, or &lt;a href="http://depth-first.com/articles/2007/10/15/an-introduction-to-the-rubidium-cheminforamtics-toolkit-interconvert-smiles-inchi-and-molfile-with-an-open-babel-like-interface"&gt;InChI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And then there are the cases when you have 2D coordinates, but they're &lt;a href="http://depth-first.com/articles/2008/02/12/the-art-and-science-of-chemical-structure-diagrams-double-trouble"&gt;not very aesthetically pleasing&lt;/a&gt;. Maybe the coordinates were created by people either in a hurry or working with low quality editors, or maybe they were generated as distorted 2D projections of 3D coordinates. Whatever the reason, simply having 2D coordinates may not be the same as having &lt;em&gt;good&lt;/em&gt; 2D coordinates.&lt;/p&gt;

&lt;p&gt;Last year, a Depth-First article &lt;a href="http://depth-first.com/articles/2007/04/11/structure-diagram-generation"&gt;discussed the Structure Diagram Generation (SDG) problem&lt;/a&gt; and how it can be solved with Open Source software. Given that nearly a year has passed, it seemed appropriate to revisit the topic.&lt;/p&gt;

&lt;p&gt;The good news is that there are at least four independent Open Source implementations of SDG algorithms, and one potential open database approach. They are, in no particular order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://sourceforge.net/projects/mcdl"&gt;MCDL&lt;/a&gt; Written in Java, the emphasis of this software appears to be facilitating the use of &lt;a href="http://depth-first.com/articles/2006/08/19/a-first-look-at-modular-chemical-descriptor-language-mcdl"&gt;Modular Chemical Descriptor Language&lt;/a&gt;. Unfortunately, no new releases of this intriguing software package have been made in the last year.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://sf.net/projects/cdk"&gt;Chemistry Development Kit (CDK)&lt;/a&gt; This useful package handles about 70-80% of a typical assortment of chemical structures well. The large amount of activity on the CDK project in general makes this a particularly good SDG system to contribute to, especially in the areas of refactoring and handling special cases. See also &lt;a href="http://www.steinbeck-molecular.de/steinblog/index.php/2007/08/14/structure-diagram-generation-sdg-2d-layout-in-the-chemistry-development-kit-part-1/"&gt;Christoph Steinbeck's overview of CDK's layout system&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://bkchem.zirael.org/"&gt;BKChem&lt;/a&gt; A 2D structure editor written in Python. Give it an InChI and it will display the structure, courtesy of SDG. The system worked remarkably well with the molecules I tested. BKChem has also been reported to work in &lt;a href="http://bkchem.zirael.org/batch_mode_en.html"&gt;batch mode&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.rdkit.org/"&gt;RDKit&lt;/a&gt; Written in Python and C++, this package is the newest of the bunch. Although &lt;a href="http://sourceforge.net/mailarchive/message.php?msg_id=360844.35824.qm%40web34206.mail.mud.yahoo.com"&gt;I haven't had much luck compiling RDKit&lt;/a&gt;, it still looks quite promising. Any chance of switching to &lt;a href="http://www.gnu.org/software/make/"&gt;make&lt;/a&gt; as a build system?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; PubChem? Maybe. With a database of small molecules now numbering well over ten million, there's a good chance that the molecule for which you need to assign coordinates is already in PubChem. And if it's in PubChem, 2D coordinates have already been assigned. Use an InChI as a hash key, and voila - instant SDG without much software. Given the novelty of large, publicly-available databases of small molecules such as PubChem, this approach may have a great deal of untapped potential.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
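
&lt;p&gt;The PubChem-as-lookup-table idea in the last item can be sketched in a few lines of Python. This is only an illustration: index_by_inchi is a hypothetical helper, and it assumes the downloaded SD File records carry an InChI in a PUBCHEM_NIST_INCHI field and 2D coordinates in the molfile block.&lt;/p&gt;

```python
import re


def index_by_inchi(sd_lines, field="PUBCHEM_NIST_INCHI"):
    """Map each record's InChI to its molfile block, which holds the coordinates."""
    index = {}
    record = []
    for line in sd_lines:
        line = line.rstrip("\r\n")
        if line != "$$$$":
            record.append(line)
            continue
        # End of record: find the InChI field and the end of the molfile block.
        inchi = None
        for i, text in enumerate(record):
            if text.startswith(">") and field in re.findall(r"\w+", text):
                value = record[i + 1 : i + 2]  # the line after the header
                inchi = value[0] if value else None
                break
        if inchi and "M  END" in record:
            end = record.index("M  END")
            index[inchi] = "\n".join(record[: end + 1])
        record = []
    return index
```

&lt;p&gt;A lookup is then just index.get(inchi), which returns None when the molecule is not in the downloaded set.&lt;/p&gt;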

&lt;p&gt;SDG is one of those issues that can stay off the radar for some only to become an instant, nagging problem with no clear way out. The tools cited here offer an excellent place to begin working toward a comprehensive solution.&lt;/p&gt;</description>
      <pubDate>Wed, 26 Mar 2008 09:11:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:5441d3fc-3dc2-4f2d-b740-5cad16dd454b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/03/26/five-open-tools-for-2d-structure-layout-aka-structure-diagram-generation</link>
      <category>Tools</category>
      <category>sdg</category>
      <category>2d</category>
      <category>mcdl</category>
      <category>cdk</category>
      <category>bkchem</category>
      <category>rdkit</category>
      <category>pubchem</category>
      <category>coordinates</category>
      <category>java</category>
      <category>python</category>
      <category>cplusplus</category>
      <category>layout</category>
    </item>
    <item>
      <title>Create Your Own PubChem Datasets: Exporting Results As SD Files</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif"  align="right"&gt;&lt;/img&gt;&lt;/a&gt;Recently, &lt;a href="http://depth-first.com/articles/2007/11/12/parsing-sd-files-with-ruby-and-rubidium"&gt;I needed to create a subset&lt;/a&gt; of the PubChem database in Structure Data File (SD File) format. Although it's far from obvious how to do this, the capability does exist. In this article, I'll give a step-by-step procedure for creating custom datasets in SD File format from arbitrary PubChem structure queries.&lt;/p&gt;

&lt;h4&gt;Create and Execute the Query&lt;/h4&gt;

&lt;p&gt;Let's say we want to create a dataset in SD File format containing all N-Boc-protected piperidines registered in PubChem.&lt;/p&gt;

&lt;p&gt;From the main &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem site&lt;/a&gt;, choose the &lt;a href="http://pubchem.ncbi.nlm.nih.gov/search/"&gt;Structure Search&lt;/a&gt; link. Then click the "Sketch" button.&lt;/p&gt;

&lt;p&gt;Next, draw your molecule in the 2D structure editor:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071113/draw.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Then click the "Done" button.&lt;/p&gt;

&lt;p&gt;Before starting the query (by clicking the "Search" button), be sure to select the "Substructure" option under "Search Type."&lt;/p&gt;

&lt;h4&gt;Exporting the Results&lt;/h4&gt;

&lt;p&gt;You should now be looking at a screen containing the first few hits of a 7700+ hitset. But how do we export these results in SD Format?&lt;/p&gt;

&lt;p&gt;Next to a field labeled "Display", you'll see a drop-down box containing several different options. Choose the one labeled "PubChem Download."&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071113/export.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;You'll be redirected to a download page from which you can select output formats, including SDF, or SD File. You can also select a compression type (datasets of even 2000 records can be quite large uncompressed). For this example, we'll select SDF format with GZip compression.&lt;/p&gt;

&lt;p&gt;Clicking on the "Download" button takes us to a status page that eventually informs us when our download has been processed. You should then get a "Save File" dialog or something similar. If not, you should see a link to the compressed SD file.&lt;/p&gt;

&lt;p&gt;Downloading the results file completes the process.&lt;/p&gt;</description>
      <pubDate>Tue, 13 Nov 2007 16:43:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:978ad5ab-d385-4905-abc6-2d9025a601d0</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/13/create-your-own-pubchem-datasets-exporting-results-as-sd-files</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>sdfile</category>
      <category>dataset</category>
    </item>
    <item>
      <title>PubChem for Newbies</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070926/newbies.png" align="right"&gt;&lt;/img&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; is arguably the most important free repository of information about small molecules on the planet. Although its size is staggering (over 10 million unique compounds), what makes PubChem important is its completely open approach to chemical information. Never before in the history of chemistry has so much information been made available, free to anyone who cares to use it.&lt;/p&gt;

&lt;p&gt;Despite PubChem's pioneering approach, many factors make the service difficult to learn and navigate. Most notably, its practical yet bewildering integration into NIH's other far-flung database activities serves as highly effective camouflage for the treasures that lie beneath.&lt;/p&gt;

&lt;p&gt;With this in mind, I thought it would be useful to collect all Depth-First articles on PubChem into one place. They have been broken down into categories, although many articles contain elements useful to anyone interested in PubChem.&lt;/p&gt;

&lt;h4&gt;For Chemists&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/08/16/five-ways-to-use-pubchem-right-now"&gt;Five Ways to Use PubChem Right Now&lt;/a&gt; Food for thought and grounds for further research.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/05/21/simple-cas-number-lookup-with-pubchem"&gt;Simple CAS Number Lookup with PubChem&lt;/a&gt; Could this be PubChem's killer app?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/01/26/how-to-find-chemical-information-on-the-internet-why-open-source-open-access-and-open-data-matter"&gt;How to Find Chemical Information on the Internet: Why Open Source, Open Access, and Open Data Matter&lt;/a&gt; PubChem is also part of the process.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;For Hackers&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/09/25/hacking-pubchem-visually-inspect-results-for-cas-number-and-keyword-searches"&gt;Hacking PubChem: Visually Inspect Results for CAS Number and Keyword Searches&lt;/a&gt; Radically change the look of PubChem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;Hacking PubChem: Convert CAS Numbers into PubChem CIDs with Ruby&lt;/a&gt; Could this be PubChem's killer app?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/06/11/hacking-pubchem-learning-to-speak-pug"&gt;Hacking PubChem: Learning to Speak PUG&lt;/a&gt; I hope you like angle brackets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/08/14/java-interface-to-pubchems-power-user-gateway"&gt;Java Interface to PubChem's Power User Gateway&lt;/a&gt; Taming those angle brackets with Java.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/08/13/the-best-api-may-be-no-api-at-all-pubchem-and-pdb"&gt;The Best API May Be No API at All: PubChem and PDB&lt;/a&gt; Designing the obvious.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/06/04/hacking-pubchem-power-user-gateway"&gt;Hacking PubChem: Power User Gateway&lt;/a&gt; The preferred method to program PubChem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/01/03/open-source-and-open-data-why-we-should-eat-our-own-dogfood"&gt;Open Source and Open Data: Why We Should Eat Our Own Dogfood&lt;/a&gt; Open Source software and open databases should go hand-in-hand, but not in this case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/12/12/the-problem-with-ferrocene"&gt;The Problem With Ferrocene&lt;/a&gt; Big trouble ahead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/29/hacking-pubchem-direct-access-with-ftp"&gt;Hacking PubChem: Direct Access With FTP&lt;/a&gt; Cheminformatics' most radical idea in the last 25 years - download all of PubChem onto your hard drive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/23/hacking-pubchem-entrez-programming-utilities"&gt;Hacking PubChem: Entrez Programming Utilities&lt;/a&gt; One of the preferred methods to access PubChem programatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/21/hacking-pubchem-query-by-smiles"&gt;Hacking PubChem: Query By Smiles&lt;/a&gt; Don't try this at home.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/08/30/hacking-pubchem-with-ruby"&gt;Hacking PubChem with Ruby&lt;/a&gt; One of my first attempts to use Ruby for cheminformatics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;For Everyone&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/07/04/pubchem-is-a-platform"&gt;PubChem is a Platform&lt;/a&gt; Just like the Apple ][ - hardly perfect, but good enough and infinitely hackable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;Thirty-Two Free Chemistry Databases&lt;/a&gt; PubChem in its context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/22/hacking-pubchem-why-the-open-access-fight-is-just-the-beginning"&gt;Hacking PubChem: Why the Open Access Fight is Just the Beginning&lt;/a&gt; Just getting an open database is not the end of the story.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/27/hacking-pubchem-free-speech-or-free-beer"&gt;Hacking PubChem: Free Speech or Free Beer?&lt;/a&gt; It depends on what the meaning of the word 'is' is.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/09/17/toward-an-open-worldwide-chemical-information-network"&gt;Toward an Open Worldwide Chemical Information Network&lt;/a&gt; Back in 1965, Walter M. Carlson was way ahead of his time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://depth-first.com/articles/2006/08/12/changes"&gt;Changes&lt;/a&gt; My first blog post - inspired, in part, by the launch of PubChem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've got a favorite PubChem resource you'd like to share, please feel free to leave a comment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Service Credit: &lt;a href="http://txt2pic.com"&gt;txt2pic.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 26 Sep 2007 08:42:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:76d70c4b-1827-4839-a35c-00c1e62c36ac</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/26/pubchem-for-newbies</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>newbies</category>
    </item>
    <item>
      <title>Hacking PubChem: Visually Inspect Results for CAS Number and Keyword Searches</title>
      <description>&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;&lt;img src="http://depth-first.com/files/pubchemlogo.gif" align="right"&gt;&lt;/img&gt;&lt;/a&gt;A recent article described how PubChem could be used to &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;quickly search for CAS numbers&lt;/a&gt;. Although useful, the approach is limited in that it returns only an array of PubChem CIDs. A more useful result would be a report whose entries are hyperlinked into the PubChem site itself to aid visual inspection. In this tutorial, we'll see how an HTML template and a few extra lines of code can do just that.&lt;/p&gt;

&lt;h4&gt;The Template&lt;/h4&gt;

&lt;p&gt;Ruby supports a number of HTML templating mechanisms. In this example, we'll use an ERB template resurrected from the &lt;a href="http://depth-first.com/articles/2006/12/11/hacking-molbank-creating-a-graphical-table-of-contents"&gt;Molbank graphical table of contents&lt;/a&gt; tutorial:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;html&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;head&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;PubChem Search for #{term}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;head&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;h1&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Search: #{term}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;h1&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;table&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;cids.each&lt;/span&gt; &lt;span class="attribute"&gt;do&lt;/span&gt; |&lt;span class="attribute"&gt;cid|&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;td&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;image&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;summary&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt; &lt;span class="attribute"&gt;href&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= summary %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;img&lt;/span&gt; &lt;span class="attribute"&gt;src&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= image %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="attribute"&gt;border&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;2&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;img&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;center&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt; &lt;span class="attribute"&gt;style&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;font-size: 8px&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
              &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt; &lt;span class="attribute"&gt;href&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;&amp;lt;%= summary %&amp;gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&amp;lt;%=&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;CID-#{cid}&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;a&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;center&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;td&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; +&lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;if&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;&amp;gt;&lt;/span&gt; 5 %&amp;gt;
          &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;col&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;0&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="attribute"&gt;end&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;%&lt;/span&gt;&lt;span class="attribute"&gt;end&lt;/span&gt; %&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;tr&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;table&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;body&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;html&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above template uses a search term and an array of CIDs to build a table of results. Each cell in the table contains a color 2D image and the CID, both hyperlinked into PubChem itself.&lt;/p&gt;
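
&lt;p&gt;The row-wrapping logic in the template (a &lt;tt&gt;col&lt;/tt&gt; counter that starts a new row after every sixth cell) can also be expressed by grouping the CIDs before rendering. The following is only a sketch of that idea, not part of the original library: the helper name &lt;tt&gt;rows_of&lt;/tt&gt; and its &lt;tt&gt;per_row&lt;/tt&gt; parameter are hypothetical, and it assumes a Ruby version that provides &lt;tt&gt;Enumerable#each_slice&lt;/tt&gt; without an extra require (1.8.7 or later).&lt;/p&gt;

```ruby
# Hypothetical helper (not in the original library): split a flat list of
# CIDs into rows of at most per_row entries, one inner array per table row.
def rows_of(cids, per_row = 6)
  cids.each_slice(per_row).to_a
end

# Eight made-up CIDs become one full row of six and one partial row of two.
rows = rows_of(%w[101 102 103 104 105 106 107 108])
rows.each { |row| puts row.join(' ') }
```

&lt;p&gt;A template could then loop over rows and cells instead of tracking a counter. Grouping first would also avoid the empty trailing row that the counter emits when the number of CIDs is an exact multiple of six.&lt;/p&gt;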

&lt;p&gt;Saving this template to a file called &lt;strong&gt;template.rhtml&lt;/strong&gt; is all we need to do.&lt;/p&gt;

&lt;h4&gt;The Library&lt;/h4&gt;

&lt;p&gt;The library is a modification of the one shown in &lt;a href="http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby"&gt;the previous article&lt;/a&gt; in this series:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;mechanize&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;erb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;PubChemTerms&lt;/span&gt;
  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;report&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;cids&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;get_cids&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;erb&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;ERB&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="constant"&gt;IO&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;template.rhtml&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;))&lt;/span&gt;

    &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;output.html&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;w+&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt; &lt;span class="punct"&gt;|&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt;
      &lt;span class="ident"&gt;file&lt;/span&gt; &lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="ident"&gt;erb&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;result&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;binding&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_cids&lt;/span&gt; &lt;span class="ident"&gt;term&lt;/span&gt;
    &lt;span class="ident"&gt;agent&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;WWW&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="constant"&gt;Mechanize&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="ident"&gt;page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;agent&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;get&lt;/span&gt; &lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&amp;amp;retmax=100&amp;amp;term=&lt;span class="expr"&gt;#{term}&lt;/span&gt;&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt;

    &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parser&lt;/span&gt;&lt;span class="punct"&gt;/&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;).&lt;/span&gt;&lt;span class="ident"&gt;collect&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;id&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;innerHTML&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The method &lt;tt&gt;report&lt;/tt&gt; accepts a search term and uses our template to render a report.&lt;/p&gt;
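
&lt;p&gt;One caveat: &lt;tt&gt;get_cids&lt;/tt&gt; interpolates the search term directly into the eSearch URL, so a term containing spaces or other reserved characters may need to be escaped first. A minimal sketch of one way to do that with Ruby's standard &lt;tt&gt;uri&lt;/tt&gt; library follows. The constant &lt;tt&gt;ESEARCH&lt;/tt&gt; and the helper &lt;tt&gt;esearch_url&lt;/tt&gt; are hypothetical names, and &lt;tt&gt;URI.encode_www_form&lt;/tt&gt; assumes Ruby 1.9.2 or later.&lt;/p&gt;

```ruby
require 'uri'

# Base address of the eSearch endpoint used by get_cids above.
ESEARCH = 'http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Hypothetical helper: build the eSearch URL with every query parameter
# form-encoded, so that a multi-word term survives the trip intact.
def esearch_url(term, retmax = 100)
  query = URI.encode_www_form(db: 'pccompound', retmax: retmax, term: term)
  "#{ESEARCH}?#{query}"
end

puts esearch_url('acetic acid')
puts esearch_url('119141-88-7')
```

&lt;p&gt;The returned string could be passed to &lt;tt&gt;agent.get&lt;/tt&gt; in place of the interpolated URL. A CAS number such as the one shown passes through unchanged, since digits and hyphens need no escaping.&lt;/p&gt;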

&lt;h4&gt;Testing&lt;/h4&gt;

&lt;p&gt;By saving the above library in a file called &lt;strong&gt;pubchem.rb&lt;/strong&gt;, we can search by keyword via Interactive Ruby (irb):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'pubchem'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; report 'esomeprazole'
=&gt; #&lt;File:output.html (closed)&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This produces a file called &lt;strong&gt;output.html&lt;/strong&gt; that can be viewed with any browser:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070925/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;As in the original version of the library, we can also query by CAS number:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ irb
irb(main):001:0&gt; require 'pubchem'
=&gt; true
irb(main):002:0&gt; include PubChemTerms
=&gt; Object
irb(main):003:0&gt; report '119141-88-7'
=&gt; #&lt;File:output.html (closed)&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;The simple approach outlined here could be extended in many ways. For example, we could easily retrieve molfiles based on keyword or CAS number search. We could pipe queries together or work with query lists. We could &lt;a href="http://depth-first.com/articles/2007/09/17/hacking-chemspider-query-by-smiles-and-inchi-with-ruby"&gt;blend in ChemSpider data&lt;/a&gt;. We could even build a simple Web application (with &lt;a href="http://rubyonrails.org"&gt;Rails&lt;/a&gt;) that returned customized reports. Mixing in &lt;a href="http://depth-first.com/articles/tag/rcdk"&gt;Ruby CDK&lt;/a&gt; or &lt;a href="http://depth-first.com/articles/tag/rubyopenbabel"&gt;Ruby Open Babel&lt;/a&gt; offers still more possibilities.&lt;/p&gt;

&lt;p&gt;Increasingly, the most important question in cheminformatics is not "What can we build?", but rather "What should we build?" Success in this new world requires a much deeper understanding of how cheminformatics software is being used by real chemists and where it's not.&lt;/p&gt;</description>
      <pubDate>Tue, 25 Sep 2007 10:55:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:5b1ef92b-4ed3-443e-a683-dc37d23c4352</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/25/hacking-pubchem-visually-inspect-results-for-cas-number-and-keyword-searches</link>
      <category>Tools</category>
      <category>pubchem</category>
      <category>casnumber</category>
      <category>cas</category>
      <category>ruby</category>
      <category>keyword</category>
      <category>erb</category>
      <category>html</category>
      <category>entrez</category>
    </item>
  </channel>
</rss>
