<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Category Web</title>
    <link>http://depth-first.com/articles/category/web</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Why Web Development is Hard</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://wl1.acdlabs.com/WebLibrarian/index.html"&gt;&lt;img src="http://depth-first.com/demo/20071116/screen.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The very thing you'd like most to do as a developer is the thing your users can't stand.&lt;/p&gt;</description>
      <pubDate>Fri, 16 Nov 2007 09:00:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:17769c0a-5048-4842-aea2-a5e259bf4ff8</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/16/why-web-development-is-hard</link>
      <category>Web</category>
      <category>webdevelopment</category>
    </item>
    <item>
      <title>Name That Graph Revealed: Oligarchy 2.0</title>
      <description>&lt;p&gt;&lt;a href="http://www.mckinseyquarterly.com/article_page.aspx?ar=2041&amp;amp;l2=17&amp;amp;l3=104&amp;amp;srid=17"&gt;&lt;img src="http://depth-first.com/demo/20070905/graph.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Web 2.0 may be &lt;a href="http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html"&gt;all about participation&lt;/a&gt;, but the &lt;a href="http://www.mckinseyquarterly.com/article_page.aspx?ar=2041&amp;amp;l2=17&amp;amp;l3=104&amp;amp;srid=17"&gt;numbers&lt;/a&gt; reported by &lt;a href="http://www.mckinseyquarterly.com"&gt;The McKinsey Quarterly&lt;/a&gt; suggest a self-selecting oligarchy rather than a democracy. Success may well depend more on engaging the top 2-10% of users than on appealing to all of them. Food for thought when forming your next community, be it electronic or otherwise.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;image credit: &lt;a href="http://www.mckinseyquarterly.com"&gt;The McKinsey Quarterly&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 05 Sep 2007 08:28:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:1ba4dc54-4c50-4dbc-bf39-c25abe642998</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/05/name-that-graph-revealed-oligarchy-2-0</link>
      <category>Web</category>
      <category>web20</category>
      <category>participation</category>
      <category>oligarchy</category>
      <category>socialnetworking</category>
    </item>
    <item>
      <title>Can Your Cheminformatics Tool Do This?</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://dx.doi.org/10.1021/ol070936d"&gt;&lt;img src="http://depth-first.com/demo/20070613/allenes.gif" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 13 Jun 2007 08:36:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:6e600510-2b9a-46f1-8267-227cffedac99</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/13/can-your-cheminformatics-tool-do-this</link>
      <category>Web</category>
      <category>flexmol</category>
      <category>octet</category>
      <category>axialchirality</category>
      <category>dietz</category>
    </item>
    <item>
      <title>Eleven Qualities of The Perfect Line Notation for the Web</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/wenwennie/396170719/"&gt;&lt;img src="http://depth-first.com/demo/20070314/line.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;If you had to design the perfect line notation for the Web, what would it look like? This is hardly an academic exercise given the central role played by line notations in information systems. For a variety of reasons, existing line notations may not be the right match for the Web. This article explores this question and outlines the main qualities needed by a Web-friendly line notation.&lt;/p&gt;

&lt;h4&gt;A Few Lines About Line Notations&lt;/h4&gt;

&lt;p&gt;A line notation is any system that converts a molecular structure into a single line of text. Chemists have been using line notations for over 140 years - long before the advent of computers. Because of their versatility, line notations are frequently used in situations they were not designed for. When this happens, limitations become apparent, resulting in renewed efforts to build a better system.&lt;/p&gt;

&lt;p&gt;As &lt;a href="http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;noted previously&lt;/a&gt;, the invention of new line notations is a field whose popularity ebbs and flows over time. Currently, the three most important line notations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IUPAC Nomenclature&lt;/li&gt;
&lt;li&gt;Simplified Molecular Input Line Entry System (SMILES)&lt;/li&gt;
&lt;li&gt;IUPAC International Chemical Identifier (InChI)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these systems has its own unique characteristics. &lt;a href="http://www.acdlabs.com/iupac/nomenclature/"&gt;IUPAC nomenclature&lt;/a&gt; is the oldest and most widely-used line notation. It appears in numerous contexts, including Web pages, peer-reviewed journals, reports, patents, MSDS sheets, catalogs, and reagent bottles. By comparison, &lt;a href="http://www.daylight.com/smiles/index.html"&gt;SMILES&lt;/a&gt; is a distant second in popularity. Its main role has been to facilitate machine entry of structural information by humans, &lt;a href="http://www.emolecules.com/"&gt;like this&lt;/a&gt;. &lt;a href="http://en.wikipedia.org/wiki/International_Chemical_Identifier"&gt;InChI&lt;/a&gt; is the newest of the bunch. It serves both as a line notation and as a unique identifier requiring no central authority.&lt;/p&gt;

&lt;h4&gt;The Perfect Line Notation for the Web&lt;/h4&gt;

&lt;p&gt;The emergence of the Web as a standard information delivery platform has refocused the attention of many developers on the line notation problem. With this idea in mind, here are some guesses about the qualities of the ideal Web-friendly line notation.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Readily Encodable and Decodable by Humans.&lt;/strong&gt; There's something unnerving about a line notation that can't easily be deciphered by humans. Is this really the right string? Did I copy it completely? This problem surfaces with every line notation, but some fare better than others. IUPAC nomenclature, for example, is one of the first things taught in many beginning organic chemistry classes. It's complicated, but still understandable by non-experts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Readily Encodable and Decodable by Machines.&lt;/strong&gt; It may be relatively simple for humans to read and write IUPAC nomenclature, but not so for machines. Software that reads and writes SMILES, on the other hand, is by comparison easy to write. This explains the abundance of software packages that handle SMILES and the &lt;a href="http://depth-first.com/articles/tag/opsin"&gt;scarcity&lt;/a&gt; of those that handle IUPAC nomenclature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Uses URI-Safe Characters Only.&lt;/strong&gt; A &lt;a href="http://en.wikipedia.org/wiki/Uniform_Resource_Identifier"&gt;URI&lt;/a&gt; uniquely identifies every document on the Internet. Why can't a line notation be used in combination with a URI to uniquely identify every molecule? One reason is that every line notation currently in use contains &lt;a href="http://www.freesoft.org/CIE/RFC/1738/4.htm"&gt;characters unsafe for use in URIs&lt;/a&gt;. Any line notation designed for use on the Web needs to avoid these characters in its syntax. &lt;em&gt;Update: InChI doesn't use unsafe characters, but it does use the reserved characters "=", "?", and "/". These characters may therefore &lt;a href="http://info-uri.info/registry/OAIHandler?verb=GetRecord&amp;amp;metadataPrefix=reg&amp;amp;identifier=info:inchi/"&gt;need to be escaped&lt;/a&gt;, depending on the context.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encodes All Molecules.&lt;/strong&gt; Buried within every line notation is an opinion on what chemistry is really about. To operate on the Web, these opinions need to be as closely aligned as possible with those of chemists themselves. &lt;a href="http://depth-first.com/articles/tag/flexmol"&gt;Several Depth-First articles&lt;/a&gt; have discussed the limitations of existing line notations as molecular languages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compact.&lt;/strong&gt; Nobody wants to look at or manipulate a line of text that's longer than it needs to be. Of course, the more expressive a line notation is, the more verbose it will be. In other words, qualities 4 and 5 will always be in conflict.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Canonicalizable.&lt;/strong&gt; A line notation supports canonicalization when it specifies rules that can be guaranteed to always generate the same line notation for a given molecule. This feature enables many labor-saving assumptions. For example, a canonical representation makes a great identifier in a database, reducing the cost of storing and retrieving structural information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explicit Hydrogen Atom Encoding.&lt;/strong&gt; SMILES makes few requirements regarding hydrogen atom encoding. As a result, each software implementation is left to its own devices. The resulting confusion is the price paid for the convenience (Quality 1) of a compact notation (Quality 5).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hierarchical Structure.&lt;/strong&gt; One of InChI's innovations was the introduction of a hierarchical encoding system. This system, also referred to as InChI "layers", enables a molecule to be viewed at several levels of resolution: as a molecular formula; as a network of atoms; as a network of atoms containing hydrogen atoms; as an atomic network with stereochemistry; and so on. I'm unaware of any reports in which this feature has been exploited in a practical way, although such uses aren't difficult to imagine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flat Structure.&lt;/strong&gt; By grouping structural features into layers (Quality 8), InChI introduces a lot of complexity that is absent in SMILES and even IUPAC nomenclature. This complexity, in part, makes it difficult for both humans and machines to properly encode InChIs (Qualities 1 and 2). Given this complexity, and the fact that the utility of hierarchical encoding has yet to be conclusively demonstrated, it may be better to avoid it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open Source Software Implementation.&lt;/strong&gt; No encoding standard in today's world stands a chance of gaining acceptance without an open source reference implementation. InChI broke new ground in this area and should serve as a model for any system that follows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unencumbered by Patents.&lt;/strong&gt; The success of molfile and SMILES as de facto standards derives partly from the decision made by their authors to refrain from patenting their languages. As a result, developers are motivated to build their own implementations, rather than invent yet another language.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
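
&lt;p&gt;As a rough illustration of Qualities 3 and 8, here's a minimal JavaScript sketch. It percent-encodes an InChI with the standard &lt;tt&gt;encodeURIComponent&lt;/tt&gt; function and splits the same InChI into its layers. The helper names &lt;tt&gt;escapeInchi&lt;/tt&gt; and &lt;tt&gt;splitLayers&lt;/tt&gt; are made up for this example, and the layer splitting assumes a simple InChI that has a formula layer; it isn't a full parser.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;// Hypothetical helper: percent-encodes an InChI so it can be used as a
// single URI path segment or query value (Quality 3).
function escapeInchi(inchi)
{
  return encodeURIComponent(inchi);
}

// Hypothetical helper: splits an InChI into its layers (Quality 8).
// Layers are separated by "/". This sketch assumes the first piece is
// the version prefix and the second is the molecular formula. Later
// layers are keyed by their one-letter prefix, such as "c" or "h".
function splitLayers(inchi)
{
  var parts = inchi.split("/");
  var layers = { version: parts[0], formula: parts[1] };

  parts.slice(2).forEach(function (part)
  {
    layers[part.charAt(0)] = part.substring(1);
  });

  return layers;
}

// The InChI for 1-bromonaphthalene
var inchi = "InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H";
var escaped = escapeInchi(inchi);
var layers = splitLayers(inchi);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here &lt;tt&gt;escaped&lt;/tt&gt; is &lt;tt&gt;InChI%3D1%2FC10H7Br%2Fc11-10-7-3-5-8-4-1-2-6-9(8)10%2Fh1-7H&lt;/tt&gt;: the "=" and "/" characters are escaped, while &lt;tt&gt;encodeURIComponent&lt;/tt&gt; leaves the parentheses alone. The formula layer comes back as &lt;tt&gt;C10H7Br&lt;/tt&gt; and the hydrogen layer as &lt;tt&gt;1-7H&lt;/tt&gt;.&lt;/p&gt;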

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;A robust and modern line notation system is a key technology for chemically enabling the Web. Existing line notations, although useful in many contexts, were not designed with this particular role in mind. The time has come to consider whether a new line notation system, designed specifically with the Web and modern chemistry in mind, might offer a better solution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo credit: &lt;a href="http://flickr.com/photos/wenwennie/"&gt;Wenwen&lt;/a&gt;  - &lt;a href="http://flickr.com"&gt;Flickr&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 14 Mar 2007 10:18:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:81f8ab71-4155-406b-adfa-2d1fde0c4f6b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web</link>
      <category>Web</category>
      <category>inchi</category>
      <category>smiles</category>
      <category>iupac</category>
      <category>linenotation</category>
      <category>web</category>
      <category>uri</category>
    </item>
    <item>
      <title>Why the Web Isn't Ready for Chemistry</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070305/lavoisier.jpg" align="right"&gt;&lt;/img&gt;Wouldn't it be wonderful if chemical structure searching were as easy as using Google? Draw your molecule, press a button and get the good stuff first. That day may well arrive, but without the creation of some key technologies, the wait will be very long. This article describes an unsuccessful attempt to bring the chemically-aware Web closer to reality.&lt;/p&gt;

&lt;h4&gt;Background&lt;/h4&gt;

&lt;p&gt;Recently, I &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;introduced&lt;/a&gt; a small Web application called &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;. It lets you draw a structure and search for it through one of a number of popular search engines.&lt;/p&gt;

&lt;p&gt;InChIMatic turns a molecular query into text, which is then searched. This magic is made possible through the &lt;a href="http://en.wikipedia.org/wiki/International_Chemical_Identifier"&gt;IUPAC International Chemical Identifier&lt;/a&gt; (InChI). InChI has enormous potential for enabling chemical Web searches, but several barriers must be overcome first.&lt;/p&gt;

&lt;p&gt;For example, if you run even the most trivial of queries with &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;, you'll quickly see that search engines have only indexed a small number of InChIs. One reason is that InChIs are not yet widely-used by Web authors. But the deeper problem is that many pages containing InChIs are not indexed by search engines. For example, &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem's&lt;/a&gt; vast collection of InChIs is apparently invisible to Google.&lt;/p&gt;

&lt;p&gt;Compounding the problems of using InChIs to index chemical content on the Web is the lack of a standard, unobtrusive method for embedding the identifier into Web pages. Understandably, no author wants to invest valuable time and effort on an indexing system that doesn't work with their content and page layout. This problem is the subject of the current article.&lt;/p&gt;

&lt;h4&gt;Materials and Methods&lt;/h4&gt;

&lt;p&gt;The &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;InChIMatic article&lt;/a&gt; contained a test for how well Google and "invisible" InChIs might work together. If you mouse over the word "1-bromonaphthalene" in the first paragraph of that article, you'll see a small popup window containing the InChI. I accomplished this effect with the following HTML:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt; &lt;span class="attribute"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
  1-bromonaphthalene
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;My goal wasn't the popup effect. Instead, I wanted to test the &lt;tt&gt;title&lt;/tt&gt; attribute as an unobtrusive vector for getting InChIs indexed by Google. This excellent idea was &lt;a href="https://www2.blogger.com/comment.g?blogID=17889588&amp;amp;postID=9068626890097011632"&gt;a suggestion&lt;/a&gt; made by Oliver Koepler in response to &lt;a href="http://chem-bla-ics.blogspot.com/"&gt;Egon Willighagen's&lt;/a&gt; article on &lt;a href="http://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html"&gt;invisible InChIs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The idea is simple: InChIs are to be read by machines, not humans. InChIs consist of long strings of text that contain no widely-recognized wrappable characters. As a result, displaying InChIs in Web pages can break page layouts. Even if a wrapping mechanism is used, such as the CSS &lt;tt&gt;overflow&lt;/tt&gt; property, I find InChIs unpleasant to look at and just plain distracting. There's &lt;a href="http://depth-first.com/articles/2006/09/13/the-chemically-aware-web-are-we-there-yet"&gt;no good reason&lt;/a&gt; why any chemist should have to look at them.&lt;/p&gt;

&lt;p&gt;Chemists themselves are, understandably, &lt;a href="http://kinasepro.wordpress.com/2006/12/05/monday-night-ot-2/"&gt;reluctant&lt;/a&gt; to invest in ad hoc methods to index their molecular content - they need a real solution. It needs to be simple, it needs to be robust, it needs to be easy to apply retroactively, and it needs to be ready today.&lt;/p&gt;

&lt;h4&gt;Results&lt;/h4&gt;

&lt;p&gt;After about two days, Google had indexed &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;the article&lt;/a&gt; containing the hidden InChI for 1-bromonaphthalene. Using InChIMatic, I &lt;a href="http://www.google.com/search?q=%22InChI%3D1%2FC10H7Br%2Fc11-10-7-3-5-8-4-1-2-6-9%288%2910%2Fh1-7H%22"&gt;searched Google&lt;/a&gt; for the InChI, but only found the same &lt;a href="http://nmrshiftdb.org"&gt;NMRShiftDB&lt;/a&gt; item returned in previous queries.&lt;/p&gt;

&lt;p&gt;A few days later, a new Depth-First link appeared in Google. It pointed to the main XML Atom feed for Depth-First. This is a step in the right direction, but a far cry from the solution chemists need.&lt;/p&gt;

&lt;p&gt;None of the other major search engines supported by InChIMatic returned a link to the Depth-First article containing the hidden InChI. The only new result was retrieved by &lt;a href="http://search.com"&gt;Search.com&lt;/a&gt;. Like Google's result, this new link pointed to Depth-First's main XML feed.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Google doesn't index the contents of the &lt;tt&gt;title&lt;/tt&gt; attribute and may never do so. This should not be surprising. Google has made a fortune in part by staying &lt;a href="http://depth-first.com/articles/2007/02/28/inchi-spam"&gt;one step ahead of Search Engine Optimization (SEO) tricksters&lt;/a&gt;. By ignoring the contents of the &lt;tt&gt;title&lt;/tt&gt; attribute, Google and other search engines eliminate a real threat that could corrupt the search results that drive their business.&lt;/p&gt;

&lt;p&gt;What about other methods for concealing InChIs? One study suggests that none of them will work, either. &lt;a href="http://www.youcansleepwhenyouredead.com/archives/2004/12/testing_search_1.html"&gt;A two-year-old experiment&lt;/a&gt; on SEO techniques compared ten different methods to conceal a text string from human viewers. Methods ranged from applying the &lt;tt&gt;display:none&lt;/tt&gt; style, to using matched font and background color, to concealing the text in a hidden frame. Although some of these methods may have initially been successful in getting content into Google, none of them work now.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://kinasepro.wordpress.com/"&gt;KinasePro&lt;/a&gt; recently described a &lt;a href="http://kinasepro.wordpress.com/2006/12/12/monday-night-ot-3/"&gt;failed attempt&lt;/a&gt; to get Google to index a SMILES string hidden in the &lt;tt&gt;alt&lt;/tt&gt; attribute of the &lt;tt&gt;img&lt;/tt&gt; element. Although &lt;a href="http://technorati.com"&gt;Technorati&lt;/a&gt; did index this content, a &lt;a href="http://www.technorati.com/search/InChI%3D1%2FC10H7Br%2Fc11-10-7-3-5-8-4-1-2-6-9%288%2910%2Fh1-7H"&gt;Technorati search&lt;/a&gt; for the 1-bromonaphthalene InChI returned no hits. &lt;a href="http://www.technorati.com/search/inchimatic"&gt;A Technorati search&lt;/a&gt; for the article containing the hidden InChI did work, suggesting that Technorati also ignores the &lt;tt&gt;title&lt;/tt&gt; attribute.&lt;/p&gt;

&lt;h4&gt;Why it Matters&lt;/h4&gt;

&lt;p&gt;Google and other search engines are in a perpetual state of war with SEO tricksters, and rightly so. At stake are search results that make up some of the most valuable intellectual property in the world. Any attempt to make InChIs appear invisible to humans is likely to be interpreted by major search engines as spam and treated accordingly. It seems very unlikely that this stance will ever change, regardless of how legitimate the motivation might be.&lt;/p&gt;

&lt;p&gt;This leaves us with the fundamental problem of how to build a workable, Web-based chemical indexing system. The CAS registry system has served chemistry as the de facto standard for decades, but for a variety of reasons it is unworkable as an open technology for the Web. The more modern approach of combining InChI and standard search engines has major limitations, as outlined in this article.&lt;/p&gt;

&lt;p&gt;If anything in cheminformatics is &lt;a href="http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics"&gt;broken&lt;/a&gt;, it's the indexing and retrieval of molecular information on the Web. For those interested in solving a tough problem that really matters, this is a golden opportunity.&lt;/p&gt;</description>
      <pubDate>Mon, 05 Mar 2007 09:55:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:862d965a-1330-43b3-b5b9-6ff6f6924636</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/05/why-the-web-isnt-ready-for-chemistry</link>
      <category>Web</category>
      <category>inchi</category>
      <category>inchimatic</category>
      <category>broken</category>
      <category>web</category>
      <category>google</category>
      <category>invisible</category>
      <category>seo</category>
      <category>spam</category>
    </item>
    <item>
      <title>Googling for Molecules: New and Improved InChIMatic</title>
      <description>&lt;p&gt;&lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;, as &lt;a href="http://depth-first.com/articles/2007/02/19/google-for-molecules-with-inchimatic"&gt;described previously&lt;/a&gt;, is a new service that lets you perform exact structure searches on the Web using Google. A new version offers searching via several other search engines and features a streamlined interface. The screenshot below shows the current search engine options with &lt;span title="InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H"&gt;1-bromonaphthalene&lt;/span&gt; in the editor window.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://inchimatic.com"&gt;&lt;img src="http://depth-first.com/demo/20070228/screenshot.png" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;There are noticeable differences in the abilities of search engines other than Google to find InChIs. Google seems to offer the most complete coverage. For example, all search engines I've tried have returned either a subset or recapitulation of Google's results.&lt;/p&gt;

&lt;p&gt;One of the most striking things about InChIMatic is how specific the search results are. Every molecule that has produced results for me has been a direct hit. Also notable is how few InChIs are currently indexed by Google and other search engines. Tackling that problem will require a convenient and unobtrusive way to get InChIs into Web pages and to get those pages indexed by search engines. But more on that later.&lt;/p&gt;</description>
      <pubDate>Wed, 28 Feb 2007 09:59:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:9054b0cd-8626-497f-9243-f23ca7011406</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic</link>
      <category>Web</category>
      <category>inchimatic</category>
      <category>inchi</category>
      <category>google</category>
    </item>
    <item>
      <title>InChI Spam</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/cobalt/247564799/"&gt;&lt;img src="http://depth-first.com/demo/20070228/spam.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Do you remember when getting email - any email - was exciting? For me, that time was 1995 and I had just found the Internet. Of course, I remember looking forward to messages from people I knew. But I also remember being blown away by the idea that I could write to anyone with an email account, anywhere in the world for essentially free - and that they could do the same. Back then, it was fun to get email, no matter what the source.&lt;/p&gt;

&lt;p&gt;Today, spam is something that I, like millions of others, deal with on a daily basis. And it's not limited to email. Anyone who runs a blog knows about comment spam and how difficult it can be to eradicate it. Even trackback is being used as a medium for blog spam. Of course, keyword spam on the Web has been a constant problem for search engines - eliminating it has in part led to more than a few fortunes earned at companies like Google.&lt;/p&gt;

&lt;p&gt;Recently, I introduced a small Web application called &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;. It lets you conveniently do exact-structure molecular queries through popular search engines like Google. Draw your structure, click "Search" and find your matches.&lt;/p&gt;

&lt;p&gt;There aren't a lot of InChIs visible to search engines now, as an InChIMatic query for even the most trivial molecule will reveal. Regardless of your views on InChI as a technology for bringing chemistry to the Web, it seems very likely that the number of InChIs visible to search engines will increase significantly over the next few years. And with this increase may come sites dedicated to nothing other than publishing a lot of irrelevant InChIs in the hope of attracting accidental advertising click-throughs.&lt;/p&gt;

&lt;p&gt;Right now, searching the Web by InChIs offers a very high signal-to-noise ratio experience - not unlike email in 1995. The shysters haven't yet discovered it and nobody is counting on the technology for mission-critical work. But if and when the idea of indexing chemical content on the Web through InChIs begins to catch on, filtering tools will become essential. If this scenario seems implausible, think back to your first experience with email and how concerned you were about spam then.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo Credit: &lt;a href="http://flickr.com/photos/cobalt"&gt;cobalt123&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 28 Feb 2007 09:27:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:4eb0788d-7c07-4117-9392-3cd2a2b8d092</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/28/inchi-spam</link>
      <category>Web</category>
      <category>inchi</category>
      <category>inchimatic</category>
      <category>spam</category>
    </item>
    <item>
      <title>Anatomy of a Cheminformatics Web Application: Structure Cleanup in Java Molecular Editor</title>
      <description>&lt;p&gt;&lt;a href="http://rubyonrails.org"&gt;&lt;img src="http://depth-first.com/files/rails_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;A very useful feature of many 2-D structure editors is a "clean" function that tidies up bond lengths and angles. &lt;a href="http://www.molinspiration.com/jme/"&gt;Java Molecular Editor&lt;/a&gt; (JME) is a lightweight 2-D editor that lacks this functionality. In this article, I'll describe a small Web application called "Cleanup" that adds a "clean" function to JME through Ajax and server-side programming, rather than directly extending JME itself. The technique described here differs somewhat from that described in a previous article on &lt;a href="http://depth-first.com/articles/2006/12/15/anatomy-of-a-cheminformatics-web-application-inchimatic"&gt;adding InChI support to JME with Ajax&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Cleanup in Action&lt;/h4&gt;

&lt;p&gt;Let's say Bob needs to draw the structure of the H&lt;sub&gt;1&lt;/sub&gt; antagonist &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2725"&gt;chlorpheniramine&lt;/a&gt; with JME. He mistakenly creates irregular bond angles at several points, but continues drawing anyway. His finished molecule looks like that shown below:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061218/screenshot_1.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Rather than starting over to beautify his molecule, Bob simply presses the &lt;strong&gt;Clean Molecule&lt;/strong&gt; button. This produces a structure with much more aesthetically-pleasing atom coordinates:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061218/screenshot_2.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;If Bob needs to continue drawing at this point he can. In fact, he can press &lt;strong&gt;Clean Molecule&lt;/strong&gt; as many times as he wants to clean his structure at any time. Each time he presses the button, his structure is retained within the JME window.&lt;/p&gt;

&lt;h4&gt;Download and Prerequisites&lt;/h4&gt;

&lt;p&gt;Cleanup requires &lt;a href="http://rubyonrails.org"&gt;Ruby on Rails&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;Ruby CDK&lt;/a&gt;. Both of these libraries can be installed using the &lt;a href="http://rubygems.org/"&gt;RubyGems&lt;/a&gt; packaging system.&lt;/p&gt;

&lt;p&gt;A recent article described the small amount of system configuration required for &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-0"&gt;Ruby CDK on Linux&lt;/a&gt;. Another article showed how to install &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Ruby CDK on Windows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://rubyforge.org/frs/download.php/15687/jme-cleanup-0.0.1.tar.gz"&gt;complete Cleanup source package&lt;/a&gt; can be downloaded from RubyForge. For convenience, a copy of JME is included with the distribution. The author, Peter Ertl, has kindly given permission for the bundled JME applet to be used with Cleanup. For other uses, consult the &lt;a href="http://www.molinspiration.com/jme/"&gt;JME homepage&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Running Cleanup&lt;/h4&gt;

&lt;p&gt;After inflating the Cleanup archive, the following commands will start the server:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd jme-cleanup-0.0.1
$ ruby script/server
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;AMD64 Linux users will need to prepend an &lt;tt&gt;LD_PRELOAD&lt;/tt&gt; assignment to the &lt;tt&gt;script/server&lt;/tt&gt; invocation. On my system, which uses Sun's JDK, this looks like:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd jme-cleanup-0.0.1
$ LD_PRELOAD=/usr/java/jdk1.5.0_09/jre/lib/amd64/libzip.so ruby script/server
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;After starting the Cleanup server, pointing your browser to &lt;a href="http://localhost:3000/editor/cleanup"&gt;http://localhost:3000/editor/cleanup&lt;/a&gt; will run the application.&lt;/p&gt;

&lt;h4&gt;How It Works: A Web Application in Two Parts&lt;/h4&gt;

&lt;p&gt;Cleanup is a Web application consisting of two main parts: one that runs on a Web server and one that runs in the browser. These two components work together to achieve an effect that, to a user, is indistinguishable from extending the JME applet with Java.&lt;/p&gt;

&lt;p&gt;The first component is a small Rails application. Its action, &lt;tt&gt;clean_structure&lt;/tt&gt;, accepts a URL-encoded Molfile in the &lt;tt&gt;molfile&lt;/tt&gt; parameter and responds with a Molfile whose coordinates have been re-assigned.&lt;/p&gt;

&lt;p&gt;The second component of the Cleanup application is written in JavaScript and executed from within the Browser. Let's take a look:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;&amp;lt;script language=&amp;quot;JavaScript&amp;quot;&amp;gt;

  /*
   * Returns the client-specific version of XMLHttpRequest
   */
  function createXHR()
  {
    var xhr;

    try
    {
      xhr = new ActiveXObject(&amp;quot;Msxml2.XMLHTTP&amp;quot;); // IE 5.0+
    }

    catch (e)
    {
      try
      {
        xhr = new ActiveXObject(&amp;quot;Microsoft.XMLHTTP&amp;quot;); // IE 5.0-
      }

      catch (E)
      {
        xhr = false;
      }
    }

    if (!xhr &amp;amp;&amp;amp; typeof XMLHttpRequest != 'undefined')
    {
      xhr = new XMLHttpRequest(); // every other browser
    }

    return xhr;  
  }

  function cleanStructure()
  {       
    var molfile = document.jme.molFile();
    var xhr = createXHR();

    xhr.open(&amp;quot;GET&amp;quot;, &amp;quot;clean_structure?molfile=&amp;quot; + encodeURIComponent(molfile));

    xhr.onreadystatechange=function()
    {
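      if (xhr.readyState != 4) return; // wait for the complete response

      // declared with var so it does not leak into the global scope
      var cleanMolfile = xhr.responseText;

      document.jme.readMolFile(cleanMolfile);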
    }

    xhr.send(null);
  }
&amp;lt;/script&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="http://www.amazon.com/gp/product/0976694085?ie=UTF8&amp;amp;tag=depthfirst-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0976694085"&gt;&lt;img border="0" src="http://depth-first.com/files/pragmatic_ajax.jpg" align="right" &gt;&lt;/a&gt;&lt;img src="http://www.assoc-amazon.com/e/ir?t=depthfirst-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0976694085" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /&gt;As you can see, the client side of Cleanup consists of two JavaScript functions, &lt;tt&gt;createXHR&lt;/tt&gt; and &lt;tt&gt;cleanStructure&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;The purpose of &lt;tt&gt;createXHR&lt;/tt&gt; is to return a valid instance of the central Ajax JavaScript object, &lt;tt&gt;XMLHttpRequest&lt;/tt&gt;. This function is a standard idiom in Ajax programming, and many JavaScript toolkits eliminate the need to write it explicitly. It is included here mainly for illustration. Microsoft browsers define two different flavors of &lt;tt&gt;XMLHttpRequest&lt;/tt&gt;, and both differ from the flavor supported by every other browser. To take this browser-specific behavior into account, a series of try/catch blocks is used.&lt;/p&gt;

&lt;p&gt;The second function, &lt;tt&gt;cleanStructure&lt;/tt&gt;, does all of the JME-specific work. After obtaining an instance of &lt;tt&gt;XMLHttpRequest&lt;/tt&gt;, an HTTP GET request is built from JME's molfile. Of course, the magic of this request is that it is &lt;em&gt;asynchronous&lt;/em&gt;; it will not block the browser while it is being processed. When the request is complete, the cleaned Molfile is read by JME.&lt;/p&gt;
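
&lt;p&gt;As a minimal sketch of those two steps, the request URL and the response handling can be pulled into small functions that are easy to try outside a browser. The names &lt;tt&gt;buildCleanupUrl&lt;/tt&gt; and &lt;tt&gt;handleCleanupResponse&lt;/tt&gt; are hypothetical and not part of Cleanup; only the &lt;tt&gt;clean_structure&lt;/tt&gt; URL and the &lt;tt&gt;molfile&lt;/tt&gt; parameter come from the listing above. The HTTP status check is an added assumption about how failures might be handled.&lt;/p&gt;

```javascript
// Hypothetical helpers, not part of the Cleanup distribution.

// Build the GET URL used by cleanStructure. encodeURIComponent is needed
// because a molfile contains spaces and line breaks.
function buildCleanupUrl(molfile) {
  return "clean_structure?molfile=" + encodeURIComponent(molfile);
}

// Intended to be called from xhr.onreadystatechange. Returns false until
// the request has completed (readyState 4). After that it passes the
// response body to onClean on HTTP 200, or the status code to onError.
function handleCleanupResponse(xhr, onClean, onError) {
  if (xhr.readyState !== 4) return false;
  if (xhr.status === 200) {
    onClean(xhr.responseText);
  } else {
    onError(xhr.status);
  }
  return true;
}
```

&lt;p&gt;In the page itself, &lt;tt&gt;onClean&lt;/tt&gt; would presumably be a function that hands the text to &lt;tt&gt;document.jme.readMolFile&lt;/tt&gt;.&lt;/p&gt;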

&lt;p&gt;Through the coordinated action of both of Cleanup's components, the application gives the appearance that JME has cleaned its own structure.&lt;/p&gt;

&lt;h4&gt;So What?&lt;/h4&gt;

&lt;p&gt;Well-designed, rich functionality makes software interesting and useful. At the same time, users demand software that loads and responds quickly. Using the technique presented here, it's possible to satisfy both of these contradictory requirements. Delegating key tasks to a server obviates the need to transmit large Java libraries to clients. Instead, small Java libraries can be transmitted, and several small asynchronous requests will be processed along the way.&lt;/p&gt;

&lt;p&gt;Viewed from this perspective, the capabilities of a good Java applet take on a very different character from what many have become accustomed to. In particular, extensibility and a robust, text-based communication protocol become much more important than built-in features.&lt;/p&gt;

&lt;p&gt;For example, we could provide a much more consistent user experience if the &lt;strong&gt;Clean Molfile&lt;/strong&gt; button were contained inside the JME editor itself, instead of on the Web page. In a more general sense, we'd like JME to offer the option of defining custom buttons that can be assigned arbitrary actions. Because Java/JavaScript integration is very well-supported, these custom actions could actually be written in JavaScript.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Java applets have been much maligned of late, partly due to the realization that in many situations they can be replaced with Ajax. However, well-designed, small, and extensible Java applets can play a key role in certain kinds of Ajax applications such as the one described here. Future articles in this series will explore some more of the many possibilities.&lt;/p&gt;</description>
      <pubDate>Mon, 18 Dec 2006 15:15:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:9e354945-8529-4ef0-98ad-8cba4e1dfe0d</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/12/18/anatomy-of-a-cheminformatics-web-application-structure-cleanup-in-java-molecular-editor</link>
      <category>Web</category>
      <category>ajax</category>
      <category>java</category>
      <category>applet</category>
      <category>rails</category>
      <category>ruby</category>
      <category>2d</category>
      <category>cleanup</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Anatomy of a Cheminformatics Web Application: InChIMatic</title>
      <description>&lt;p&gt;&lt;a href="http://rubyonrails.org"&gt;&lt;img src="http://depth-first.com/files/rails_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.iupac.org/inchi/"&gt;InChI&lt;/a&gt; is an open molecular identifier system. Although InChIs obviate the need for a central registration authority, they are complex enough that they must be generated by computer. Currently, a few desktop molecular editors can generate InChI identifiers. But wouldn't it be more convenient if this capability existed in a simple Web application that could be used from any computer - anywhere? This article describes a Web application called "InChIMatic", which does just that.&lt;/p&gt;

&lt;p&gt;In this article, I'll show how &lt;a href="http://www.molinspiration.com/jme/"&gt;Java Molecular Editor&lt;/a&gt; (JME), a lightweight 2-D structure editor, can be extended to produce InChI identifiers through &lt;em&gt;server-side&lt;/em&gt; software written in Ruby, rather than by extending the applet with Java code.&lt;/p&gt;

&lt;h4&gt;Downloads and Prerequisites&lt;/h4&gt;

&lt;p&gt;InChIMatic requires &lt;a href="http://rubyonrails.org"&gt;Ruby on Rails&lt;/a&gt; and the &lt;a href="http://depth-first.com/articles/2006/08/17/ruby-and-inchi-the-rino-library"&gt;Rino InChI toolkit&lt;/a&gt;. Both of these libraries can be installed using the &lt;a href="http://rubygems.org/"&gt;RubyGems&lt;/a&gt; packaging system.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://rubyforge.org/frs/download.php/15616/inchimatic-0.0.2.tar.gz"&gt;complete InChIMatic source package&lt;/a&gt; can be downloaded from RubyForge. For convenience, a copy of JME is included with the distribution. The author, Peter Ertl, has kindly given permission for the bundled JME applet to be used with InChIMatic. For other uses, consult the &lt;a href="http://www.molinspiration.com/jme/"&gt;JME homepage&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Running InChIMatic&lt;/h4&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd inchimatic-0.0.2
$ ruby script/server
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Pointing your browser to &lt;a href="http://localhost:3000/inchi/input"&gt;http://localhost:3000/inchi/input&lt;/a&gt;, drawing a structure in the JME window, and pressing the "InChI!" button will produce the corresponding InChI in the area below.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061214/screenshot_1.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Behind the Scenes&lt;/h4&gt;

&lt;p&gt;The JME applet itself provides no capabilities for generating InChI identifiers. This functionality is instead provided by the Rails server via the Rino InChI library.&lt;/p&gt;

&lt;p&gt;Let's say Susan wants to get the InChI for 3,4-dichlorophenol. After entering the structure into the JME window, she presses the "InChI!" button. This sets in motion the following sequence of events:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The JavaScript function &lt;tt&gt;writeMolfile()&lt;/tt&gt; is called. This retrieves a molfile representation of 3,4-dichlorophenol from JME, which is then written to the hidden field &lt;tt&gt;molfile&lt;/tt&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Rails listener notices that the hidden text field has been updated and so invokes the InChIMatic &lt;tt&gt;ajax_inchi&lt;/tt&gt; action. This is a Rails Ajax action that will update only a portion of the InChIMatic window. For more detail on this Rails Ajax technique, see &lt;a href="http://depth-first.com/articles/2006/12/04/anatomy-of-a-cheminformatics-web-application-ajaxifying-depict"&gt;the previous Anatomy of a Cheminformatics Web Application&lt;/a&gt; article.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;tt&gt;ajax_inchi&lt;/tt&gt; action retrieves the contents of the hidden &lt;tt&gt;molfile&lt;/tt&gt; field. This molfile is then used to generate an InChI using Rino. This InChI is then saved to the instance variable &lt;tt&gt;inchi&lt;/tt&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The contents of the InChIMatic area partitioned by the &lt;tt&gt;results&lt;/tt&gt; &lt;tt&gt;div&lt;/tt&gt; are then updated with the InChI obtained in Step 3. The JME applet itself is unaffected by this operation, allowing Susan to further elaborate her molecule, if she'd like.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
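
&lt;p&gt;Step 1 might look something like the sketch below. This is an assumption about the shape of &lt;tt&gt;writeMolfile()&lt;/tt&gt;, not the code shipped with InChIMatic: &lt;tt&gt;molFile()&lt;/tt&gt; is the JME applet method for exporting a molfile, looking up the hidden field by the id &lt;tt&gt;molfile&lt;/tt&gt; is a guess, and the &lt;tt&gt;doc&lt;/tt&gt; parameter is added only so the function can be tried with a stand-in object.&lt;/p&gt;

```javascript
// Sketch of step 1. The doc parameter is hypothetical; in the page it
// would simply be the global document object.
function writeMolfile(doc) {
  // Ask the JME applet for a molfile of the current drawing.
  var molfile = doc.jme.molFile();

  // Copy it into the hidden field (assumed here to have the id "molfile").
  doc.getElementById("molfile").value = molfile;

  return molfile;
}
```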

&lt;h4&gt;So What? Re-Thinking the Role of Applets&lt;/h4&gt;

&lt;p&gt;JME is, by itself, incapable of generating InChIs. Yet InChIMatic provides this capability as if it existed natively. In other words, a lightweight, fast-loading, and responsive 2-D editor can be extended &lt;em&gt;on the server side&lt;/em&gt;, rather than on the client side. The difference is imperceptible to the user, but ripe with potential for the developer.&lt;/p&gt;

&lt;p&gt;One of the most common, and completely valid, complaints about Java applets is that they take too long to load. Offloading some of the functionality currently being bundled in applets onto a Web server offers one way to combat the problem. Furthermore, combining Java applets with Ajax and powerful Web application frameworks like Ruby on Rails offers virtually limitless opportunities to re-think the role of Java applets in Web application development.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;JME's strength comes, perhaps ironically, from its limited functionality. By using some simple Web programming techniques, JME can be extended with server-side programming. The advantages, compared to extending the JME applet itself with Java on the client side, are significant. Future articles in this series will explore some of the possibilities.&lt;/p&gt;</description>
      <pubDate>Fri, 15 Dec 2006 15:49:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:4ff346b0-7cec-4b8e-9770-feccfe683823</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/12/15/anatomy-of-a-cheminformatics-web-application-inchimatic</link>
      <category>Web</category>
      <category>jme</category>
      <category>inchimatic</category>
      <category>inchi</category>
      <category>rino</category>
      <category>java</category>
      <category>rails</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Hacking Molbank: Creating a Graphical Table of Contents</title>
      <description>&lt;p&gt;&lt;a href="http://www.mdpi.org/"&gt;&lt;img src="http://depth-first.com/files/mdpi-small.gif" border="0" align="right"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.mdpi.org/"&gt;Molbank&lt;/a&gt; is an Open Access collection of single-compound articles on synthetic chemistry. Previous articles on Depth-First have highlighted Molbank's practice of including &lt;a href="http://depth-first.com/articles/2006/11/30/molbank-and-the-convergence-of-open-access-open-data-and-open-source-in-chemistry"&gt;machine-readable molecular representations of its content&lt;/a&gt;, and its very &lt;a href="http://depth-first.com/articles/2006/12/01/hacking-molbank-downloading-a-complete-chemistry-journal"&gt;liberal policy on mirroring and robots&lt;/a&gt;. In this article, we'll take advantage of both of these features to build something that was left out of Molbank: a graphical table of contents.&lt;/p&gt;

&lt;h4&gt;The Graphical Table of Contents (GTOC)&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://depth-first.com/demo/20061211/molbank/index.html"&gt;The Molbank Graphical Table of Contents&lt;/a&gt; (Molbank GTOC) is available online. It consists of a single Web page containing a grid of color 2-D chemical structures representing the contents of Molbank. Each structure is hyperlinked into the Molbank site itself. Clicking on the structure takes you to the complete synthetic procedure and characterization data.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20061211/molbank/index.html"&gt;&lt;img src="http://depth-first.com/demo/20061211/screenshot_1.png" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Prerequisites, Downloading, and Running&lt;/h4&gt;

&lt;p&gt;To run this project, you'll need &lt;a href="http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0"&gt;Ruby CDK&lt;/a&gt;. A recent article described the small amount of system configuration required for &lt;a href="http://depth-first.com/articles/2006/09/25/cdk-the-ruby-way-rcdk-0-2-0"&gt;Ruby CDK on Linux&lt;/a&gt;. Another article showed how to install &lt;a href="http://depth-first.com/articles/2006/10/12/running-ruby-java-bridge-on-windows"&gt;Ruby CDK on Windows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The complete source code for this project can be &lt;a href="http://rubyforge.org/frs/download.php/15500/molbank-0.0.1.tar.gz"&gt;downloaded from RubyForge&lt;/a&gt;. A subdirectory called &lt;strong&gt;demo&lt;/strong&gt; contains the pre-built final result.&lt;/p&gt;

&lt;p&gt;After unpacking the &lt;strong&gt;molbank-0.0.1&lt;/strong&gt; archive, the demo application can be run:&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd molbank-0.0.1
$ ruby test.rb
&lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Problems, We've Got Problems&lt;/h4&gt;

&lt;p&gt;Several problems were uncovered while building the Molbank GTOC. This is to be expected with any data produced "in the wild" rather than within the safety of an Ivory Tower. Here are the main categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Blank Images&lt;/strong&gt; The entry for M52 is blank. Checking the &lt;a href="http://www.mdpi.net/molbank/m0052.mol"&gt;underlying molfile&lt;/a&gt; reveals four instances of bond stereo flags set to "6," a problem common to many of the blank images in the GTOC. According to the Molfile specification, a value of 6 indicates "Down, double bonds," whatever that means. Given that the &lt;a href="http://www.mdpi.net/molbank/m0052.htm"&gt;molecules shown in M52&lt;/a&gt; only have one possible stereo bond, and that the Molfile specification relies on 2-D coordinates to encode double-bond geometry, an encoding inconsistency or incorrect stereo interpretation may be the cause.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Images Containing an "R" Atom Label&lt;/strong&gt; Entry M53 shows an "R" group at what should be the carbonyl carbon. &lt;a href="http://www.mdpi.net/molbank/m0053.mol"&gt;The underlying molfile&lt;/a&gt; contains several less-common entries in the properties block, a common feature of images containing "R" in the GTOC.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Molfile not Found&lt;/strong&gt; Entry M95 has no associated Molfile because it simply reports errata for other articles. M253-M259, on the other hand, lack molfiles because the articles were "withdrawn before publication." M347 describes a cyclodextrin for which, understandably, no molfile was provided. There are also a couple of cases in which a link to a molfile is provided, but is not available, such as M352.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Broken Molfiles&lt;/strong&gt; &lt;a href="http://www.mdpi.net/molbank/m0162.mol"&gt;The Molfile for M162&lt;/a&gt; encodes its line endings as two carriage returns and a newline, giving rise to the appearance of blank lines after data lines. This is something the Molfile specification strictly forbids. Apparently, the underlying CDK molfile reader can only handle one carriage return and a newline. Perhaps the extra return was introduced as the file was copied into and out of text editors on various operating systems in preparation for uploading it to Molbank. Another common problem was binary files being used for molfiles, such as with &lt;a href="http://www.mdpi.net/molbank/molbank2005/m402.mol"&gt;M402&lt;/a&gt;. These files don't appear to be compressed with either Zip or GZip and their nature is currently unknown.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bogus Molfiles&lt;/strong&gt; For reasons I still can't understand, &lt;a href="http://www.mdpi.net/molbank/molbank2005/m407.mol"&gt;the Molfile for M407&lt;/a&gt; encodes ethylene. So do several other Molbank molfiles. Other common dummy molfiles include toluene, benzene, and ethane.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
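
&lt;p&gt;One possible workaround for the line-ending problem seen in M162 is to normalize line endings before a molfile reaches the reader. The sketch below is illustrative only and is not part of the GTOC generator, which is written in Ruby; the function name &lt;tt&gt;normalizeLineEndings&lt;/tt&gt; is hypothetical.&lt;/p&gt;

```javascript
// Hypothetical helper: collapse any run of carriage returns followed by a
// newline (for example "\r\r\n" or "\r\n") into a single "\n".
function normalizeLineEndings(text) {
  return text.replace(/\r+\n/g, "\n");
}
```

&lt;p&gt;Lines that already end in a bare newline pass through unchanged, so the function should be safe to apply to every downloaded molfile.&lt;/p&gt;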

&lt;p&gt;After cataloging the problems that exist with the Molbank dataset and the software used to mine it, two interesting questions come into focus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What can be done to help Molbank fix the most obvious problems in its molfiles, and would Molbank accept these improvements?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How can "real" datasets like Molbank help developers build better cheminformatics software? (a graphical Molfile Debugger Utility would come in handy...)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clearly, the connection between Open Access, Open Source, and Open Data is very strong and runs very deep.&lt;/p&gt;

&lt;h4&gt;Behind the Scenes&lt;/h4&gt;

&lt;p&gt;The Ruby Molbank GTOC generator works by connecting to the &lt;a href="http://www.mdpi.net"&gt;www.mdpi.net&lt;/a&gt; server to get its data in real time. Internally, the software creates a map of the Molbank website so that the molfile (and URL) for any article can be retrieved on demand. Each readable molfile is used to create a 2-D image using &lt;a href="http://rubyforge.org/projects/rcdk"&gt;Ruby CDK&lt;/a&gt;. As a final step, the &lt;strong&gt;index.html&lt;/strong&gt; page is generated, linking each 2-D image to the URL of its Molbank article. This file is &lt;a href="http://depth-first.com/articles/2006/11/13/cheminformatics-for-the-web-convert-sd-files-to-html-with-ruby-cdk"&gt;produced with eRuby&lt;/a&gt; using a previously-described technique.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Building a Graphical Table of Contents for Molbank is not that difficult given the power of Ruby, and Molbank's forward-thinking attitude toward mirroring and robots. In working on this project, several problems were uncovered, both with Molbank's data, and the software used to mine it.&lt;/p&gt;

&lt;p&gt;In some ways, the software described here and its output are less interesting than the larger questions they raise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How do scientific journals best serve not only their readers, but developers who want to provide new ways to use the journal?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How far does copyright extend in scientific publications? For example, are molfiles copyrightable? If so, at what level of detail are they not? If atom coordinates or some other kind of non-essential information is left out, does that change anything?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In what other practical ways could the connection between Open Source, Open Data, and Open Access be explored?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These and many related questions are waiting just around the corner. As Open Access becomes more viable, both &lt;a href="http://depth-first.com/articles/2006/10/19/disruptive-innovation-in-scientific-publishing-free-journal-management-systems"&gt;technically&lt;/a&gt; and &lt;a href="http://depth-first.com/articles/2006/10/26/more-open-access-in-the-sciences-metal-based-drugs-and-hindawi-publishing"&gt;commercially&lt;/a&gt;, look to Open Source and Open Data to provide the synergies that will unlock its true potential.&lt;/p&gt;</description>
      <pubDate>Mon, 11 Dec 2006 15:00:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:6c2f002b-3d8d-40fc-a4a5-8008c473e7d7</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/12/11/hacking-molbank-creating-a-graphical-table-of-contents</link>
      <category>Web</category>
      <category>molbank</category>
      <category>gtoc</category>
      <category>2d</category>
      <category>rcdk</category>
      <category>ruby</category>
      <category>mdpi</category>
      <category>opensource</category>
      <category>openaccess</category>
      <category>opendata</category>
    </item>
  </channel>
</rss>
