<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag inchimatic</title>
    <link>http://depth-first.com/articles/tag/inchimatic</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Building the Chemically-Aware Web: TotallySynthetic and InChIMatic</title>
      <description>&lt;p&gt;&lt;a href="http://depth-first.com/articles/tag/inchimatic"&gt;Recent D-F articles&lt;/a&gt; have discussed &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;, a Web application that lets you search the Web for chemical structures by simply drawing them. InChIMatic takes advantage of &lt;a href="http://www.iupac.org/inchi/"&gt;InChI&lt;/a&gt;, a system for representing molecular structures as strings of text, and Google, which indexes these text strings. In this article, I'll show InChIMatic in action as it quickly finds a molecule discussed in a &lt;a href="http://totallysynthetic.com/blog/?p=762"&gt;review&lt;/a&gt; of &lt;a href="http://dx.doi.org/10.1021/ja074300t"&gt;Overman's Sarain A synthesis&lt;/a&gt; appearing in Paul Docherty's &lt;a href="http://totallysynthetic.com/blog"&gt;TotallySynthetic blog&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;You Can Skip this Step&lt;/h4&gt;

&lt;p&gt;The TotallySynthetic review lists three InChIs at the bottom, but which structures, out of the many discussed, do these represent? We need to know so that we can enter these structures into InChIMatic. This is, of course, a step only needed because we're testing the system, not because we're using the system the way it was designed to be used.&lt;/p&gt;

&lt;p&gt;A recent D-F article discussed a method for &lt;a href="http://depth-first.com/articles/2007/09/06/from-inchi-to-image-with-ruby-open-babel-and-ruby-cdk"&gt;converting InChIs into 2D structures&lt;/a&gt; using Ruby. It has the advantage of being easily adaptable to building chemically-aware Web spiders. And it's 100% Open Source.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070924/first.png"&gt;&lt;/img&gt;&lt;img src="http://depth-first.com/demo/20070924/second.png"&gt;&lt;/img&gt;&lt;img src="http://depth-first.com/demo/20070924/third.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Running this library over TotallySynthetic's InChIs yields the three images above. Notice, we have some problems. The first and third images lack stereochemistry. The second has a trans- double bond instead of the cis- stereochemistry encoded by the InChI. There are good reasons for each of these problems, which I hope to address in later articles. For now, it's sufficient that we can clearly make the connection between the TotallySynthetic InChIs and structures in the Sarain A review.&lt;/p&gt;
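
&lt;p&gt;The stereochemistry in question lives in specific InChI layers: a layer beginning with &lt;tt&gt;b&lt;/tt&gt; encodes double-bond geometry, and layers beginning with &lt;tt&gt;t&lt;/tt&gt;, &lt;tt&gt;m&lt;/tt&gt;, and &lt;tt&gt;s&lt;/tt&gt; describe tetrahedral centers. As a minimal sketch in Python (not the Ruby library discussed above; the function name &lt;tt&gt;inchi_layers&lt;/tt&gt; is hypothetical), splitting an InChI on its slashes shows which stereo layers an identifier carries:&lt;/p&gt;

&lt;pre&gt;
```python
def inchi_layers(inchi):
    """Split an InChI string into its version, formula, and prefixed layers.

    Simplifying assumptions: a formula layer is present, and no layer
    prefix is repeated (repeats can occur after fixed-H or isotopic layers).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI: " + inchi)
    parts = inchi[len(prefix):].split("/")
    version, formula = parts[0], parts[1]
    # Every remaining layer starts with a one-letter prefix (c, h, b, t, m, s, ...).
    layers = {part[0]: part[1:] for part in parts[2:] if part}
    return version, formula, layers


# The second TotallySynthetic InChI, split across two string literals.
second = ("InChI=1/C18H25NO6S/c1-14-9-11-15(12-10-14)26(22,23)19(17(21)25-18(2,3)4)"
          "13-7-6-8-16(20)24-5/h6,8-12H,7,13H2,1-5H3/b8-6-")
version, formula, layers = inchi_layers(second)
print(formula, sorted(layers))
```
&lt;/pre&gt;

&lt;p&gt;For the second InChI, the returned dictionary contains a &lt;tt&gt;b&lt;/tt&gt; entry but no &lt;tt&gt;t&lt;/tt&gt; entry, which is consistent with a molecule whose only encoded stereochemistry is a double bond.&lt;/p&gt;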

&lt;h4&gt;Run the Search&lt;/h4&gt;

&lt;p&gt;We can test this system by pointing our browser to &lt;a href="http://inchimatic.com"&gt;inchimatic.com&lt;/a&gt;. Entering one of the structures and clicking "Search" takes us directly to a link for the TotallySynthetic site, courtesy of Google. Unfortunately, the link doesn't currently point to &lt;a href="http://totallysynthetic.com/blog/?p=762"&gt;the article itself&lt;/a&gt;. This issue may resolve itself as the Googlebot continues to index the TotallySynthetic site.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://inchimatic.com"&gt;&lt;img src="http://depth-first.com/demo/20070924/inchimatic.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;A Technical Note&lt;/h4&gt;

&lt;p&gt;If you spend any time working with InChIs, you'll notice that they're very long. So long, in fact, that they break many Web page layouts. There have been many attempts to &lt;a href="http://depth-first.com/articles/2007/03/05/why-the-web-isnt-ready-for-chemistry"&gt;fix the long-InChI problem&lt;/a&gt;, but Paul may have found the answer by trying the simplest thing that could possibly work.&lt;/p&gt;

&lt;p&gt;If you inspect the HTML source for the TotallySynthetic article, you'll find that Paul has inserted hard returns (&lt;tt&gt;br&lt;/tt&gt; elements) to manually break his InChIs, including &lt;del&gt;the one we just located with InChIMatic (first in the list)&lt;/del&gt; the first and last structures above, both of which can be found with InChIMatic:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;p&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;small&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;InChI=1/C29H33NO4Si/c1-5-32-28(31)26-25(34-27(30-26)22-15-9-6-10-16-22)21-33-35(29(2,3)4,23-17-11-7-12-18-23)24-19-13-8-14-20-24&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;br&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
/h6-20,25-26H,5,21H2,1-4H3/t25-,26-/m0/s1 InChI=1/C18H25NO6S/c1-14-9-11-15(12-10-14)26(22,23)19(17(21)25-18(2,3)4)13-7-6-8-16(20)24-5/h6,8-12H,7,13H2,1-5H3/b8-6- InChI=1/C47H58N2O10SSi/c1-10-56-43(51)47(36(32-41(50)55-9)30-31-49(44(52)59-45(3,4)5)60(53,54)37-28-26-34(2)27-29-37)40&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;br&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;

(58-42(48-47)35-20-14-11-15-21-35)33-57-61(46(6,7)8,38-22-16-12-17-23-38)39-24-18-13-19-25-39/h11-29,36,40H,10,30-33H2,1-9H3&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;br&lt;/span&gt; &lt;span class="punct"&gt;/&amp;gt;&lt;/span&gt;
/t36-,40-,47-/m0/s1&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;small&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;p&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In other words, fixing the long InChI/Google indexing problem may be as simple as just inserting &lt;tt&gt;br&lt;/tt&gt; elements when needed. More on this later, though.&lt;/p&gt;
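
&lt;p&gt;The manual step could in principle be automated. The sketch below, written in Python, cuts an InChI into fixed-width pieces; Paul's own break points were chosen by hand, so the helper name &lt;tt&gt;break_inchi&lt;/tt&gt; and the widths used here are assumptions made only for illustration:&lt;/p&gt;

&lt;pre&gt;
```python
def break_inchi(inchi, width=80):
    """Return consecutive pieces of an InChI, each at most `width` characters."""
    if not width > 0:
        raise ValueError("width must be positive")
    return [inchi[i:i + width] for i in range(0, len(inchi), width)]


inchi = "InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H"
pieces = break_inchi(inchi, width=20)
# Joining the pieces with no separator restores the original identifier.
print(pieces, "".join(pieces) == inchi)
```
&lt;/pre&gt;

&lt;p&gt;Joining the pieces with a &lt;tt&gt;br&lt;/tt&gt; element between them would produce markup similar to Paul's. Whether a search engine still treats the broken pieces as a single searchable term is a separate question that this sketch does not answer.&lt;/p&gt;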

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;This article has shown a working demonstration that uses free tools to build self-organizing, highly distributed, searchable chemical databases. Although the system is far from perfect, it does provide a glimpse at what can be done right now with relatively little effort. Starting with this basic idea, we can begin to think about a variety of fast, free, user-friendly services that make finding molecules on the Web, and publishing their whereabouts, as easy as using Google and WordPress. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Mon, 24 Sep 2007 10:54:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1151053d-f13b-4431-b5a3-9eaaab658cec</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/09/24/building-the-chemically-aware-web-totallysynthetic-and-inchimatic</link>
      <category>Tools</category>
      <category>inchimatic</category>
      <category>inchi</category>
      <category>totallysynthetic</category>
      <category>sarain</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Googling for Molecules with InChIMatic and Firefly</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/articles/tag/firefly"&gt;&lt;img src="http://depth-first.com/demo/20070815/screenshot.png"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;A &lt;a href="http://depth-first.com/articles/tag/inchimatic"&gt;series of D-F articles&lt;/a&gt; has discussed &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;, a Web application that lets you structure-search the Web using popular search engines such as Google. Recent articles have also described &lt;a href="http://depth-first.com/articles/tag/firefly"&gt;Firefly&lt;/a&gt;, a lightweight 2D structure editor designed especially for the Web.&lt;/p&gt;

&lt;p&gt;Today, the first alpha release of Firefly is available for use with &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Despite its small size of only 103K, the Firefly applet offers a number of advanced features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A clean interface with major functionality in plain sight.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Antialiased rendering.&lt;/strong&gt; Pressing the "+" and "-" keys will zoom in and out to reveal rendering detail.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User-overridable bond length and angle constraints.&lt;/strong&gt; When dragging a bond, use &lt;em&gt;Shift&lt;/em&gt; to relax both angle and length constraints, or &lt;em&gt;Ctrl&lt;/em&gt; to relax only angle constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic inside-outside double bond rendering.&lt;/strong&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Built-in molfile import/export.&lt;/strong&gt; Use the File-&gt;Import Molfile and File-&gt;Export Molfile options to copy/paste a molfile from your system clipboard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic implicit hydrogen detection.&lt;/strong&gt; The quadrant for hydrogen placement is chosen based on the bonds surrounding the atom.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Twenty levels of undo/redo.&lt;/strong&gt; The commands can be issued either from the menu or with &lt;em&gt;Ctrl-Z&lt;/em&gt;/&lt;em&gt;Ctrl-Y&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Persistent molecule.&lt;/strong&gt; When you visit another page and come back, Firefly remembers the molecule you were working on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No digital certificate authorization.&lt;/strong&gt; Just start using it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Firefly also incorporates a number of keyboard shortcuts to speed up structure drawing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;u&gt;1&lt;/u&gt;-&lt;u&gt;9&lt;/u&gt; keys&lt;/strong&gt; Builds a chain with the indicated number of carbons.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;u&gt;a&lt;/u&gt; key&lt;/strong&gt; Phenyl (&lt;u&gt;a&lt;/u&gt;romatic) ring. When hovering over a bond, fuses the ring to the bond. When hovering over an atom, fuses the ring to that atom, if possible, or sprouts the ring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;u&gt;f&lt;/u&gt;, &lt;u&gt;l&lt;/u&gt;, &lt;u&gt;r&lt;/u&gt;, &lt;u&gt;i&lt;/u&gt; keys&lt;/strong&gt; The elements &lt;u&gt;F&lt;/u&gt;, C&lt;u&gt;l&lt;/u&gt;, B&lt;u&gt;r&lt;/u&gt;, and &lt;u&gt;I&lt;/u&gt;, respectively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;u&gt;z&lt;/u&gt; and &lt;u&gt;t&lt;/u&gt; keys&lt;/strong&gt; The elements Si and Sn, respectively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;u&gt;b&lt;/u&gt;, &lt;u&gt;c&lt;/u&gt;, &lt;u&gt;n&lt;/u&gt;, &lt;u&gt;o&lt;/u&gt;, &lt;u&gt;s&lt;/u&gt;, and &lt;u&gt;p&lt;/u&gt; keys&lt;/strong&gt; The elements B, C, N, O, S, and P, respectively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[delete] and [backspace] keys&lt;/strong&gt; Delete whatever is underneath the cursor.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To use these shortcuts, simply hover the cursor over an atom and press the key on your keyboard.&lt;/p&gt;

&lt;p&gt;Being an alpha release, this version of Firefly still has room for improvement. Your feedback is important. Please send questions, comments, and suggestions to the email address found under Firefly's "Help" menu.&lt;/p&gt;</description>
      <pubDate>Wed, 15 Aug 2007 09:16:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c2afff29-47c5-4b35-acfa-72a812e66203</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/08/15/googling-for-molecules-with-inchimatic-and-firefly</link>
      <category>Tools</category>
      <category>firefly</category>
      <category>editor</category>
      <category>2d</category>
      <category>editordoc</category>
      <category>inchimatic</category>
    </item>
    <item>
      <title>Open Notebook Science Using InChIMatic</title>
      <description>&lt;p&gt;Have you ever wanted to find a molecule on the Web using your favorite search engine in combination with a 2-D structure editor? &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt; is a service that lets you do just that. In this article, I'll show how InChIMatic can be used to look up molecules in the &lt;a href="http://usefulchem-molecules.blogspot.com/"&gt;UsefulChem-Molecules&lt;/a&gt; blog.&lt;/p&gt;

&lt;p&gt;For those who aren't familiar with it, &lt;a href="http://usefulchem-molecules.blogspot.com/"&gt;UsefulChem-Molecules&lt;/a&gt; is a blog operated by &lt;a href="http://usefulchem.blogspot.com/"&gt;Jean-Claude Bradley's&lt;/a&gt; research group at Drexel University that publicly archives molecules of interest. Each entry is a single molecule that may be linked to other Web resources.&lt;/p&gt;

&lt;p&gt;Let's say you wanted to look up &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2202"&gt;dithranol&lt;/a&gt;. This can be done by simply pointing your browser to &lt;a href="http://inchimatic.com"&gt;inchimatic.com&lt;/a&gt; and drawing the structure:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070621/screenshot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;When you're finished, select your search engine of choice (we'll use Google here) and press "Search". You'll be taken to the familiar results page. The second result links to the UsefulChem-Molecules &lt;a href="http://usefulchem-molecules.blogspot.com/2007/04/uc0234.html"&gt;entry for dithranol&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070621/screenshot2.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;In performing this simple workflow, I noticed areas for improvement in both UsefulChem and InChIMatic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;UsefulChem&lt;/strong&gt; If you look at the &lt;a href="http://usefulchem-molecules.blogspot.com/2007/04/uc0234.html"&gt;entry for dithranol&lt;/a&gt;, you'll notice there are no linkouts. In essence, the entry is a bookmark without context. Although it's useful to know that the Bradley group is interested in this molecule, it would be even more interesting to know in what context. Each entry should contain at least one link giving the molecule a context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;InChIMatic&lt;/strong&gt; Using the back button on the Google results page takes you back to InChIMatic, but your molecule is gone. If you wanted to look for a series of related molecules, you couldn't edit your existing structure. As &lt;a href="http://depth-first.com/articles/2007/05/02/a-chemical-structure-editor-for-the-web-four-screenshots-of-a-firefly-prototype"&gt;Firefly 1.0&lt;/a&gt; nears completion, a top priority will be to incorporate it into InChIMatic and fix the back-button problem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see, between InChIMatic and UsefulChem-Molecules, we have the makings of a crude laboratory information management system. The problem is we're trying to use existing tools (search engines and blogs) for purposes they are ill-suited for. It can work, but it could also work much better.&lt;/p&gt;

&lt;p&gt;What chemistry really needs is open, user-friendly systems specifically designed to archive and search chemical information of the type maintained by the Bradley group. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Thu, 21 Jun 2007 10:27:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:3f9763cb-1e08-460d-b3fa-06e74cf235f6</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/06/21/open-notebook-science-using-inchimatic</link>
      <category>Meta</category>
      <category>inchimatic</category>
      <category>usefulchem</category>
      <category>blogs</category>
      <category>firefly</category>
    </item>
    <item>
      <title>Why the Web Isn't Ready for Chemistry</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070305/lavoisier.jpg" align="right"&gt;&lt;/img&gt;Wouldn't it be wonderful if chemical structure searching were as easy as using Google? Draw your molecule, press a button and get the good stuff first. That day may well arrive, but without the creation of some key technologies, the wait will be very long. This article describes an unsuccessful attempt to bring the chemically-aware Web closer to reality.&lt;/p&gt;

&lt;h4&gt;Background&lt;/h4&gt;

&lt;p&gt;Recently, I &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;introduced&lt;/a&gt; a small Web application called &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;. It lets you draw a structure and search for it through one of a number of popular search engines.&lt;/p&gt;

&lt;p&gt;InChIMatic turns a molecular query into text, which is then searched. This magic is made possible through the &lt;a href="http://en.wikipedia.org/wiki/International_Chemical_Identifier"&gt;IUPAC International Chemical Identifier&lt;/a&gt; (InChI). InChI has enormous potential for enabling chemical Web searches, but several barriers must be overcome first.&lt;/p&gt;

&lt;p&gt;For example, if you run even the most trivial of queries with &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;, you'll quickly see that search engines have only indexed a small number of InChIs. One reason is that InChIs are not yet widely used by Web authors. But the deeper problem is that many pages containing InChIs are not indexed by search engines. For example, &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem's&lt;/a&gt; vast collection of InChIs is apparently invisible to Google.&lt;/p&gt;

&lt;p&gt;Compounding the problems of using InChIs to index chemical content on the Web is the lack of a standard, unobtrusive method for embedding the identifier into Web pages. Understandably, no author wants to invest valuable time and effort on an indexing system that doesn't work with their content and page layout. This problem is the subject of the current article.&lt;/p&gt;

&lt;h4&gt;Materials and Methods&lt;/h4&gt;

&lt;p&gt;The &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;InChIMatic article&lt;/a&gt; contained a test for how well Google and "invisible" InChIs might work together. If you mouse over the word "1-bromonaphthalene" in the first paragraph of that article, you'll see a small popup window containing the InChI. I accomplished this effect with the following HTML:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt; &lt;span class="attribute"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
  1-bromonaphthalene
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;My goal wasn't the popup effect. Instead, I wanted to test the &lt;tt&gt;title&lt;/tt&gt; attribute as an unobtrusive vector for getting InChIs indexed by Google. This excellent idea was &lt;a href="https://www2.blogger.com/comment.g?blogID=17889588&amp;amp;postID=9068626890097011632"&gt;a suggestion&lt;/a&gt; made by Oliver Koepler in response to &lt;a href="http://chem-bla-ics.blogspot.com/"&gt;Egon Willighagen's&lt;/a&gt; article on &lt;a href="http://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html"&gt;invisible InChIs&lt;/a&gt;.&lt;/p&gt;
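
&lt;p&gt;To make the mechanism concrete, the sketch below shows how an indexer that did choose to read &lt;tt&gt;title&lt;/tt&gt; attributes could collect InChIs from a page. It uses the &lt;tt&gt;html.parser&lt;/tt&gt; module from Python's standard library; the names &lt;tt&gt;TitleInChICollector&lt;/tt&gt; and &lt;tt&gt;inchis_in_titles&lt;/tt&gt; are hypothetical, and nothing here reflects how Google or any other search engine actually processes pages.&lt;/p&gt;

&lt;pre&gt;
```python
from html.parser import HTMLParser


class TitleInChICollector(HTMLParser):
    """Collect InChIs stored in the title attribute of any element."""

    def __init__(self):
        super().__init__()
        self.inchis = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "title" and value and value.startswith("InChI="):
                self.inchis.append(value)


def inchis_in_titles(html):
    """Return every title-attribute InChI found in an HTML string."""
    collector = TitleInChICollector()
    collector.feed(html)
    collector.close()
    return collector.inchis
```
&lt;/pre&gt;

&lt;p&gt;Passing the &lt;tt&gt;span&lt;/tt&gt; markup shown above to &lt;tt&gt;inchis_in_titles&lt;/tt&gt; returns a one-element list holding the 1-bromonaphthalene InChI.&lt;/p&gt;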

&lt;p&gt;The idea is simple: InChIs are to be read by machines, not humans. InChIs consist of long strings of text that contain no widely recognized wrappable characters. As a result, displaying InChIs in Web pages can break page layouts. Even if a wrapping mechanism is used, such as the CSS &lt;tt&gt;overflow&lt;/tt&gt; property, I find InChIs unpleasant to look at and just plain distracting. There's &lt;a href="http://depth-first.com/articles/2006/09/13/the-chemically-aware-web-are-we-there-yet"&gt;no good reason&lt;/a&gt; why any chemist should have to look at them.&lt;/p&gt;

&lt;p&gt;Chemists themselves are, understandably, &lt;a href="http://kinasepro.wordpress.com/2006/12/05/monday-night-ot-2/"&gt;reluctant&lt;/a&gt; to invest in ad hoc methods to index their molecular content - they need a real solution. It needs to be simple, it needs to be robust, it needs to be easy to apply retroactively, and it needs to be ready today.&lt;/p&gt;

&lt;h4&gt;Results&lt;/h4&gt;

&lt;p&gt;After about two days, Google had indexed &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;the article&lt;/a&gt; containing the hidden InChI for 1-bromonaphthalene. Using InChIMatic, I &lt;a href="http://www.google.com/search?q=%22InChI%3D1%2FC10H7Br%2Fc11-10-7-3-5-8-4-1-2-6-9%288%2910%2Fh1-7H%22"&gt;searched Google&lt;/a&gt; for the InChI, but only found the same &lt;a href="http://nmrshiftdb.org"&gt;NMRShiftDB&lt;/a&gt; item returned in previous queries.&lt;/p&gt;

&lt;p&gt;A few days later, a new Depth-First link appeared in Google. It pointed to the main XML Atom feed for Depth-First. This is a step in the right direction, but a far cry from the solution chemists need.&lt;/p&gt;

&lt;p&gt;None of the other major search engines supported by InChIMatic returned a link to the Depth-First article containing the hidden InChI. The only new result was retrieved by &lt;a href="http://search.com"&gt;Search.com&lt;/a&gt;. Like Google's result, this new link pointed to Depth-First's main XML feed.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Google doesn't index the contents of the &lt;tt&gt;title&lt;/tt&gt; attribute and may never do so. This should not be surprising. Google has made a fortune in part by staying &lt;a href="http://depth-first.com/articles/2007/02/28/inchi-spam"&gt;one step ahead of Search Engine Optimization (SEO) tricksters&lt;/a&gt;. By ignoring the contents of the &lt;tt&gt;title&lt;/tt&gt; attribute, Google and other search engines eliminate a real threat that could corrupt the search results that drive their business.&lt;/p&gt;

&lt;p&gt;What about other methods for concealing InChIs? One study suggests that none of them will work, either. &lt;a href="http://www.youcansleepwhenyouredead.com/archives/2004/12/testing_search_1.html"&gt;A two-year-old experiment&lt;/a&gt; on SEO techniques compared ten different methods to conceal a text string from human viewers. Methods ranged from applying the &lt;tt&gt;display:none&lt;/tt&gt; style, to using matched font and background color, to concealing the text in a hidden frame. Although some of these methods may have initially been successful in getting content into Google, none of them work now.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://kinasepro.wordpress.com/"&gt;KinasePro&lt;/a&gt; recently described a &lt;a href="http://kinasepro.wordpress.com/2006/12/12/monday-night-ot-3/"&gt;failed attempt&lt;/a&gt; to get Google to index a SMILES string hidden in the &lt;tt&gt;alt&lt;/tt&gt; attribute of the &lt;tt&gt;img&lt;/tt&gt; element. Although &lt;a href="http://technorati.com"&gt;Technorati&lt;/a&gt; did index this content, a &lt;a href="http://www.technorati.com/search/InChI%3D1%2FC10H7Br%2Fc11-10-7-3-5-8-4-1-2-6-9%288%2910%2Fh1-7H"&gt;Technorati search&lt;/a&gt; for the 1-bromonaphthalene InChI returned no hits. &lt;a href="http://www.technorati.com/search/inchimatic"&gt;A Technorati search&lt;/a&gt; for the article containing the hidden InChI did work, suggesting that Technorati also ignores the &lt;tt&gt;title&lt;/tt&gt; attribute.&lt;/p&gt;

&lt;h4&gt;Why it Matters&lt;/h4&gt;

&lt;p&gt;Google and other search engines are in a perpetual state of war with SEO tricksters, and rightly so. At stake are search results that make up some of the most valuable intellectual property in the world. Any attempt to make InChIs appear invisible to humans is likely to be interpreted by major search engines as spam and treated accordingly. It seems very unlikely that this stance will ever change, regardless of how legitimate the motivation might be.&lt;/p&gt;

&lt;p&gt;This leaves us with the fundamental problem of how to build a workable, Web-based chemical indexing system. The CAS registry system has served chemistry as the de facto standard for decades, but for a variety of reasons it is unworkable as an open technology for the Web. The more modern approach of combining InChI and standard search engines has major limitations, as outlined in this article.&lt;/p&gt;

&lt;p&gt;If anything in cheminformatics is &lt;a href="http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics"&gt;broken&lt;/a&gt;, it's the indexing and retrieval of molecular information on the Web. For those interested in solving a tough problem that really matters, this is a golden opportunity.&lt;/p&gt;</description>
      <pubDate>Mon, 05 Mar 2007 09:55:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:862d965a-1330-43b3-b5b9-6ff6f6924636</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/05/why-the-web-isnt-ready-for-chemistry</link>
      <category>Web</category>
      <category>inchi</category>
      <category>inchimatic</category>
      <category>broken</category>
      <category>web</category>
      <category>google</category>
      <category>invisible</category>
      <category>seo</category>
      <category>spam</category>
    </item>
    <item>
      <title>Googling for Molecules: New and Improved InChIMatic</title>
      <description>&lt;p&gt;&lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;, as &lt;a href="http://depth-first.com/articles/2007/02/19/google-for-molecules-with-inchimatic"&gt;described previously&lt;/a&gt;, is a new service that lets you perform exact structure searches on the Web using Google. A new version offers searching via several other search engines and features a streamlined interface. The screenshot below shows the current search engine options with &lt;span title="InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H"&gt;1-bromonaphthalene&lt;/span&gt; in the editor window.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;a href="http://inchimatic.com"&gt;&lt;img src="http://depth-first.com/demo/20070228/screenshot.png" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;There are noticeable differences in the abilities of search engines other than Google to find InChIs. Google seems to offer the most complete coverage. For example, all search engines I've tried have returned either a subset or recapitulation of Google's results.&lt;/p&gt;

&lt;p&gt;One of the most striking things about InChIMatic is how specific the search results are. Every molecule that has produced results for me has been a direct hit. Also notable is how few InChIs are currently indexed by Google and other search engines. Tackling that problem will require a convenient and unobtrusive way to get InChIs into Web pages and to get those pages indexed by search engines. But more on that later.&lt;/p&gt;</description>
      <pubDate>Wed, 28 Feb 2007 09:59:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:9054b0cd-8626-497f-9243-f23ca7011406</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic</link>
      <category>Web</category>
      <category>inchimatic</category>
      <category>inchi</category>
      <category>google</category>
    </item>
    <item>
      <title>InChI Spam</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/cobalt/247564799/"&gt;&lt;img src="http://depth-first.com/demo/20070228/spam.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;Do you remember when getting email - any email - was exciting? For me, that time was 1995 and I had just found the Internet. Of course, I remember looking forward to messages from people I knew. But I also remember being blown away by the idea that I could write to anyone with an email account, anywhere in the world for essentially free - and that they could do the same. Back then, it was fun to get email, no matter what the source.&lt;/p&gt;

&lt;p&gt;Today, spam is something that I, like millions of others, deal with on a daily basis. And it's not limited to email. Anyone who runs a blog knows about comment spam and how difficult it can be to eradicate it. Even trackback is being used as a medium for blog spam. Of course, keyword spam on the Web has been a constant problem for search engines - eliminating it has in part led to more than a few fortunes earned at companies like Google.&lt;/p&gt;

&lt;p&gt;Recently, I introduced a small Web application called &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;. It lets you conveniently do exact-structure molecular queries through popular search engines like Google. Draw your structure, click "Search" and find your matches.&lt;/p&gt;

&lt;p&gt;There aren't a lot of InChIs visible to search engines now, as an InChIMatic query for even the most trivial molecule will reveal. Regardless of your views on InChI as a technology for bringing chemistry to the Web, it seems very likely that the number of InChIs visible to search engines will increase significantly over the next few years. And with this increase may come sites dedicated to nothing other than publishing a lot of irrelevant InChIs in the hope of attracting accidental advertising click-throughs.&lt;/p&gt;

&lt;p&gt;Right now, searching the Web by InChIs offers a very high signal-to-noise ratio experience - not unlike email in 1995. The shysters haven't yet discovered it and nobody is counting on the technology for mission-critical work. But if and when the idea of indexing chemical content on the Web through InChIs begins to catch on, filtering tools will become essential. If this scenario seems implausible, think back to your first experience with email and how concerned you were about spam then.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo Credit: &lt;a href="http://flickr.com/photos/cobalt"&gt;cobalt123&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 28 Feb 2007 09:27:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:4eb0788d-7c07-4117-9392-3cd2a2b8d092</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/28/inchi-spam</link>
      <category>Web</category>
      <category>inchi</category>
      <category>inchimatic</category>
      <category>spam</category>
    </item>
    <item>
      <title>Google for Molecules with InChIMatic</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://inchimatic.com"&gt;&lt;img src="http://depth-first.com/demo/20070219/inchimatic_logo.png" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt; is a simple Web application that uses Google to perform exact structure searches on the Web. After drawing your structure in the editor window, click the "InChI!" button to get a link. This link takes you to a Google query that displays matches for your molecule. You'll need both Java and JavaScript enabled in your browser to use InChIMatic.&lt;/p&gt;

&lt;h4&gt;The Technical Details&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://iupac.org/dhtml_home.html"&gt;&lt;img src="http://depth-first.com/demo/20070126/iupac_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;The technology at the heart of InChIMatic is the &lt;a href="http://www.iupac.org/inchi/"&gt;IUPAC International Chemical Identifier&lt;/a&gt; (InChI). An InChI is an alphanumeric string that uniquely identifies a molecular structure. By converting molecular structures to text, InChI makes it easy to use standard Internet tools to do exact structure searches.&lt;/p&gt;

&lt;p&gt;The earliest reference in the peer-reviewed literature to using Google for searching InChIs is contained in a &lt;a href="http://dx.doi.org/10.1039/b502828k"&gt;2005 paper&lt;/a&gt;. More recently, a service called &lt;a href="http://querychem.com"&gt;QueryChem&lt;/a&gt; has taken this idea one step further by using the &lt;a href="http://code.google.com/"&gt;Google API&lt;/a&gt; to perform substructure searches based on InChI.&lt;/p&gt;

&lt;p&gt;InChIMatic works differently. Unlike a raw Google search, InChIMatic builds a Google query link for you. Unlike QueryChem, InChIMatic doesn't use the Google API and so has none of its restrictions. This does result in a limitation: InChIMatic can currently be used only for exact structure queries.&lt;/p&gt;
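&lt;p&gt;As a rough illustration of the link-building idea, the Ruby sketch below turns an InChI into a Google search URL. It is not InChIMatic's actual code: the method name &lt;tt&gt;google_query_link&lt;/tt&gt;, the choice to wrap the InChI in double quotes so it is searched as an exact phrase, and the sample methane InChI are all assumptions made for this example.&lt;/p&gt;

```ruby
require 'uri'

# Hypothetical helper (not taken from the InChIMatic source).
# Assumption: quoting the InChI asks the search engine for an exact phrase match.
def google_query_link(inchi)
  phrase = %("#{inchi}")
  # encode_www_form percent-encodes the quotes, "=" and "/" in the InChI
  "http://www.google.com/search?" + URI.encode_www_form(q: phrase)
end

# Illustrative input: an InChI for methane
puts google_query_link("InChI=1/CH4/h1H4")
```

&lt;p&gt;Because the result is an ordinary hyperlink that the browser follows, a link built this way never touches the Google API.&lt;/p&gt;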

&lt;p&gt;&lt;a href="http://rubyonrails.org"&gt;&lt;img src="http://depth-first.com/files/rails_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;The InChIMatic Web application has been discussed in greater technical detail in a &lt;a href="http://depth-first.com/articles/2006/12/15/anatomy-of-a-cheminformatics-web-application-inchimatic"&gt;previous article&lt;/a&gt;. The rapid Web application development framework &lt;a href="http://rubyonrails.com"&gt;Ruby on Rails&lt;/a&gt; made building InChIMatic a snap. InChIMatic is served by the Ruby application container &lt;a href="http://depth-first.com/articles/2007/02/05/mongrel-and-rails-its-just-not-fair"&gt;Mongrel&lt;/a&gt;, which is hosted on a Linux server running Apache. &lt;a href="http://depth-first.com/articles/tag/rino"&gt;Rino&lt;/a&gt; provided the Ruby interface to the &lt;a href="http://www.iupac.org/inchi/"&gt;IUPAC/NIST InChI toolkit&lt;/a&gt;. The 2-D structure editor is &lt;a href="http://www.molinspiration.com/jme/"&gt;Java Molecular Editor&lt;/a&gt; (JME) by Peter Ertl, which is used with his kind permission.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.opensource.org/docs/definition.php"&gt;&lt;img src="http://www.opensource.org/trademarks/opensource/web/opensource-110x95.png" align="right" alt="Open Source (OSI) Logo" border="0" width="110" height="95"&gt;&lt;/img&gt;&lt;/a&gt;Aside from JME, all components of InChIMatic, from the operating system it runs on to the InChI system itself, are &lt;a href="http://opensource.org"&gt;Open Source&lt;/a&gt; software.&lt;/p&gt;

&lt;h4&gt;Using InChI to Raise the Visibility of Your Content&lt;/h4&gt;

&lt;p&gt;InChIMatic returns many Google results for common molecules. But less common, known molecules return no hits at all. Three factors are responsible: (1) Google doesn't index all InChIs on the Internet; (2) few content providers currently use InChI; and (3) there is no standard and convenient mechanism to embed InChIs into Web pages for indexing by Google.&lt;/p&gt;

&lt;p&gt;For these reasons, I consider InChI to be bleeding-edge technology. Some will find it useful; most will not. Unfortunately, this state of affairs will persist until problems (1) and (3) are solved.&lt;/p&gt;

&lt;p&gt;Nevertheless, if you're technically adventurous, InChIMatic offers a relatively painless way to begin incorporating InChIs into your content and verifying that they get indexed. There's no software to download, install, or upgrade. Forget about operating system incompatibilities (hopefully!). Just point your Java-enabled browser to &lt;a href="http://inchimatic.com"&gt;inchimatic.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although there's no standard method to encode InChIs in Web pages, some interesting ideas have been put forward. &lt;a href="http://chem-bla-ics.blogspot.com/"&gt;Egon Willighagen&lt;/a&gt; has proposed &lt;a href="http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html"&gt;a system&lt;/a&gt; based on &lt;a href="http://www.w3.org/TR/xhtml-rdfa-primer/"&gt;RDFa&lt;/a&gt;. Future iterations of InChIMatic may include support for generating scripts and/or markup for including InChIs into blogs and other online content.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;InChI is a complex new technology in need of easy-to-use tools. InChIMatic is one such tool that makes it possible to perform exact structure queries using Google.&lt;/p&gt;

&lt;p&gt;One of the exciting things about Web applications is how quickly they can evolve. If in trying out InChIMatic you find something you'd like changed or added, please feel free to &lt;a href="mailto:r_apodaca@users.sf.net"&gt;write me&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Mon, 19 Feb 2007 10:18:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:eb531bca-c3b0-4f2d-8053-4272baa8bbfb</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/19/google-for-molecules-with-inchimatic</link>
      <category>Tools</category>
      <category>inchimatic</category>
      <category>inchi</category>
      <category>google</category>
      <category>webapp</category>
      <category>opensource</category>
      <category>rails</category>
      <category>iupac</category>
    </item>
    <item>
      <title>Anatomy of a Cheminformatics Web Application: InChIMatic</title>
      <description>&lt;p&gt;&lt;a href="http://rubyonrails.org"&gt;&lt;img src="http://depth-first.com/files/rails_logo.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://www.iupac.org/inchi/"&gt;InChI&lt;/a&gt; is an open molecular identifier system. Although InChIs obviate the need for a central registration authority, they are complex enough that they must be generated by computer. Currently, a few desktop molecular editors can generate InChI identifiers. But wouldn't it be more convenient if this capability existed in a simple Web application that could be used from any computer - anywhere? This article describes a Web application called "InChIMatic", which does just that.&lt;/p&gt;

&lt;p&gt;In this article, I'll show how &lt;a href="http://www.molinspiration.com/jme/"&gt;Java Molecular Editor&lt;/a&gt; (JME), a lightweight 2-D structure editor, can be extended to produce InChI identifiers through &lt;em&gt;server-side&lt;/em&gt; software written in Ruby, rather than by extending the applet with Java code.&lt;/p&gt;

&lt;h4&gt;Downloads and Prerequisites&lt;/h4&gt;

&lt;p&gt;InChIMatic requires &lt;a href="http://rubyonrails.org"&gt;Ruby on Rails&lt;/a&gt; and the &lt;a href="http://depth-first.com/articles/2006/08/17/ruby-and-inchi-the-rino-library"&gt;Rino InChI toolkit&lt;/a&gt;. Both of these libraries can be installed using the &lt;a href="http://rubygems.org/"&gt;RubyGems&lt;/a&gt; packaging system.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://rubyforge.org/frs/download.php/15616/inchimatic-0.0.2.tar.gz"&gt;complete InChIMatic source package&lt;/a&gt; can be downloaded from RubyForge. For convenience, a copy of JME is included with the distribution. The author, Peter Ertl, has kindly given permission for the bundled JME applet to be used with InChIMatic. For other uses, consult the &lt;a href="http://www.molinspiration.com/jme/"&gt;JME homepage&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Running InChIMatic&lt;/h4&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
$ cd inchimatic-0.0.2
$ ruby script/server
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Pointing your browser to &lt;a href="http://localhost:3000/inchi/input"&gt;http://localhost:3000/inchi/input&lt;/a&gt;, drawing a structure in the JME window, and pressing the "InChI!" button will produce the corresponding InChI in the area below.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20061214/screenshot_1.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h4&gt;Behind the Scenes&lt;/h4&gt;

&lt;p&gt;The JME applet itself provides no capabilities for generating InChI identifiers. This functionality is instead provided by the Rails server via the Rino InChI library.&lt;/p&gt;

&lt;p&gt;Let's say Susan wants to get the InChI for 3,4-dichlorophenol. After entering the structure into the JME window, she presses the "InChI!" button. This sets in motion the following sequence of events:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The JavaScript function &lt;tt&gt;writeMolfile()&lt;/tt&gt; is called. This retrieves a molfile representation of 3,4-dichlorophenol from JME, which is then written to the hidden field &lt;tt&gt;molfile&lt;/tt&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Rails listener notices that the hidden text field has been updated and so invokes the InChIMatic &lt;tt&gt;ajax_inchi&lt;/tt&gt; action. This is a Rails Ajax action that will update only a portion of the InChIMatic window. For more detail on this Rails Ajax technique, see &lt;a href="http://depth-first.com/articles/2006/12/04/anatomy-of-a-cheminformatics-web-application-ajaxifying-depict"&gt;the previous Anatomy of a Cheminformatics Web Application&lt;/a&gt; article.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;tt&gt;ajax_inchi&lt;/tt&gt; action retrieves the contents of the hidden &lt;tt&gt;molfile&lt;/tt&gt; field. This molfile is then used to generate an InChI using Rino. This InChI is then saved to the instance variable &lt;tt&gt;inchi&lt;/tt&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The contents of the InChIMatic area partitioned by the &lt;tt&gt;results&lt;/tt&gt; &lt;tt&gt;div&lt;/tt&gt; are then updated with the InChI obtained in Step 3. The JME applet itself is unaffected by this operation, allowing Susan to further elaborate her molecule, if she'd like.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
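&lt;p&gt;Steps 3 and 4 above can be sketched in plain Ruby. This is a simplified stand-in, not the InChIMatic controller: the class name &lt;tt&gt;InchiAction&lt;/tt&gt;, the injected converter, and the returned hash are hypothetical, and the converter used here returns a fixed string rather than calling Rino.&lt;/p&gt;

```ruby
# Plain-Ruby sketch of the server-side flow; all names here are hypothetical.
class InchiAction
  # converter: any object responding to call(molfile) and returning an InChI.
  # In the real application this role is played by the Rino library.
  def initialize(converter)
    @converter = converter
  end

  # params stands in for the request parameters Rails would supply.
  def ajax_inchi(params)
    molfile = params.fetch(:molfile)   # contents of the hidden molfile field (step 3)
    @inchi = @converter.call(molfile)  # molfile converted to an InChI
    # Step 4: only the "results" area is refreshed; the editor is left alone.
    { update: "results", content: @inchi }
  end
end

# Stand-in converter: performs no real conversion and always returns the same InChI.
fake_converter = lambda { |_molfile| "InChI=1/CH4/h1H4" }

action = InchiAction.new(fake_converter)
p action.ajax_inchi({ molfile: "(molfile text from JME)" })
```

&lt;p&gt;Passing the converter in keeps the sketch runnable without the InChI toolkit installed, and it mirrors the division of labor described above: the applet supplies a molfile, and everything InChI-related happens on the server.&lt;/p&gt;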

&lt;h4&gt;So What? Re-Thinking the Role of Applets&lt;/h4&gt;

&lt;p&gt;JME is, by itself, incapable of generating InChIs. Yet InChIMatic provides this capability as if it existed natively. In other words, a lightweight, fast-loading, and responsive 2-D editor can be extended &lt;em&gt;on the server side&lt;/em&gt;, rather than on the client side. The difference is imperceptible to the user, but ripe with potential for the developer.&lt;/p&gt;

&lt;p&gt;One of the most common, and completely valid, complaints about Java applets is that they take too long to load. Offloading some of the functionality currently being bundled in applets onto a Web server offers one way to combat the problem. Furthermore, combining Java applets with Ajax and powerful Web application frameworks like Ruby on Rails offers virtually limitless opportunities to re-think the role of Java applets in Web application development.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;JME's strength comes, perhaps ironically, from its limited functionality. By using some simple Web programming techniques, JME can be extended with server-side programming. The advantages, compared to extending the JME applet itself with Java on the client side, are significant. Future articles in this series will explore some of the possibilities.&lt;/p&gt;</description>
      <pubDate>Fri, 15 Dec 2006 15:49:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:4ff346b0-7cec-4b8e-9770-feccfe683823</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/12/15/anatomy-of-a-cheminformatics-web-application-inchimatic</link>
      <category>Web</category>
      <category>jme</category>
      <category>inchimatic</category>
      <category>inchi</category>
      <category>rino</category>
      <category>java</category>
      <category>rails</category>
      <category>ruby</category>
    </item>
  </channel>
</rss>
