<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag broken</title>
    <link>http://depth-first.com/articles/tag/broken</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Casual Saturdays: When Broken is a Way of Life</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20080223/broken.png" /&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: Unknown via &lt;a href="http://funpics.nextmail.ru/safetyawards2007.htm"&gt;funpics.nextmail.ru&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 23 Feb 2008 16:33:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1e327e08-0c80-47e1-9feb-e28f4f9a235a</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2008/02/23/casual-saturdays-when-broken-is-a-way-of-life</link>
      <category>Meta</category>
      <category>casualsaturdays</category>
      <category>broken</category>
    </item>
    <item>
      <title>SMILES and Aromaticity: Broken?</title>
      <description>&lt;p&gt;&lt;a href="http://opensmiles.org"&gt;&lt;img src="http://depth-first.com/demo/20071114/osmi.png" align="right"&gt;&lt;/img&gt;&lt;/a&gt;Since its introduction in 1988, the Simplified Molecular Input Line Entry System (&lt;a href="http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html"&gt;SMILES&lt;/a&gt;) has become one of the most widely-used molecular encoding systems in cheminformatics. But all technologies, no matter how widely-used, can be improved, and SMILES is no exception. This article, the first in a series, discusses a particularly thorny problem in the SMILES language.&lt;/p&gt;

&lt;h4&gt;A Little About SMILES&lt;/h4&gt;

&lt;p&gt;From the beginning, SMILES was a creative response to the complexity of the then-dominant &lt;a href="http://depth-first.com/articles/2007/07/20/everything-old-is-new-again-wiswesser-line-notation-wln"&gt;Wiswesser Line Notation&lt;/a&gt;. This can be seen perhaps nowhere more clearly than in the introduction to Weininger's &lt;a href="http://dx.doi.org/10.1021/ci00057a005"&gt;seminal paper on SMILES&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;SMILES is a chemical notation language specifically designed for computer use by chemists. ... Among several approaches to computerized chemical notation, line notation is popular because it represents molecular structure by a linear string of symbols, similar to natural language. The Wiswesser Line Notation is the most widely used representative of this method. It meets the essential requirements for a deterministic chemical notation, but it is difficult to use because many rules must be followed to generate the correct notation of a complex structure. To overcome this and other difficulties, the SMILES system was designed to be truly computer interactive.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What started out as a way for humans to more easily encode molecular structures has since evolved into a way for computers to encode molecular structures. Several factors are responsible for this shift, the biggest being the emergence of the Graphical User Interface, and with it, the &lt;a href="http://depth-first.com/articles/2007/11/27/chemwriter-chemical-structures-and-the-web"&gt;chemical structure editor&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today, few chemists know how to encode SMILES and, understandably, few want to.&lt;/p&gt;

&lt;p&gt;But rather than dying out, SMILES found a new niche. Computers in the late '80s were mere toys; storage space was measured in kilobytes, and bandwidth was practically nonexistent. But with a few ASCII characters, the complete connection table of most organic molecules could be encoded by SMILES. Not only this, but the algorithms needed to encode and decode SMILES were easy to reduce to practice in software. Daylight's original implementation of SMILES was soon joined by many others.&lt;/p&gt;

&lt;p&gt;A de facto standard was born.&lt;/p&gt;

&lt;h4&gt;If It Ain't Broke, Don't Fix It&lt;/h4&gt;

&lt;p&gt;For the last twenty years, SMILES has been used with great success to encode and store molecular structures. In an industry with few standards, SMILES is a rare example that shows what might be possible.&lt;/p&gt;

&lt;p&gt;If SMILES has been so successful, then what's &lt;a href="http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics"&gt;broken&lt;/a&gt; that needs fixing?&lt;/p&gt;

&lt;p&gt;Over the years, a growing list of missing, inconsistent, or confusing aspects of the SMILES language has come to light. One vendor of a SMILES implementation &lt;a href="http://www.eyesopen.com/docs/html/pyprog/DaylightSMILES.html"&gt;has even cataloged some of them&lt;/a&gt;. In most cases, the various implementers of SMILES systems have done the only thing they could do under the circumstances: apply their own judgment and best guesses.&lt;/p&gt;

&lt;p&gt;The result has been the gradual introduction of subtle incompatibilities among the SMILES implementations currently in use. This is the problem that the &lt;a href="http://opensmiles.org"&gt;OpenSMILES&lt;/a&gt; group aims to address.&lt;/p&gt;

&lt;p&gt;This status quo works in an environment of information silos, proprietary code, and closed data. But as cheminformatics moves in the direction of open data and interoperability, the problems become painfully apparent.&lt;/p&gt;

&lt;p&gt;Of all the topics that have been discussed so far by the OpenSMILES group, one stands out for its level of interest, number of contributors, strong opinions, and detailed discussion: lower-case atom symbols and aromaticity.&lt;/p&gt;

&lt;h4&gt;Aromaticity in SMILES&lt;/h4&gt;

&lt;p&gt;SMILES allows two kinds of atoms to be specified: upper-case and lower-case. Lower case atoms, according to existing documentation, signify 'aromatic' atoms.&lt;/p&gt;

&lt;p&gt;Weininger made clear that the reason for introducing lower case atom symbols was to facilitate canonicalization and substructure recognition. From &lt;a href="http://dx.doi.org/10.1021/ci00057a005"&gt;the original paper&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;Aromaticity must be detected in a system that generates an unambiguous chemical nomenclature. As will be discussed in following papers, this is needed both for the generation of a unique nomenclature and for effective substructure recognition. There can be no definition of 'aromaticity' that is both rigorous and all-encompassing: the word implies something about 'reactivity' to a synthetic chemist, 'ring current' to a NMR spectroscopist, 'symmetry' to a crystallographer, and presumably 'odor' to the original user of the word. Our objective in defining aromaticity is to provide an automatic and rigorous definition for the purposes of generating an unambiguous chemical nomenclature. Although the SMILES algorithm produces results that most chemists find natural, nothing is implied by this definition about physical properties.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kekule structures, in which double bonds and single bonds alternate, make it difficult for computers to implement certain kinds of algorithms. Defining lower case atom symbols to remove artificial asymmetry eliminated these problems.&lt;/p&gt;

&lt;p&gt;Weininger's original paper then goes on to describe the criteria for aromaticity in the SMILES language. At its core, aromaticity boils down to the following definition:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... To qualify as aromatic, all atoms in the ring must be sp2 hybridized and the number of available 'excess' &amp;pi; electrons must satisfy H&amp;uuml;ckel's 4n+2 criterion. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/cb_cot.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Seems simple enough, but even in 1988 things were not so clear. For just a few sentences later, Weininger continues:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... Entries of c1ccc1 and c1ccccccc1 will produce the correct &lt;strong&gt;antiaromatic&lt;/strong&gt; structures for cyclobutadiene and cyclooctatetraene, C1=CC=C1 and C1=CC=CC=CC=C1, respectively. ... [emphasis added]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;How are we to interpret this? Apparently, c1ccc1 and c1ccccccc1, neither of which obey the 4n+2 rule, are nevertheless &lt;em&gt;valid&lt;/em&gt; SMILES. We can even use &lt;a href="http://www.daylight.com/daycgi/depict"&gt;Daylight's Depict&lt;/a&gt; application to verify for ourselves that both c1ccc1 and c1ccccccc1 are read and depicted.&lt;/p&gt;
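
&lt;p&gt;The arithmetic behind this observation can be sketched in a few lines of Python. This is only an illustration, not Daylight's algorithm: the function names are made up for this example, and the electron count assumes an unsubstituted, all-carbon single ring in which each lower case c contributes exactly one &amp;pi; electron.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True when the count has the form 4n+2 for some n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def carbocycle_pi_electrons(smiles: str) -> int:
    """Count pi electrons in a simple all-carbon monocycle such as c1ccccc1.

    Assumption for this sketch: every lower case c donates one electron.
    """
    return smiles.count("c")


# Benzene, cyclobutadiene, and cyclooctatetraene written with lower case atoms.
for smiles in ("c1ccccc1", "c1ccc1", "c1ccccccc1"):
    electrons = carbocycle_pi_electrons(smiles)
    print(smiles, electrons, satisfies_huckel(electrons))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Under these assumptions benzene passes with six electrons, while the four- and eight-electron rings fail the test even though both lower case forms are accepted as input.&lt;/p&gt;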

&lt;p&gt;Perhaps the concept of "antiaromaticity" (in contrast to "non-aromaticity") holds a special place in the SMILES language. If so, this distinction has never been clarified.&lt;/p&gt;

&lt;p&gt;While puzzling over the apparent contradiction, we later read that:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... For example, quinone is nonaromatic, with only four excess electrons.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/quinone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Weininger goes on to imply that the only correct way to represent quinone in SMILES is without lower case atom symbols, for example:&lt;/p&gt;

&lt;p&gt;O=C1C=CC(=O)C=C1&lt;/p&gt;

&lt;p&gt;And still later:&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;... For example, if one of the benzene ring's electrons is removed to form c1ccc[cH+]1, this ion is not aromatic because there are only five &amp;pi; electrons. ...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ambiguity makes it impossible to write standardized software: either 4n+2 is the rule for triggering the aromatic flag, and therefore lower case atom symbols, or it is not. If exceptions to this rule are needed, they must be specified in enough detail to be reduced to practice. To my knowledge, no documentation written in 1988 or since then has provided the necessary guidance.&lt;/p&gt;

&lt;p&gt;We can't have it both ways.&lt;/p&gt;

&lt;h4&gt;More Brokenness&lt;/h4&gt;

&lt;p&gt;Next, consider some of the examples left out of the original SMILES description. What about oligocyclic aromatics?&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/fluorenone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10241"&gt;Fluorenone&lt;/a&gt;, according to the SMILES electron counting rules, has twelve &amp;pi; electrons and is therefore not aromatic. Strictly speaking, a SMILES like this:&lt;/p&gt;

&lt;p&gt;O=c2c1ccccc1c3ccccc23&lt;/p&gt;

&lt;p&gt;in which the carbonyl carbon is represented with a lower case atom symbol, should be considered invalid. Not just undesirable, but &lt;em&gt;verboten&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Yet Daylight's own Depict program, and other SMILES implementations, treat it as valid.&lt;/p&gt;

&lt;p&gt;Despite the lack of an aromatic tricyclic ring system, we may nevertheless want (or need) to represent fluorenone using lower case atom symbols. After all, canonicalization and substructure searches are very difficult otherwise.&lt;/p&gt;

&lt;p&gt;So any software we write needs to peel back layers of the tricyclic ring system in a quest for isolated aromatic rings. This exercise is clearly chemically meaningless as all atoms are coplanar and sp2 hybridized, and therefore interact. The counterargument is that the SMILES aromaticity model has no basis in reality - it's just a convention. So we press on.&lt;/p&gt;

&lt;p&gt;We eventually end up with a SMILES like this:&lt;/p&gt;

&lt;p&gt;O=C2c1ccccc1c3ccccc23&lt;/p&gt;

&lt;p&gt;The larger problem is making it clear when a reader or writer is and isn't allowed to perform this peeling back operation in search of aromaticity. Does the above SMILES match the SMILES definition of aromaticity or does it not? Are we allowed to peel back ring systems looking for imaginary 'embedded' aromatic ring systems or are we not?&lt;/p&gt;
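
&lt;p&gt;To make the peeling back operation concrete, here is a rough Python sketch that tests every combination of fluorenone's three rings against 4n+2. The ring membership is assigned by hand from the atom order in O=C2c1ccccc1c3ccccc23, and the counting convention (one electron per ring carbon, none from the carbonyl carbon) is an assumption made for this example rather than a rule taken from any SMILES documentation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from itertools import combinations

# Ring atoms of O=C2c1ccccc1c3ccccc23, numbered in SMILES order (assigned by hand).
RINGS = {
    "left benzene": {2, 3, 4, 5, 6, 7},
    "central ring": {1, 2, 7, 8, 13},
    "right benzene": {8, 9, 10, 11, 12, 13},
}
CARBONYL_CARBON = 1  # assumed to contribute no pi electrons to any ring


def satisfies_huckel(pi_electrons):
    """Return True when the count has the form 4n+2 for some n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def pi_electrons(atoms):
    """One electron per ring carbon, except the carbonyl carbon."""
    return sum(1 for atom in atoms if atom != CARBONYL_CARBON)


def survey():
    """Map every combination of rings to its electron count and 4n+2 verdict."""
    results = {}
    for size in range(1, len(RINGS) + 1):
        for names in combinations(RINGS, size):
            atoms = set().union(*(RINGS[name] for name in names))
            count = pi_electrons(atoms)
            results[names] = (count, satisfies_huckel(count))
    return results


for names, (count, passes) in survey().items():
    print(" + ".join(names), count, passes)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With these assumptions, only the two benzene rings taken in isolation pass. Every combination that includes the central ring fails, as does the full tricyclic system with its twelve electrons.&lt;/p&gt;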

&lt;p&gt;The answer may exist somewhere, just not in the documentation I have access to.&lt;/p&gt;

&lt;p&gt;The pragmatic approach, and the one taken by some implementations, is to simply ignore the whole question, forget about 4n+2, and call everything that 'looks' aromatic, like the fluorenone carbonyl carbon, 'aromatic.'&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/acenaphthalene.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;As another example, consider &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9161"&gt;acenaphthalene&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;c1cc2cccc3ccc(c1)c23&lt;/p&gt;

&lt;p&gt;Based on the published 4n+2 rules for SMILES aromaticity detection, acenaphthalene's twelve &amp;pi; electrons mean that it can't be represented in the aromatic form. It's not just discouraged - it's not allowed. Yet the Daylight Depict program, and a few other SMILES implementations, will accept this input as valid.&lt;/p&gt;

&lt;p&gt;The only way we can take advantage of the symmetrization afforded by lower case atom labels is to go hunting for isolated benzene rings. Upon doing so, we arrive at the following SMILES:&lt;/p&gt;

&lt;p&gt;c1cc2C=Cc3cccc(c1)c23&lt;/p&gt;

&lt;p&gt;Once again, we've more or less made an arbitrary distinction, assigning one set of carbons as aromatic and the other, fully coplanar, conjugated, and sp2-hybridized set as non-aromatic. Does the SMILES language allow us to do this? Again, the answer may exist somewhere, but not in any material I've been able to find.&lt;/p&gt;

&lt;p&gt;To put it simply, where in the SMILES documentation are we told which atoms in a coplanar, fully conjugated, and sp2 hybridized ring system can be excluded from the 4n+2 test?&lt;/p&gt;

&lt;p&gt;For that matter, how do we know that oligocyclic aromatic ring systems are supported at all? Maybe only isolated five- and six-membered rings should be evaluated.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20071128/pyrrolopyridine.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Consider pyrrolopyridine (depicted above):&lt;/p&gt;

&lt;p&gt;c2ccn1cccc1c2&lt;/p&gt;

&lt;p&gt;Now let's assume that the SMILES 4n+2 rule can only be applied to individual rings, not ring systems. This prevents us from writing a SMILES like the one shown above because the left-hand pyridine ring has a formal &amp;pi; electron count of 7 - two from each endocyclic double bond, two from the nitrogen atom, and one from the exocyclic double bond.&lt;/p&gt;

&lt;p&gt;The best we could do is to write a SMILES like this:&lt;/p&gt;

&lt;p&gt;c2cc1C=CC=Cn1c2&lt;/p&gt;

&lt;p&gt;The only way we can create an 'aromatic' SMILES for the 4n+2 pyrrolopyridine ring system is to combine the electron counts for both rings.&lt;/p&gt;
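
&lt;p&gt;The two ways of counting can be put side by side in a short Python sketch. The tallies simply restate the electron counts described above; the dictionary labels and the function name are invented for this illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True when the count has the form 4n+2 for some n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Six-membered ring evaluated on its own, following the prose tally.
six_ring_alone = {
    "two endocyclic double bonds": 2 * 2,
    "nitrogen atom": 2,
    "exocyclic double bond": 1,
}

# Both rings evaluated together as one ring system.
whole_ring_system = {
    "four double bonds": 4 * 2,
    "nitrogen atom": 2,
}

totals = {
    "six-membered ring alone": sum(six_ring_alone.values()),
    "whole ring system": sum(whole_ring_system.values()),
}

for label, total in totals.items():
    print(label, total, satisfies_huckel(total))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Counted alone, the six-membered ring has seven electrons and fails; counted as a whole, the ring system has ten and passes.&lt;/p&gt;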

&lt;p&gt;But Daylight's own Depict system, and I suspect many others, imply that the fully aromatic version of the pyrrolopyridine SMILES is valid.&lt;/p&gt;

&lt;p&gt;Once again, we can't have it both ways. If full ring systems need to be perceived and tested for 4n+2 &amp;pi; electrons, then consistency requires it also be done for acenaphthalene, fluorenone, and countless others for which space and time prevent discussion. If particular ring systems are exempt, then the SMILES language documentation should specify in detail how to tell the difference.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Given the problems in combining SMILES' symmetrization capability and lower-case atom symbols with the overloaded concept of aromaticity, one has to wonder - is it worth the trouble? Given the disregard for these rules by working third-party code, by Daylight, and by the original SMILES documentation, how reasonable is it to continue to use 4n+2 as the rule? What does the resulting confusion really buy?&lt;/p&gt;

&lt;p&gt;There is a simple way to resolve the issue, but you're probably not going to like it - at least not at first. But that's a story for another time.&lt;/p&gt;</description>
      <pubDate>Wed, 28 Nov 2007 09:43:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:46e56185-51ea-466b-b4bc-c9edfc28b489</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/11/28/smiles-and-aromaticity-broken</link>
      <category>Tools</category>
      <category>smiles</category>
      <category>opensmiles</category>
      <category>aromaticity</category>
      <category>broken</category>
    </item>
    <item>
      <title>Yet Another Free Chemistry Database: FooDB</title>
      <description>&lt;p&gt;&lt;a href="http://hmdb.med.ualberta.ca/foodb/"&gt;&lt;img src="http://depth-first.com/demo/20070306/foodb.png" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;a href="http://hmdb.med.ualberta.ca/foodb/"&gt;FooDB&lt;/a&gt; contains over 1,900 structures used as food additives in the United States. The data in FooDB are provided by the &lt;a href="http://vm.cfsan.fda.gov/~dms/eafus.html"&gt;FDA EAFUS site&lt;/a&gt;. You can search by CAS number, IUPAC name, or molecular formula. And in what is likely to be a trend to watch closely, FooDB is powered by the Web application framework &lt;a href="http://www.rubyonrails.org/"&gt;Ruby on Rails&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;FooDB is not alone: the number of &lt;a href="http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases"&gt;free chemistry databases on the Web&lt;/a&gt; just keeps growing. How this trend ultimately plays out is anybody's guess.&lt;/p&gt;

&lt;p&gt;One thing is clear: the point is rapidly approaching at which database aggregation technologies will start to matter. No chemist wants to search through over thirty databases to find the information they need on a molecule. They want it delivered in one quick, intuitive, user-friendly package. Here's another example of something that's &lt;a href="http://depth-first.com/articles/tag/broken"&gt;broken&lt;/a&gt; in cheminformatics. Like all broken things, it's the source of great frustration for users and great opportunity for developers.&lt;/p&gt;</description>
      <pubDate>Tue, 06 Mar 2007 11:09:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:e506dc14-5608-42b8-97aa-8f040a3f527c</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/06/yet-another-free-chemistry-database-foodb</link>
      <category>Databases</category>
      <category>broken</category>
      <category>foodb</category>
      <category>web</category>
      <category>fda</category>
      <category>rails</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Why the Web Isn't Ready for Chemistry</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070305/lavoisier.jpg" align="right"&gt;&lt;/img&gt;Wouldn't it be wonderful if chemical structure searching were as easy as using Google? Draw your molecule, press a button and get the good stuff first. That day may well arrive, but without the creation of some key technologies, the wait will be very long. This article describes an unsuccessful attempt to bring the chemically-aware Web closer to reality.&lt;/p&gt;

&lt;h4&gt;Background&lt;/h4&gt;

&lt;p&gt;Recently, I &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;introduced&lt;/a&gt; a small Web application called &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;. It lets you draw a structure and search for it through one of a number of popular search engines.&lt;/p&gt;

&lt;p&gt;InChIMatic turns a molecular query into text, which is then searched. This magic is made possible through the &lt;a href="http://en.wikipedia.org/wiki/International_Chemical_Identifier"&gt;IUPAC International Chemical Identifier&lt;/a&gt; (InChI). InChI has enormous potential for enabling chemical Web searches, but several barriers must be overcome first.&lt;/p&gt;

&lt;p&gt;For example, if you run even the most trivial of queries with &lt;a href="http://inchimatic.com"&gt;InChIMatic&lt;/a&gt;, you'll quickly see that search engines have only indexed a small number of InChIs. One reason is that InChIs are not yet widely used by Web authors. But the deeper problem is that many pages containing InChIs are not indexed by search engines. For example, &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem's&lt;/a&gt; vast collection of InChIs is apparently invisible to Google.&lt;/p&gt;

&lt;p&gt;Compounding the problems of using InChIs to index chemical content on the Web is the lack of a standard, unobtrusive method for embedding the identifier into Web pages. Understandably, no author wants to invest valuable time and effort on an indexing system that doesn't work with their content and page layout. This problem is the subject of the current article.&lt;/p&gt;

&lt;h4&gt;Materials and Methods&lt;/h4&gt;

&lt;p&gt;The &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;InChIMatic article&lt;/a&gt; contained a test for how well Google and "invisible" InChIs might work together. If you mouse over the word "1-bromonaphthalene" in the first paragraph of that article, you'll see a small popup window containing the InChI. I accomplished this effect with the following HTML:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_xml "&gt;&lt;span class="punct"&gt;&amp;lt;&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt; &lt;span class="attribute"&gt;title&lt;/span&gt;&lt;span class="punct"&gt;=&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&amp;gt;&lt;/span&gt;
  1-bromonaphthalene
&lt;span class="punct"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="tag"&gt;span&lt;/span&gt;&lt;span class="punct"&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;My goal wasn't the popup effect. Instead, I wanted to test the &lt;tt&gt;title&lt;/tt&gt; attribute as an unobtrusive vector for getting InChIs indexed by Google. This excellent idea was &lt;a href="https://www2.blogger.com/comment.g?blogID=17889588&amp;amp;postID=9068626890097011632"&gt;a suggestion&lt;/a&gt; made by Oliver Koepler in response to &lt;a href="http://chem-bla-ics.blogspot.com/"&gt;Egon Willighagen's&lt;/a&gt; article on &lt;a href="http://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html"&gt;invisible InChIs&lt;/a&gt;.&lt;/p&gt;
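
&lt;p&gt;For pages generated by a script, the same markup could be produced programmatically. Here is a minimal sketch using the ElementTree module from Python's standard library; the function name inchi_span is made up for this example and is not part of InChIMatic.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET


def inchi_span(name: str, inchi: str) -> str:
    """Wrap a compound name in a span whose title attribute carries the InChI."""
    span = ET.Element("span", {"title": inchi})
    span.text = name
    # Serialization escapes any markup-significant characters in name or inchi.
    return ET.tostring(span, encoding="unicode")


print(inchi_span(
    "1-bromonaphthalene",
    "InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H",
))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Building the element through a serializer, rather than by string concatenation, means an InChI or compound name containing quotes or angle brackets can't break the surrounding page.&lt;/p&gt;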

&lt;p&gt;The idea is simple: InChIs are to be read by machines, not humans. InChIs consist of long strings of text that contain no widely-recognized wrappable characters. As a result, displaying InChIs in Web pages can break page layouts. Even if a wrapping mechanism is used, such as the CSS &lt;tt&gt;overflow&lt;/tt&gt; property, I find InChIs unpleasant to look at and just plain distracting. There's &lt;a href="http://depth-first.com/articles/2006/09/13/the-chemically-aware-web-are-we-there-yet"&gt;no good reason&lt;/a&gt; why any chemist should have to look at them.&lt;/p&gt;

&lt;p&gt;Chemists themselves are, understandably, &lt;a href="http://kinasepro.wordpress.com/2006/12/05/monday-night-ot-2/"&gt;reluctant&lt;/a&gt; to invest in ad hoc methods to index their molecular content - they need a real solution. It needs to be simple, it needs to be robust, it needs to be easy to apply retroactively, and it needs to be ready today.&lt;/p&gt;

&lt;h4&gt;Results&lt;/h4&gt;

&lt;p&gt;After about two days, Google had indexed &lt;a href="http://depth-first.com/articles/2007/02/28/googling-for-molecules-new-and-improved-inchimatic"&gt;the article&lt;/a&gt; containing the hidden InChI for 1-bromonaphthalene. Using InChIMatic, I &lt;a href="http://www.google.com/search?q=%22InChI%3D1%2FC10H7Br%2Fc11-10-7-3-5-8-4-1-2-6-9%288%2910%2Fh1-7H%22"&gt;searched Google&lt;/a&gt; for the InChI, but only found the same &lt;a href="http://nmrshiftdb.org"&gt;NMRShiftDB&lt;/a&gt; item returned in previous queries.&lt;/p&gt;
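
&lt;p&gt;The Google link above is nothing more than the InChI wrapped in double quotes and percent-encoded into a query string. A minimal sketch of how such a URL could be assembled with Python's standard library follows; this is not necessarily how InChIMatic does it, and the function name is hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from urllib.parse import quote


def google_phrase_url(inchi: str) -> str:
    """Build a Google URL that searches for the InChI as an exact phrase."""
    phrase = '"' + inchi + '"'
    # safe="" forces "/" to be percent-encoded along with "=", "(" and ")".
    return "http://www.google.com/search?q=" + quote(phrase, safe="")


print(google_phrase_url("InChI=1/C10H7Br/c11-10-7-3-5-8-4-1-2-6-9(8)10/h1-7H"))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The surrounding quotes matter: without them, a search engine is free to split the identifier at its slashes and hyphens and match the fragments separately.&lt;/p&gt;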

&lt;p&gt;A few days later, a new Depth-First link appeared in Google. It pointed to the main XML Atom feed for Depth-First. This is a step in the right direction, but a far cry from the solution chemists need.&lt;/p&gt;

&lt;p&gt;None of the other major search engines supported by InChIMatic returned a link to the Depth-First article containing the hidden InChI. The only new result was retrieved by &lt;a href="http://search.com"&gt;Search.com&lt;/a&gt;. Like Google's result, this new link pointed to Depth-First's main XML feed.&lt;/p&gt;

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;Google doesn't index the contents of the &lt;tt&gt;title&lt;/tt&gt; attribute and may never do so. This should not be surprising. Google has made a fortune in part by staying &lt;a href="http://depth-first.com/articles/2007/02/28/inchi-spam"&gt;one step ahead of Search Engine Optimization (SEO) tricksters&lt;/a&gt;. By ignoring the contents of the &lt;tt&gt;title&lt;/tt&gt; attribute, Google and other search engines eliminate a real threat that could corrupt the search results that drive their business.&lt;/p&gt;

&lt;p&gt;What about other methods for concealing InChIs? One study suggests that none of them will work, either. &lt;a href="http://www.youcansleepwhenyouredead.com/archives/2004/12/testing_search_1.html"&gt;A two-year-old experiment&lt;/a&gt; on SEO techniques compared ten different methods to conceal a text string from human viewers. Methods ranged from applying the CSS declaration &lt;tt&gt;display:none&lt;/tt&gt;, to using matched font and background color, to concealing the text in a hidden frame. Although some of these methods may have initially been successful in getting content into Google, none of them work now.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://kinasepro.wordpress.com/"&gt;KinasePro&lt;/a&gt; recently described a &lt;a href="http://kinasepro.wordpress.com/2006/12/12/monday-night-ot-3/"&gt;failed attempt&lt;/a&gt; to get Google to index a SMILES string hidden in the &lt;tt&gt;alt&lt;/tt&gt; attribute of the &lt;tt&gt;img&lt;/tt&gt; element. Although &lt;a href="http://technorati.com"&gt;Technorati&lt;/a&gt; did index this content, a &lt;a href="http://www.technorati.com/search/InChI%3D1%2FC10H7Br%2Fc11-10-7-3-5-8-4-1-2-6-9%288%2910%2Fh1-7H"&gt;Technorati search&lt;/a&gt; for the 1-bromonaphthalene InChI returned no hits. &lt;a href="http://www.technorati.com/search/inchimatic"&gt;A Technorati search&lt;/a&gt; for the article containing the hidden InChI did work, suggesting that Technorati also ignores the &lt;tt&gt;title&lt;/tt&gt; attribute.&lt;/p&gt;

&lt;h4&gt;Why it Matters&lt;/h4&gt;

&lt;p&gt;Google and other search engines are in a perpetual state of war with SEO tricksters, and rightly so. At stake are search results that make up some of the most valuable intellectual property in the world. Any attempt to make InChIs appear invisible to humans is likely to be interpreted by major search engines as spam and treated accordingly. It seems very unlikely that this stance will ever change, regardless of how legitimate the motivation might be.&lt;/p&gt;

&lt;p&gt;This leaves us with the fundamental problem of how to build a workable, Web-based chemical indexing system. The CAS registry system has served chemistry as the de facto standard for decades, but for a variety of reasons it is unworkable as an open technology for the Web. The more modern approach of combining InChI and standard search engines has major limitations, as outlined in this article.&lt;/p&gt;

&lt;p&gt;If anything in cheminformatics is &lt;a href="http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics"&gt;broken&lt;/a&gt;, it's the indexing and retrieval of molecular information on the Web. For those interested in solving a tough problem that really matters, this is a golden opportunity.&lt;/p&gt;</description>
      <pubDate>Mon, 05 Mar 2007 09:55:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:862d965a-1330-43b3-b5b9-6ff6f6924636</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/05/why-the-web-isnt-ready-for-chemistry</link>
      <category>Web</category>
      <category>inchi</category>
      <category>inchimatic</category>
      <category>broken</category>
      <category>web</category>
      <category>google</category>
      <category>invisible</category>
      <category>seo</category>
      <category>spam</category>
    </item>
    <item>
      <title>Bryan Vickery on What's Broken in Cheminformatics</title>
      <description>&lt;blockquote&gt;
    &lt;p&gt;... The traditional model of publishing is sustainable, by which I mean profitable, because the academic/research community still funnels vast amounts of money into it from library budgets &#8211; it is certainly not self-sustaining. The fact that libraries still pay excessive charges to access this literature shows that the market is broken, not that the toll access route is sustainable.&lt;/p&gt;
    
    &lt;p&gt;-&lt;cite&gt;&lt;a href="http://blogs.openaccesscentral.com/ccblog/"&gt;Bryan Vickery&lt;/a&gt;, Editorial Director, &lt;a href="http://www.chemistrycentral.com/"&gt;Chemistry Central&lt;/a&gt; - quoted in &lt;a href="http://acscinf.org/docs/publications/Interviews/Vickery/2007/"&gt;Chemical Information Bulletin&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Bryan Vickery's interview is interesting on a number of levels, not the least of which is that it appears in an ACS publication. His comments raise the obvious question of &lt;em&gt;why&lt;/em&gt; the academic/research community continues to support existing publishing models, complaints notwithstanding. The answer to this question is the key to fixing what's broken.&lt;/p&gt;</description>
      <pubDate>Thu, 01 Mar 2007 10:04:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:6a367fd4-f82c-4eed-b778-d5caa53bbb6d</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/01/bryan-vickery-on-whats-broken-in-cheminformatics</link>
      <category>Open X</category>
      <category>openaccess</category>
      <category>broken</category>
      <category>acs</category>
      <category>scientificpublishing</category>
    </item>
    <item>
      <title>What's Broken in Cheminformatics?</title>
      <description>&lt;p&gt;&lt;img src="http://depth-first.com/demo/20070214/soccer.png" align="right"&gt;&lt;/img&gt;A while back, &lt;a href="http://sethgodin.typepad.com/"&gt;Seth Godin&lt;/a&gt; gave &lt;a href="http://sethgodin.typepad.com/seths_blog/2006/08/this_is_broken_.html"&gt;a talk&lt;/a&gt; on things that are broken, why they are broken, and why they stay broken (&lt;a href="http://video.google.com/videoplay?docid=-4101280286098310645&amp;amp;hl=en"&gt;video&lt;/a&gt;). Seth is one of those rare speakers who can entertain and inform at the same time. I especially liked how his talk identifies seven (maybe more) reasons for brokenness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not My Job&lt;/li&gt;
&lt;li&gt;Selfish Jerks&lt;/li&gt;
&lt;li&gt;The World Changed&lt;/li&gt;
&lt;li&gt;I Didn't Know&lt;/li&gt;
&lt;li&gt;I'm not a Fish&lt;/li&gt;
&lt;li&gt;Contradictions&lt;/li&gt;
&lt;li&gt;Broken on Purpose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I watched Seth's talk, it wasn't hard to think of many examples in which chem(o|i)informatics is broken. Of course, the way you feel about something being broken depends on your perspective. It's far too easy to become cynical when dealing with the frustration of broken things. But for anyone wanting to do research that matters or build useful products, things that are broken are like manna from heaven.&lt;/p&gt;</description>
      <pubDate>Wed, 14 Feb 2007 10:17:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:3262ece9-2d45-4927-ac01-d2f9b4bec9b8</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/02/14/whats-broken-in-cheminformatics</link>
      <category>Meta</category>
      <category>broken</category>
      <category>sethgodin</category>
      <category>cheminformatics</category>
    </item>
  </channel>
</rss>
