<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Depth-First: Tag linenotation</title>
    <link>http://depth-first.com/articles/tag/linenotation</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Walking the Web of Chemical Informatics</description>
    <item>
      <title>Everything Old is New Again: Wiswesser Line Notation (WLN)</title>
      <description>&lt;p&gt;Sometimes, searching through the attic of scientific ideas turns up unexpected treasures. Like old clothing styles that suddenly become fashionable again, the passage of time has a way of making old ideas relevant by supplying new context. Those ideas that once enjoyed widespread popularity followed by complete obscurity are especially interesting. This article talks about one of them and why it may matter again.&lt;/p&gt;

&lt;h4&gt;Some History&lt;/h4&gt;

&lt;p&gt;Wiswesser Line-Formula Chemical Notation (WLN) was the most popular of perhaps a dozen actively used line notation systems during the 1960s and 1970s. Developed by William J. Wiswesser over a period of many years starting in the 1940s, WLN contains a surprising number of modern ideas about chemistry and information. At one point a serious contender for the position now held by IUPAC nomenclature, WLN has become so obscure that few chemists have even heard of it and no modern software can manipulate it. Even finding information on the basic grammar of WLN is difficult: almost all of this documentation is contained in out-of-print books.&lt;/p&gt;

&lt;h4&gt;A Guide&lt;/h4&gt;

&lt;p&gt;To my surprise, WLN is both easy to understand and easy to use. As far as canonicalized line notations go, WLN is far easier to comprehend than either &lt;a href="http://depth-first.com/articles/tag/inchi"&gt;InChI&lt;/a&gt; or &lt;a href="http://depth-first.com/articles/2007/04/03/creating-canonical-smiles-with-ruby-open-babel"&gt;Canonical SMILES&lt;/a&gt;. Even more surprisingly, WLN actually meets more than a few of the requirements for the &lt;a href="http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web"&gt;ideal line notation for the Web&lt;/a&gt;. I was always struck by claims that high school graduates with little chemistry background could be trained to encode WLN in a few weeks; this now seems very plausible.&lt;/p&gt;

&lt;p&gt;My guide is Elbert Smith's short 1968 book &lt;em&gt;The Wiswesser Line-Formula Chemical Notation&lt;/em&gt;. I was able to pick up a used copy in excellent condition for under $30.00 from Amazon.&lt;/p&gt;

&lt;h4&gt;Some Examples&lt;/h4&gt;

&lt;p&gt;Functional groups, carbon chains, and rings play central roles in WLN. Unlike modern line notations that emphasize atoms, WLN is designed to mirror the way that chemists actually think about chemistry.&lt;/p&gt;

&lt;p&gt;Consider acetone:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/acetone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;1V1&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The two "1"s stand for saturated one-carbon chains, i.e. methyl groups. The "V" stands for a carbon doubly-bonded to oxygen.&lt;/p&gt;

&lt;p&gt;Given nothing more than the above example, the encoding of diethyl ether should be completely clear:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/ether.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;2O2&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;"O" simply stands for a divalent oxygen atom.&lt;/p&gt;

&lt;p&gt;The benzene ring is one of the most ubiquitous functional groups in organic chemistry. Wiswesser knew this and wanted to make it easy to encode aromatic compounds. His solution is simplicity itself. Consider acetophenone:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/acetophenone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;1VR&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The "R" stands for a benzene ring. WLN canonicalization gives it the lowest priority and this is why it appears last.&lt;/p&gt;

&lt;p&gt;What about disubstituted aromatics? Consider 4-chloroacetophenone:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070720/4-chloroacetophenone.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;strong&gt;GR DV1&lt;/strong&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The "G" symbol stands for chlorine. The " DV1" stands for the 4-acyl substituent. Here, the "D" denotes the 4-position. The 3-position would result in " CV1", and the 2-position would give " BV1". The space character means that the character following it should be interpreted as a ring locant.&lt;/p&gt;
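
&lt;p&gt;The locant convention above can be sketched in a few lines of Ruby. This is only an illustration: the helper names &lt;tt&gt;locant_to_position&lt;/tt&gt; and &lt;tt&gt;split_at_locant&lt;/tt&gt; are hypothetical, and the assumption that locant letters simply follow the alphabet (A = 1, B = 2, and so on) is extrapolated from the three examples given here rather than taken from the full WLN rules.&lt;/p&gt;

```ruby
# Hypothetical helper: convert a WLN ring locant letter to a ring position,
# assuming positions simply follow the alphabet (A = 1, B = 2, ...).
def locant_to_position(letter)
  letter.ord - 'A'.ord + 1
end

# Split a disubstituted benzene WLN such as "GR DV1" at the locant space.
# Only handles the single-space form shown in this article.
def split_at_locant(wln)
  head, tail = wln.split(' ', 2)
  {
    head: head,                               # everything before the space
    position: locant_to_position(tail[0]),    # letter right after the space
    substituent: tail[1..-1]                  # the rest of the string
  }
end

p split_at_locant('GR DV1')
```

&lt;p&gt;Under those assumptions, "GR DV1" splits into the head "GR", ring position 4, and the substituent "V1".&lt;/p&gt;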

&lt;p&gt;WLN uses a very simple system of canonicalization based on alphanumeric order. Priority increases in the direction: (1) symbols; (2) numbers in numerical order; and (3) letters in alphabetical order (with the exception of R which has lower priority than symbols). Coding generally begins at the substituent assigned the highest priority. Because the letter "G" outranks the number "1", coding starts at the chloro substituent, which explains why 4-chloroacetophenone is not coded as "1VR DG".&lt;/p&gt;

&lt;h4&gt;Advantages of WLN&lt;/h4&gt;

&lt;p&gt;WLN is remarkably compact, especially when compared to SMILES and InChI. For example, consider the InChI for 4-chloroacetophenone, which is eight times longer than the corresponding WLN:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_inchi "&gt;InChI=1/C8H7ClO/c1-6(10)7-2-4-8(9)5-3-7/h2-5H,1H3&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Additionally, it's readily apparent to a human observer when a WLN is not properly coded - after all, the language was designed to be both read and written by humans rather than machines. Anyone can look at "GR DV1" and deduce almost instantly that it contains a carbonyl group (V), a phenyl group (R), a chloro group (G), and a methyl group (1).&lt;/p&gt;

&lt;p&gt;And if this functional group recognition is easy for humans, it's orders of magnitude easier for machines. It's not difficult at all to imagine very sophisticated and fast molecular query systems that do nothing more than simple processing of the ASCII text contained within WLN strings.&lt;/p&gt;
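
&lt;p&gt;As a rough illustration of that kind of text processing, here is a minimal Ruby sketch. The symbol table covers only the symbols used in this article, the helper name &lt;tt&gt;wln_features&lt;/tt&gt; is hypothetical, and real WLN has many more symbols and rules that this sketch ignores.&lt;/p&gt;

```ruby
# Symbol meanings taken only from the examples in this article;
# real WLN defines many more symbols, which this sketch ignores.
WLN_SYMBOLS = {
  V: 'carbonyl',
  O: 'divalent oxygen',
  R: 'benzene ring',
  G: 'chlorine'
}.freeze

# Hypothetical helper: list the features named by a WLN string.
def wln_features(wln)
  features = []
  # Drop ring locants: a space followed by one capital letter.
  stripped = wln.gsub(/ [A-Z]/, '')
  # Tokens are either a run of digits (a carbon chain) or a single letter.
  stripped.scan(/\d+|[A-Z]/) do |token|
    if token.match?(/\A\d+\z/)
      features.push("#{token}-carbon chain")
    elsif WLN_SYMBOLS.key?(token.to_sym)
      features.push(WLN_SYMBOLS[token.to_sym])
    else
      features.push("unknown symbol #{token}")
    end
  end
  features
end

puts wln_features('GR DV1').inspect
puts wln_features('2O2').inspect
```

&lt;p&gt;A query such as "find every structure with a carbonyl group" then reduces to checking the returned list, with no connection table or graph matching involved.&lt;/p&gt;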

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;It's very unlikely that WLN will ever be resurrected for the purpose of replacing existing line notations. On the other hand, WLN offers many potentially useful concepts for those creating new line notations. As they say, history doesn't repeat itself, but it frequently rhymes.&lt;/p&gt;</description>
      <pubDate>Fri, 20 Jul 2007 08:46:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:d729733e-ad5a-4895-b3e4-4ebd5b46740c</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/07/20/everything-old-is-new-again-wiswesser-line-notation-wln</link>
      <category>Tools</category>
      <category>wln</category>
      <category>smiles</category>
      <category>inchi</category>
      <category>linenotation</category>
    </item>
    <item>
      <title>My InChI Runneth Over</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;a href="http://depth-first.com/demo/20070517/screenshot.png"&gt;&lt;img src="http://depth-first.com/demo/20070517/thumbnail.png" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The only solution to this problem I've found is to set the CSS &lt;tt&gt;overflow&lt;/tt&gt; property to "scroll":&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_inchi "&gt;InChI=1/C50H70O14/c1-25(24-51)14-28-17-37(52)50(8)41(54-28)19-33-34(61-50)18-32-29(55-33)10-9-12-46(4)42(58-32)23-49(7)40(62-46)21-39-47(5,64-49)13-11-30-44(60-39)26(2)15-31-36(56-30)22-48(6)38(57-31)20-35-45(63-48)27(3)16-43(53)59-35/h9-10,16,24,26,28-42,44-45,52H,1,11-15,17-23H2,2-8H3/b10-9-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
      <pubDate>Thu, 17 May 2007 08:59:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:24f2ab82-db67-4fa2-ab7a-e17d94dc68fe</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/05/17/my-inchi-runneth-over</link>
      <category>Meta</category>
      <category>inchi</category>
      <category>overflow</category>
      <category>linenotation</category>
      <category>smiles</category>
      <category>html</category>
    </item>
    <item>
      <title>Strings and Things</title>
      <description>&lt;p&gt;&lt;a href="http://daylight.com/meetings/mug01/Bradshaw/History/800x600/Strings_and_Things/sld029.htm"&gt;&lt;img src="http://depth-first.com/demo/20070425/the_future.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;I ran across John Bradshaw's excellent presentation &lt;a href="http://daylight.com/meetings/mug01/Bradshaw/History/800x600/Strings_and_Things/sld001.htm"&gt;Strings and Things&lt;/a&gt;. Part historical overview, part explanation of the SMILES/SMARTS &lt;a href="http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web"&gt;line notation&lt;/a&gt; systems, Bradshaw's slides are chock full of interesting tidbits.&lt;/p&gt;

&lt;p&gt;My favorite: &lt;a href="http://daylight.com/meetings/mug01/Bradshaw/History/800x600/Strings_and_Things/sld029.htm"&gt;slide 29&lt;/a&gt; - "Line notations are dead." It's a wonderful illustration of why predicting the future of technology is so tricky. The light pen became the mouse, the computer display became color, and Digital fell off a cliff. SMILES and SMARTS are the only things to have survived.&lt;/p&gt;</description>
      <pubDate>Wed, 25 Apr 2007 09:28:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:00ee8bee-8341-48d4-bc91-afb685b90acb</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/04/25/strings-and-things</link>
      <category>Meta</category>
      <category>smiles</category>
      <category>smarts</category>
      <category>linenotation</category>
    </item>
    <item>
      <title>Rethinking the Command Line for Chemistry</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070327/yubnub.png" align="center"&gt;&lt;/img&gt;&lt;/center&gt;
&lt;br /&gt;&lt;br /&gt;
A &lt;a href="http://depth-first.com/articles/2007/03/15/do-you-use-the-command-line"&gt;recent article&lt;/a&gt; discussed the renaissance of the command line. Particularly on the Web, command line interfaces have become so advanced that most of us don't even realize we're using them. Consider the Google search box, which is nothing more than one of the most powerful command line interfaces ever developed.&lt;/p&gt;

&lt;p&gt;A service called &lt;a href="http://yubnub.org/"&gt;YubNub&lt;/a&gt; takes this idea one step further. YubNub is a meta command line interface for the Web. The following YubNub command will do a &lt;a href="http://flickr.com"&gt;Flickr&lt;/a&gt; search for benzene.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070327/ducatisearch.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;If this were all YubNub did, it would be merely interesting. What makes YubNub remarkable is that you can create your own commands that other people can use. I recently added the "ginchi" command to query Google for an InChI. Now you can try it out:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/demo/20070327/benzenesearch1.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;By itself this isn't particularly useful because you can just go to Google and query the InChI directly. However, it's not too hard to imagine several commands like &lt;tt&gt;ginchi&lt;/tt&gt; that could be added. Some would use Google, others would use other services.  How about something that searches Mitch Garcia's &lt;a href="http://www.sciencebase.com/science-blog/chemical-pipe-works.html"&gt;chemistry journal Yahoo pipe&lt;/a&gt;? It would be very convenient to have all of those commands accessible from the same Web page.&lt;/p&gt;

&lt;p&gt;Command line interfaces can be phenomenally useful for both beginning and advanced users. The hardest part to get right is not what the user sees as they type, but what happens after they hit the enter key.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://depth-first.com/articles/tag/linenotation"&gt;Line notations&lt;/a&gt; are the perfect match for command line interfaces. The widespread use of SMILES and the precision of InChI offer many possibilities for innovative chemistry Web services.&lt;/p&gt;</description>
      <pubDate>Tue, 27 Mar 2007 12:30:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:8e30edda-82a2-4800-95cf-0c34b669a056</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/27/rethinking-the-command-line-for-chemistry</link>
      <category>Tools</category>
      <category>commandline</category>
      <category>linenotation</category>
      <category>yubnub</category>
      <category>web20</category>
      <category>ginchi</category>
    </item>
    <item>
      <title>Eleven Qualities of The Perfect Line Notation for the Web</title>
      <description>&lt;p&gt;&lt;a href="http://flickr.com/photos/wenwennie/396170719/"&gt;&lt;img src="http://depth-first.com/demo/20070314/line.jpg" align="right" border="0"&gt;&lt;/img&gt;&lt;/a&gt;If you had to design the perfect line notation for the Web, what would it look like? This is hardly an academic exercise given the central role played by line notations in information systems. For a variety of reasons, existing line notations may not be the right match for the Web. This article explores this question and outlines the main qualities needed by a Web-friendly line notation.&lt;/p&gt;

&lt;h4&gt;A Few Lines About Line Notations&lt;/h4&gt;

&lt;p&gt;A line notation is any system that converts a molecular structure into a single line of text. Chemists have been using line notations for over 140 years - long before the advent of computers. Because of their versatility, line notations are frequently used in situations they were not designed for. When this happens, limitations become apparent, resulting in renewed efforts to build a better system.&lt;/p&gt;

&lt;p&gt;As &lt;a href="http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;noted previously&lt;/a&gt;, the invention of new line notations is a field whose popularity ebbs and flows over time. Currently, the three most important line notations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IUPAC Nomenclature&lt;/li&gt;
&lt;li&gt;Simplified Molecular Input Line Entry System (SMILES)&lt;/li&gt;
&lt;li&gt;IUPAC International Chemical Identifier (InChI)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these systems has its own unique characteristics. &lt;a href="http://www.acdlabs.com/iupac/nomenclature/"&gt;IUPAC nomenclature&lt;/a&gt; is the oldest and most widely-used line notation. It appears in numerous contexts, including Web pages, peer-reviewed journals, reports, patents, MSDS sheets, catalogs, and reagent bottles. By comparison, &lt;a href="http://www.daylight.com/smiles/index.html"&gt;SMILES&lt;/a&gt; is a distant second in popularity. Its main role has been to facilitate machine entry of structural information by humans, &lt;a href="http://www.emolecules.com/"&gt;like this&lt;/a&gt;. &lt;a href="http://en.wikipedia.org/wiki/International_Chemical_Identifier"&gt;InChI&lt;/a&gt; is the newest of the bunch. It serves both as a line notation and as a unique identifier requiring no central authority.&lt;/p&gt;

&lt;h4&gt;The Perfect Line Notation for the Web&lt;/h4&gt;

&lt;p&gt;The emergence of the Web as a standard information delivery platform has refocused the attention of many developers on the line notation problem. With this idea in mind, here are some guesses about the qualities of the ideal Web-friendly line notation.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Readily Encodable and Decodable by Humans.&lt;/strong&gt; There's something unnerving about a line notation that can't easily be deciphered by humans. Is this really the right string? Did I copy it completely? This problem surfaces with every line notation, but some fare better than others. IUPAC nomenclature, for example, is one of the first things taught in many beginning organic chemistry classes. It's complicated, but still understandable by non-experts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Readily Encodable and Decodable by Machines.&lt;/strong&gt; It may be relatively simple for humans to read and write IUPAC nomenclature, but not so for machines. Software that reads and writes SMILES, on the other hand, is by comparison easy to write. This explains the abundance of software packages that handle SMILES and the &lt;a href="http://depth-first.com/articles/tag/opsin"&gt;scarcity&lt;/a&gt; of those that handle IUPAC nomenclature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Uses URI-Safe Characters Only.&lt;/strong&gt; A &lt;a href="http://en.wikipedia.org/wiki/Uniform_Resource_Identifier"&gt;URI&lt;/a&gt; uniquely identifies every document on the Internet. Why can't a line notation be used in combination with a URI to uniquely identify every molecule? One reason is that every line notation currently in use contains &lt;a href="http://www.freesoft.org/CIE/RFC/1738/4.htm"&gt;characters unsafe for use in URIs&lt;/a&gt;. Any line notation designed for use on the Web needs to avoid these characters in its syntax. &lt;em&gt;Update: InChI doesn't use unsafe characters, but it does use the reserved characters "=", "?", and "/". These characters may therefore &lt;a href="http://info-uri.info/registry/OAIHandler?verb=GetRecord&amp;amp;metadataPrefix=reg&amp;amp;identifier=info:inchi/"&gt;need to be escaped&lt;/a&gt;, depending on the context.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encodes All Molecules.&lt;/strong&gt; Buried within every line notation is an opinion on what chemistry is really about. To operate on the Web, these opinions need to be as closely aligned as possible with those of chemists themselves. &lt;a href="http://depth-first.com/articles/tag/flexmol"&gt;Several Depth-First articles&lt;/a&gt; have discussed the limitations of existing line notations as molecular languages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compact.&lt;/strong&gt; Nobody wants to look at or manipulate a line of text that's longer than it needs to be. Of course, the more expressive a line notation is, the more verbose it will be. In other words, qualities 4 and 5 will always be in conflict.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Canonicalizable.&lt;/strong&gt; A line notation supports canonicalization when it specifies rules that can be guaranteed to always generate the same line notation for a given molecule. This feature enables many labor-saving assumptions. For example, a canonical representation makes a great identifier in a database, reducing the cost of storing and retrieving structural information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explicit Hydrogen Atom Encoding.&lt;/strong&gt; SMILES makes few requirements regarding hydrogen atom encoding. As a result, each software implementation is left to its own devices. The resulting confusion is the price paid for the convenience (Quality 1) of a compact notation (Quality 5).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hierarchical Structure.&lt;/strong&gt; One of InChI's innovations was the introduction of a hierarchical encoding system. This system, also referred to as InChI "layers", enables a molecule to be viewed at several levels of resolution: as a molecular formula; as a network of atoms; as a network of atoms containing hydrogen atoms; as an atomic network with stereochemistry; and so on. I'm unaware of any reports in which this feature has been exploited in a practical way, although they aren't difficult to imagine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flat Structure.&lt;/strong&gt; By grouping structural features into layers (Quality 8), InChI introduces a lot of complexity that is absent in SMILES and even IUPAC nomenclature. This complexity, in part, makes it difficult for both humans and machines to properly encode InChIs (Qualities 1 and 2). Given this complexity, and the fact that the utility of hierarchical encoding has yet to be conclusively demonstrated, it may be better to avoid it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open Source Software Implementation.&lt;/strong&gt; No encoding standard in today's world stands a chance of gaining acceptance without an open source reference implementation. InChI broke new ground in this area and should serve as a model for any system that follows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unencumbered by Patents.&lt;/strong&gt; The success of molfile and SMILES as de facto standards derives partly from the decision made by their authors to refrain from patenting their languages. As a result, developers are motivated to build their own implementations, rather than invent yet another language.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
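
&lt;p&gt;To make Quality 3 concrete, here is a minimal Ruby sketch of what escaping an InChI for use in a URI involves. It uses the standard library's &lt;tt&gt;URI.encode_www_form_component&lt;/tt&gt;; the host &lt;tt&gt;example.org&lt;/tt&gt; and the &lt;tt&gt;q&lt;/tt&gt; parameter are placeholders rather than a real service, and form-style encoding is only one of several possible escaping schemes.&lt;/p&gt;

```ruby
require 'uri'

# InChI for 4-chloroacetophenone.
inchi = 'InChI=1/C8H7ClO/c1-6(10)7-2-4-8(9)5-3-7/h2-5H,1H3'

# Percent-encode reserved characters such as "=", "/", "(", ")" and ","
# so the whole identifier can travel as a single query value.
escaped = URI.encode_www_form_component(inchi)

# example.org and the "q" parameter are placeholders, not a real service.
url = "http://example.org/search?q=#{escaped}"
puts url

# Decoding restores the original identifier exactly.
puts URI.decode_www_form_component(escaped) == inchi
```

&lt;p&gt;The escaped form is noticeably longer and harder to read than the original, which is the cost a URI-safe notation would avoid.&lt;/p&gt;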

&lt;h4&gt;Conclusions&lt;/h4&gt;

&lt;p&gt;A robust and modern line notation system is a key technology for chemically enabling the Web. Existing line notations, although useful in many contexts, were not designed with this particular role in mind. The time has come to consider whether a new line notation system, designed specifically with the Web and modern chemistry in mind, might offer a better solution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo credit: &lt;a href="http://flickr.com/photos/wenwennie/"&gt;Wenwen&lt;/a&gt;  - &lt;a href="http://flickr.com"&gt;Flickr&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 14 Mar 2007 10:18:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:81f8ab71-4155-406b-adfa-2d1fde0c4f6b</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2007/03/14/eleven-qualities-of-the-perfect-line-notation-for-the-web</link>
      <category>Web</category>
      <category>inchi</category>
      <category>smiles</category>
      <category>iupac</category>
      <category>linenotation</category>
      <category>web</category>
      <category>uri</category>
    </item>
    <item>
      <title>Humanizing Line Notations</title>
      <description>&lt;p&gt;&lt;a href="http://depth-first.com/articles/tag/linenotation"&gt;Line notations&lt;/a&gt; are useful for encoding molecular structure with computers, especially in a network environment. Because line notations are compact and ASCII-based, they can, among other purposes, be used to &lt;a href="http://dx.doi.org/10.1039/b502828k"&gt;query popular Web search engines&lt;/a&gt; for chemical content on the web. Useful as line notations are for computers, they are not as useful to humans, who would much rather have a 2-D structure diagram to look at.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.daylight.com/daycgi/depict"&gt;Depict&lt;/a&gt; is an example of software that generates 2-D structure renderings from a SMILES string. Behind the scenes, the software parses the SMILES string, creates a connection table, determines 2-D coordinates for its atoms, and produces a raster image of the result. Software accomplishing the same task is also available from &lt;a href="http://demo.eyesopen.com/cgi-bin/depict"&gt;OpenEye&lt;/a&gt;. In this tutorial, you'll see one way to create free Depict-like functionality from Open Source tools.&lt;/p&gt;

&lt;h4&gt;The Ingredients&lt;/h4&gt;

&lt;p&gt;This tutorial uses Arton's &lt;a href="http://rjb.rubyforge.org/"&gt;Ruby Java Bridge&lt;/a&gt;, the installation and use of which has been outlined &lt;a href="http://depth-first.com/articles/2006/08/26/scripting-java-libraries-with-ruby-java-bridge"&gt;previously&lt;/a&gt;. In addition, you'll need to download &lt;a href="http://prdownloads.sourceforge.net/structure/structure-cdk-0.1.2.zip?download"&gt;Structure-CDK v0.1.2&lt;/a&gt;, also &lt;a href="http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk"&gt;previously discussed&lt;/a&gt;. Be sure to download v0.1.2, as two upgrades have been released since the package was originally discussed. This tutorial has been tested on Mandriva Linux 2006.&lt;/p&gt;

&lt;p&gt;Create a working directory called &lt;strong&gt;depict&lt;/strong&gt;. From the &lt;strong&gt;lib&lt;/strong&gt; directory of the Structure-CDK distribution, copy &lt;strong&gt;cdk-20060714.jar&lt;/strong&gt; and &lt;strong&gt;structure-cdk-0.1.2.jar&lt;/strong&gt; into your &lt;strong&gt;depict&lt;/strong&gt; working directory.&lt;/p&gt;

&lt;h4&gt;The Code&lt;/h4&gt;

&lt;p&gt;Now create a file called &lt;strong&gt;depict.rb&lt;/strong&gt; and copy the following code into it:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="constant"&gt;ENV&lt;/span&gt;&lt;span class="punct"&gt;['&lt;/span&gt;&lt;span class="string"&gt;CLASSPATH&lt;/span&gt;&lt;span class="punct"&gt;']&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;./cdk-20060714.jar:./structure-cdk-0.1.2.jar&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rjb&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="constant"&gt;SmilesParser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.smiles.SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;StructureDiagramGenerator&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;org.openscience.cdk.layout.StructureDiagramGenerator&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="constant"&gt;ImageKit&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Rjb&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;import&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;net.sf.structure.cdk.util.ImageKit&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="class"&gt;Depictor&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;initialize&lt;/span&gt;
    &lt;span class="attribute"&gt;@smiles_parser&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;SmilesParser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;StructureDiagramGenerator&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_png&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;ImageKit&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;writePNG&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smi_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;),&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_png&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;depict_svg&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_svg&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="constant"&gt;ImageKit&lt;/span&gt;&lt;span class="punct"&gt;::&lt;/span&gt;&lt;span class="ident"&gt;writeSVG&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smi_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;),&lt;/span&gt; &lt;span class="ident"&gt;width&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;height&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;path_to_svg&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;

  &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;smi_to_mol&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;setMolecule&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="attribute"&gt;@smiles_parser&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;parseSmiles&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;smiles&lt;/span&gt;&lt;span class="punct"&gt;))&lt;/span&gt;
    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;generateCoordinates&lt;/span&gt;

    &lt;span class="attribute"&gt;@sdg&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;getMolecule&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After you save this file, you'll need to set your &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; on Unix (or the equivalent on another OS):&lt;/p&gt;

&lt;div class="console"&gt;
&lt;pre&gt;
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$LD_LIBRARY_PATH
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This tells RJB where to find Java's native libraries. Because of RJB's current design, &lt;tt&gt;LD_LIBRARY_PATH&lt;/tt&gt; needs to be set from the command line, rather than from within a Ruby process.&lt;/p&gt;

&lt;p&gt;Using the &lt;tt&gt;Depictor&lt;/tt&gt; class is simple. For example, to generate SVG and PNG images of desloratadine (Clarinex):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;depict&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;new&lt;/span&gt;

&lt;span class="ident"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;depict_svg&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;Clc4ccc3C(=C1CCNCC1)c2ncccc2CCc3c4&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;desloratadine.svg&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;span class="ident"&gt;depictor&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;depict_png&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;Clc4ccc3C(=C1CCNCC1)c2ncccc2CCc3c4&lt;/span&gt;&lt;span class="punct"&gt;',&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="number"&gt;300&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;desloratadine.png&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;The Output&lt;/h4&gt;

&lt;p&gt;Running the above code, either with the Ruby interpreter (ruby) or with Interactive Ruby (irb), will produce an SVG and a PNG image in your &lt;strong&gt;depict&lt;/strong&gt; directory containing the 2-D structure of the popular antihistamine (see image below). Scalable Vector Graphics (SVG) is a popular XML-based vector graphics format that can be viewed with the &lt;a href="http://www.mozilla.com/firefox/"&gt;Firefox browser&lt;/a&gt; and several other software packages.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/desloratadine.png"&gt;&lt;/img&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The code we've used here takes advantage of convenience methods in the Structure-CDK library. The library's lower-level API also lets you customize the output in several ways, including line thickness, line spacing, color scheme, and atom label height.&lt;/p&gt;

&lt;p&gt;Being able to render a human-readable structure diagram from a line notation is useful in many situations. As you can see, this complex process can be accomplished quickly using Ruby, Java and open source chemical informatics libraries. Future articles will make use of this capability in building more complex chemical informatics systems.&lt;/p&gt;</description>
      <pubDate>Sat, 02 Sep 2006 17:08:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:e9b36791-dfcf-4de7-88a3-5e74fa22344f</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/09/02/humanizing-line-notations</link>
      <category>Graphics</category>
      <category>ruby</category>
      <category>java</category>
      <category>inchi</category>
      <category>linenotation</category>
      <category>smiles</category>
      <category>2d</category>
    </item>
    <item>
      <title>A First Look at Modular Chemical Descriptor Language (MCDL)</title>
      <description>&lt;p&gt;&lt;center&gt;&lt;img src="http://depth-first.com/files/mcdl_2001_1.png" /&gt;&lt;/center&gt;&lt;/p&gt;

&lt;blockquote&gt;
    &lt;p&gt;The Modular Chemical Descriptor Language (MCDL) was developed to address the need for linear representation of structural and other chemical information for chemical databases, E-journals and the Internet.&lt;/p&gt;

    &lt;p&gt;&lt;cite&gt;-Andrei A. Gakh and Michael N. Burnett &lt;a href="http://dx.doi.org/10.1021/ci000108y"&gt;J. Chem. Inf. Comput. Sci. 2001, 41, 1494-1499&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Molecular line notations reduce a molecular structure to a string of ASCII characters. This is helpful in a variety of situations: as a method of text-based structure input; as a compact representation that can be stored and transmitted over a network; and in some cases as a method for uniquely identifying a molecular structure. The development of line notations is one of the &lt;a href="http://octetsource.net/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968"&gt;oldest&lt;/a&gt; pursuits in chemical informatics.&lt;/p&gt;

&lt;p&gt;MCDL has a lot in common with &lt;a href="http://www.iupac.org/inchi/"&gt;InChI&lt;/a&gt;. Both languages are modular in the sense that succeeding levels of structural complexity are represented by individual &#8220;modules&#8221; (MCDL) or &#8220;layers&#8221; (InChI): constitution, connectivity, and stereochemistry. Both languages come with free developer toolkits written in C (LINDES for MCDL and the InChI toolkit for InChI). Interactive structure-drawing tools even exist for both languages (the interactive MCDL tool was &lt;a href="http://sourceforge.net/projects/mcdl"&gt;recently released&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;MCDL and InChI also differ in some significant ways. One of the biggest differences is that InChI places hydrogen atoms and their parent atoms in separate layers, whereas MCDL keeps hydrogen atoms together with the atom to which they are attached. Another difference is the approach to canonicalization. InChI uses a relatively &lt;a href="http://octetsource.net/articles/2006/08/12/inchi-canonicalization-algorithm"&gt;complex system&lt;/a&gt; not unlike that of &lt;a href="http://pubs3.acs.org/acs/journals/doilookup?in_doi=10.1021/ci00062a008"&gt;canonical SMILES&lt;/a&gt;. In contrast, MCDL uses a simpler system based on ASCII lexical ordering of atom types. On the non-technical side, InChI carries the endorsement of &lt;a href="http://www.iupac.org/"&gt;IUPAC&lt;/a&gt;, whereas MCDL is the work of independent developers.&lt;/p&gt;
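
&lt;p&gt;To give a feel for what ordering by ASCII value means in practice, here is a minimal sketch. It is not the MCDL algorithm: the atom-type labels are made-up placeholders rather than real MCDL syntax, and the &lt;tt&gt;ascii_order&lt;/tt&gt; helper is hypothetical. It only shows that a plain byte-wise string sort yields the same order however the input is arranged.&lt;/p&gt;

```ruby
# Illustration only: these labels are made-up placeholders, not MCDL syntax.
# Ruby compares strings byte by byte, which for ASCII text is ASCII lexical
# ordering, so a plain sort gives a reproducible order regardless of the
# order in which the labels were supplied.
def ascii_order(labels)
  labels.sort
end

first  = ascii_order(%w[N CH3 O CH2 Cl])
second = ascii_order(%w[Cl O CH2 N CH3])

puts first.join(' ')   # CH2 CH3 Cl N O
puts first == second   # true
```

&lt;p&gt;Note that uppercase letters sort before lowercase ones in ASCII, which is why the placeholder &lt;tt&gt;CH3&lt;/tt&gt; lands ahead of &lt;tt&gt;Cl&lt;/tt&gt;.&lt;/p&gt;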

&lt;p&gt;MCDL and InChI approach the problem of developing an internet-ready line notation from different angles. It will be interesting to see how each evolves.&lt;/p&gt;</description>
      <pubDate>Sat, 19 Aug 2006 20:04:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:cfebcd69-2080-4251-a3dc-8fbede550b5e</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/08/19/a-first-look-at-modular-chemical-descriptor-language-mcdl</link>
      <category>Tools</category>
      <category>lindes</category>
      <category>mcdl</category>
      <category>linenotation</category>
      <category>inchi</category>
    </item>
    <item>
      <title>107 Years of Line-Formula Notations (1861-1968)</title>
      <description>&lt;h1&gt;&lt;code&gt;L E5 B666 FVTJ A1 E1 OQ&lt;/code&gt;&lt;/h1&gt;

&lt;blockquote&gt;
    &lt;p&gt;Thus, within the short period of just seven years after the birth of structural chemistry in 1861, virtually all of the main ideas relating to line-formula descriptions were conceived and published. No basically new practices appeared for some 79 years. Then, within an identically brief period of just seven years (1947-1954), virtually all of the fundamental features of structure-delineating chemical notations appeared in the international chemical literature.&lt;/p&gt;

    &lt;p&gt;&lt;cite&gt;-William J. Wiswesser &lt;a href="http://dx.doi.org/10.1021/c160030a007"&gt;J. Chem. Doc. 1968, 8, 146-150&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Apparently, advances in chemical line notations have a history of occurring in clusters. Perhaps the development of &lt;a href="http://www.iupac.org/inchi/"&gt;InChI&lt;/a&gt; will spawn a renaissance in the development and use of line notations. Is there room (or need) for multiple line notation languages, each filling a particular niche, or can a universal line notation ever be developed? Will currently popular line notations such as &lt;a href="http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html"&gt;SMILES&lt;/a&gt; and InChI seem as cumbersome in 30 years as &lt;a href="http://www.amazon.com/gp/product/B0006CK33G/sr=8-1/qid=1155873116/ref=sr_1_1/102-3132294-7762550?ie=UTF8"&gt;Wiswesser Line Notation&lt;/a&gt; does today?&lt;/p&gt;</description>
      <pubDate>Fri, 18 Aug 2006 02:50:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:f84b221b-5b09-43b2-94d0-9fbcc379abd4</guid>
      <author>Rich Apodaca</author>
      <link>http://depth-first.com/articles/2006/08/18/107-years-of-line-formula-notations-1861-1968</link>
      <category>Tools</category>
      <category>wln</category>
      <category>linenotation</category>
      <category>smiles</category>
      <category>inchi</category>
    </item>
  </channel>
</rss>
