InChI Spam
Do you remember when getting email - any email - was exciting? For me, that time was 1995 and I had just found the Internet. Of course, I remember looking forward to messages from people I knew. But I also remember being blown away by the idea that I could write to anyone with an email account, anywhere in the world for essentially free - and that they could do the same. Back then, it was fun to get email, no matter what the source.
Today, spam is something that I, like millions of others, deal with on a daily basis. And it's not limited to email. Anyone who runs a blog knows about comment spam and how difficult it can be to eradicate. Even trackback is being used as a medium for blog spam. Of course, keyword spam on the Web has been a constant problem for search engines, and eliminating it has in part led to more than a few fortunes earned at companies like Google.
Recently, I introduced a small Web application called InChIMatic. It lets you conveniently do exact-structure molecular queries through popular search engines like Google. Draw your structure, click "Search", and find your matches.
There aren't a lot of InChIs visible to search engines now, as an InChIMatic query for even the most trivial molecule will reveal. Regardless of your views on InChI as a technology for bringing chemistry to the Web, it seems very likely that the number of InChIs visible to search engines will increase significantly over the next few years. And with this increase may come sites dedicated to nothing other than publishing a lot of irrelevant InChIs in the hope of attracting accidental advertising click-throughs.
Right now, searching the Web by InChIs offers a very high signal-to-noise ratio experience - not unlike email in 1995. The shysters haven't yet discovered it and nobody is counting on the technology for mission-critical work. But if and when the idea of indexing chemical content on the Web through InChIs begins to catch on, filtering tools will become essential. If this scenario seems implausible, think back to your first experience with email and how concerned you were about spam then.
Photo Credit: cobalt123
Google for Molecules with InChIMatic

InChIMatic is a simple Web application that uses Google to perform exact structure searches on the Web. After drawing your structure in the editor window, click the "InChI!" button to get a link. This link takes you to a Google query that displays matches for your molecule. You'll need both Java and JavaScript enabled in your browser to use InChIMatic.
The Technical Details
The technology at the heart of InChIMatic is the IUPAC International Chemical Identifier (InChI). An InChI is an alphanumeric string that uniquely identifies a molecular structure. By converting molecular structures to text, InChI makes it easy to use standard Internet tools to do exact structure searches.
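To make the "structure as text" idea concrete, here is a minimal Ruby sketch that splits an InChI into its slash-separated layers. The helper name inchi_layers is hypothetical and is not part of InChIMatic or Rino, and the sketch assumes a simple InChI in which the first layer after the version is the formula and no layer prefix repeats. The example string is the standard InChI for ethanol.

```ruby
# Hypothetical helper (not part of InChIMatic or Rino): split an InChI
# string into its layers. Assumes a simple InChI whose first layer after
# the version is the molecular formula and whose layer prefixes are unique.
def inchi_layers(inchi)
  prefix, body = inchi.split('=', 2)
  raise ArgumentError, 'not an InChI string' unless prefix == 'InChI' && body

  version, formula, *rest = body.split('/')
  layers = { version: version, formula: formula }
  # Each remaining layer starts with a one-letter prefix, e.g. "c" or "h".
  rest.each { |layer| layers[layer[0]] = layer[1..-1] }
  layers
end

ethanol = inchi_layers('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')
puts ethanol[:formula] # molecular formula layer
puts ethanol['c']      # connectivity layer
puts ethanol['h']      # hydrogen layer
```

Because the whole identifier is ordinary text, a search engine can index it like any other token, which is what makes exact structure searching through Google possible.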
The earliest reference in the peer-reviewed literature to using Google for searching InChIs is contained in a 2005 paper. More recently, a service called QueryChem has taken this idea one step further by using the Google API to perform substructure searches based on InChI.
InChIMatic works differently. Unlike a raw Google search, InChIMatic builds a Google query link for you. Unlike QueryChem, InChIMatic doesn't use the Google API and so has none of its restrictions. This does result in a limitation: InChIMatic can currently be used only for exact structure queries.
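As a rough illustration of what building such a link could look like, the Ruby sketch below URL-encodes an InChI into a Google search URL. It is not InChIMatic's actual code: the method name google_inchi_url is hypothetical, and wrapping the InChI in double quotes to request an exact phrase match is an assumption about how the query might be phrased.

```ruby
require 'uri'

# Hypothetical helper (not InChIMatic's actual code): build a Google
# search link for an InChI. Quoting the InChI to ask for an exact phrase
# match is an assumption made for this sketch.
def google_inchi_url(inchi)
  query = URI.encode_www_form(q: %("#{inchi}"))
  "https://www.google.com/search?#{query}"
end

# Methane, written as a version 1 InChI.
puts google_inchi_url('InChI=1/CH4/h1H4')
```

The encoding step matters because InChIs contain characters such as "=" and "/" that have special meaning inside a URL query string.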
The InChIMatic Web application has been discussed in greater technical detail in a previous article. The rapid Web application development framework Ruby on Rails made building InChIMatic a snap. InChIMatic is served by the Ruby application container Mongrel, which is hosted on a Linux server running Apache. Rino provided the Ruby interface to the IUPAC/NIST InChI toolkit. The 2-D structure editor is Java Molecular Editor (JME) by Peter Ertl, which is used with his kind permission.
Aside from JME, all components of InChIMatic, from the operating system it runs on to the InChI system itself, are Open Source software.
Using InChI to Raise the Visibility of Your Content
InChIMatic returns many Google results for common molecules. But less common, known molecules return no hits at all. Three factors are responsible: (1) Google doesn't index all InChIs on the Internet; (2) few content providers currently use InChI; and (3) there is no standard and convenient mechanism to embed InChIs into Web pages for indexing by Google.
For these reasons, I consider InChI to be bleeding edge technology. Some will find it useful, most will not. Unfortunately, this state of affairs will persist until problems (1) and (3) are solved.
Nevertheless, if you're technically adventurous, InChIMatic offers a relatively painless way to begin incorporating InChIs into your content and verifying that they get indexed. There's no software to download, install, or upgrade. Forget about operating system incompatibilities (hopefully!). Just point your Java-enabled browser to inchimatic.com.
Although there's no standard method to encode InChIs in Web pages, some interesting ideas have been put forward. Egon Willighagen has proposed a system based on RDFa. Future iterations of InChIMatic may include support for generating scripts and/or markup for including InChIs into blogs and other online content.
Conclusions
InChI is a complex new technology in need of easy-to-use tools. InChIMatic is one such tool that makes it possible to perform exact structure queries using Google.
One of the exciting things about Web applications is how quickly they can evolve. If in trying out InChIMatic you find something you'd like changed or added, please feel free to write me.
Anatomy of a Cheminformatics Web Application: InChIMatic
InChI is an open molecular identifier system. Although InChIs obviate the need for a central registration authority, they are complex enough that they must be generated by computer. Currently, a few desktop molecular editors can generate InChI identifiers. But wouldn't it be more convenient if this capability existed in a simple Web application that could be used from any computer - anywhere? This article describes a Web application called "InChIMatic", which does just that.
In this article, I'll show how Java Molecular Editor (JME), a lightweight 2-D structure editor, can be extended to produce InChI identifiers through server-side software written in Ruby, rather than by extending the applet with Java code.
Downloads and Prerequisites
InChIMatic requires Ruby on Rails and the Rino InChI toolkit. Both of these libraries can be installed using the RubyGems packaging system.
The complete InChIMatic source package can be downloaded from RubyForge. For convenience, a copy of JME is included with the distribution. The author, Peter Ertl, has kindly given permission for the bundled JME applet to be used with InChIMatic. For other uses, consult the JME homepage.
Running InChIMatic
$ cd inchimatic-0.0.2
$ ruby script/server
Pointing your browser to http://localhost:3000/inchi/input, drawing a structure in the JME window, and pressing the "InChI!" button will produce the corresponding InChI in the area below.

Behind the Scenes
The JME applet itself provides no capabilities for generating InChI identifiers. This functionality is instead provided by the Rails server via the Rino InChI library.
Let's say Susan wants to get the InChI for 3,4-dichlorophenol. After entering the structure into the JME window, she presses the "InChI!" button. This sets in motion the following sequence of events:
1. The JavaScript function writeMolfile() is called. This retrieves a molfile representation of 3,4-dichlorophenol from JME, which is then written to the hidden field molfile.
2. A Rails listener notices that the hidden text field has been updated and so invokes the InChIMatic ajax_inchi action. This is a Rails Ajax action that will update only a portion of the InChIMatic window. For more detail on this Rails Ajax technique, see the previous Anatomy of a Cheminformatics Web Application article.
3. The ajax_inchi action retrieves the contents of the hidden molfile field. This molfile is then used to generate an InChI using Rino. This InChI is then saved to the instance variable inchi.
4. The contents of the InChIMatic area partitioned by the results div are then updated with the InChI obtained in Step 3. The JME applet itself is unaffected by this operation, allowing Susan to further elaborate her molecule, if she'd like.
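The server-side half of this sequence can be approximated in plain Ruby, outside of Rails. The sketch below is an illustration under stated assumptions, not the InChIMatic source: STUB_CONVERTER is a hypothetical stand-in for the Rino call (whose API is not shown in this article) and always returns a fixed placeholder InChI, and the params hash imitates the form data a Rails action would receive.

```ruby
require 'cgi'

# Hypothetical stand-in for the Rino conversion. It performs no real
# chemistry and always returns a fixed placeholder InChI.
STUB_CONVERTER = lambda do |molfile|
  raise ArgumentError, 'empty molfile' if molfile.nil? || molfile.strip.empty?

  'InChI=1/CH4/h1H4'
end

# Imitates an ajax_inchi-style action: read the hidden molfile field,
# convert it, and return only the HTML fragment for the results div.
def ajax_inchi(params, converter = STUB_CONVERTER)
  inchi = converter.call(params['molfile'])
  %(<div id="results">#{CGI.escapeHTML(inchi)}</div>)
rescue ArgumentError => e
  %(<div id="results">Error: #{CGI.escapeHTML(e.message)}</div>)
end

puts ajax_inchi('molfile' => "placeholder molfile text\n")
puts ajax_inchi('molfile' => '')
```

Passing the converter in as an argument keeps the sketch runnable without Rino installed, and it mirrors the point of the design: the applet only supplies a molfile, while everything InChI-related lives on the server.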
So What? Re-Thinking the Role of Applets
JME is, by itself, incapable of generating InChIs. Yet InChIMatic provides this capability as if it existed natively. In other words, a lightweight, fast-loading, and responsive 2-D editor can be extended on the server side, rather than on the client side. The difference is imperceptible to the user, but ripe with potential for the developer.
One of the most common, and completely valid, complaints about Java applets is that they take too long to load. Offloading some of the functionality currently being bundled in applets onto a Web server offers one way to combat the problem. Furthermore, combining Java applets with Ajax and powerful Web application frameworks like Ruby on Rails offers virtually limitless opportunities to re-think the role of Java applets in Web application development.
Conclusions
JME's strength comes, perhaps ironically, from its limited functionality. By using some simple Web programming techniques, JME can be extended with server-side programming. The advantages, compared to extending the JME applet itself with Java on the client side, are significant. Future articles in this series will explore some of the possibilities.

