Introducing MX: Lightweight and Free Cheminformatics Tools for Java
If you want to build cheminformatics software of any kind, you'll need a basic toolkit. Ideally, this toolkit contains all of the low-level functionality used over and over in your projects. Tools for building an in-memory molecular representation, exact- and substructure comparison, and reading/writing molfiles all fall into this category. Also ideally, this toolkit should be free. Not free in the sense of free to use if you work at a university, free to try, or even free to use provided that you make your changes public when you redistribute the toolkit. But free in the sense of "do whatever you want with it and all you have to do is include a copyright notice."
This article introduces MX, a suite of lightweight and free cheminformatics tools for Java designed to fill these needs.
Download
A Google Code page has been set up for MX. Both a source distribution and compiled jarfile representing MX in its current state can be downloaded.
A subsequent article will show how to get started with MX.
Origins: A Chemical Structure Editor for Web Applications
In 2007 my company, Metamolecular, set out to build a lightweight and easy-to-use chemical structure editor for Web applications. Given the increasing importance the Web would have as a chemical communication medium over the next decade, a truly Web-based, platform-independent alternative to ChemDraw and ISIS/Draw seemed to be a good direction to pursue. The resulting product became known as ChemWriter.
Minimizing deployment footprint was a key consideration with ChemWriter; the last thing a chemist using a Web site wants to spend his or her time doing is waiting around for a large applet to download. With many chemical structure editor applets available today, download times on the order of one minute or longer are not uncommon. This is simply unacceptable.
To create ChemWriter, an ultra-lightweight cheminformatics toolkit was needed. How lightweight? We were targeting 100 KB for the complete editor. A good chemical structure editor is a fairly complex piece of UI software involving multiple drawing tools with state-dependent behavior, not to mention some fairly sophisticated vector graphics rendering and molfile input/output. The only way we could reach our 100 KB target for ChemWriter was if the basic cheminformatics toolkit were 20 KB or smaller.
At the time, there was no cheminformatics toolkit, free or otherwise, that could fill the need.
So it was created from scratch.
High Performance in ChemPhoto
Eventually, the same cheminformatics toolkit used in ChemWriter was adapted for ChemPhoto, the chemical structure imaging application.
ChemPhoto was designed to dynamically display 100,000 or more 2D chemical structures in a grid-like GUI using minimal memory. Rather than pre-loading all 100,000 molecule objects into memory, which would not be feasible on most systems, ChemPhoto uses a lazy approach in which an in-memory index of the target SD file is built. Every time a new structure needs to be displayed to the user during a scrolling event, it's created from scratch: the molfile text is loaded from disk, a molecule object is created, the molecule is rendered, and then the entire construct is thrown away.
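The lazy approach described above can be sketched in a few lines of Java. This is only an illustration and not ChemPhoto's actual code: the class name SdFileIndex and its methods are hypothetical, and it assumes records are terminated by a line starting with "$$$$" and that the file is plain single-byte text. Only the byte offsets of the records are kept in memory; the molfile text is read from disk each time a record is requested.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lazy SD file reader: indexes record offsets, loads text on demand. */
final class SdFileIndex {
    private final String path;
    // boundaries.get(i) is the byte offset where record i starts;
    // boundaries.get(i + 1) is where it ends (just past its "$$$$" line).
    private final List<Long> boundaries = new ArrayList<>();

    SdFileIndex(String path) throws IOException {
        this.path = path;
        boundaries.add(0L);
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long position = 0; // bytes consumed so far
            int column = 0;    // characters seen on the current line
            int dollars = 0;   // leading '$' characters on the current line
            int b;
            while ((b = in.read()) != -1) {
                position++;
                if (b == '\n') {
                    if (dollars >= 4) {
                        boundaries.add(position); // record ends after the delimiter line
                    }
                    column = 0;
                    dollars = 0;
                } else {
                    if (b == '$' && dollars == column) {
                        dollars++;
                    }
                    column++;
                }
            }
            if (dollars >= 4) {
                boundaries.add(position); // final delimiter without a trailing newline
            }
        }
    }

    /** Number of "$$$$"-terminated records found in the file. */
    int size() {
        return boundaries.size() - 1;
    }

    /** Reads the text of record i from disk, including its "$$$$" line. */
    String record(int i) throws IOException {
        long start = boundaries.get(i);
        int length = (int) (boundaries.get(i + 1) - start);
        byte[] bytes = new byte[length];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start);
            file.readFully(bytes);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

A viewer built this way would call record(i) for each grid cell that scrolls into view, parse and render the returned text, and then let everything be garbage collected. The memory cost is roughly one offset per structure, regardless of how large the SD file is.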
The performance of ChemPhoto was so good, even though everything was being created on demand and immediately thrown away, that it appeared the cheminformatics toolkit being used had potential in high-performance situations as well.
Substructure Search and Mapping
Recently, Rajarshi Guha reported his port of the VF library to Java for use with the Chemistry Development Kit (CDK). This began a thought process starting with "how can it be improved" and leading to the conclusion that the creation of flexible, Java-centric substructure search utilities would offer the most bang for the buck. A subsequent article described a simple strategy that could be used to get there.
To implement this idea, a cheminformatics toolkit was needed. The one used successfully in ChemWriter and ChemPhoto was an ideal candidate.
The result, a complete substructure search and mapping utility built from scratch, is available in MX under the package com.metamolecular.mx.map.
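To give a feel for what substructure mapping involves, here is a minimal sketch of a naive backtracking matcher. It is not the MX API and not the VF algorithm: the SimpleGraph and NaiveMapper classes are hypothetical, atoms are compared by element symbol only, and bond orders are ignored. Each query atom is tentatively assigned to an unused target atom with the same symbol, and the assignment is kept only if every already-mapped query neighbor is bonded to the corresponding target atom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal molecule graph: element symbols plus adjacency lists. */
final class SimpleGraph {
    final List<String> symbols = new ArrayList<>();
    final List<List<Integer>> neighbors = new ArrayList<>();

    int addAtom(String symbol) {
        symbols.add(symbol);
        neighbors.add(new ArrayList<>());
        return symbols.size() - 1;
    }

    void addBond(int a, int b) {
        neighbors.get(a).add(b);
        neighbors.get(b).add(a);
    }

    boolean bonded(int a, int b) {
        return neighbors.get(a).contains(b);
    }
}

/** Hypothetical naive backtracking substructure mapper (bond orders ignored). */
final class NaiveMapper {
    /** Returns map[queryAtom] = targetAtom for the first match found, or null if none. */
    static int[] findMapping(SimpleGraph query, SimpleGraph target) {
        int[] map = new int[query.symbols.size()];
        Arrays.fill(map, -1);
        boolean[] used = new boolean[target.symbols.size()];
        return extend(query, target, map, used, 0) ? map : null;
    }

    private static boolean extend(SimpleGraph q, SimpleGraph t,
                                  int[] map, boolean[] used, int qAtom) {
        if (qAtom == map.length) {
            return true; // every query atom has been assigned
        }
        for (int tAtom = 0; tAtom < used.length; tAtom++) {
            if (used[tAtom] || !q.symbols.get(qAtom).equals(t.symbols.get(tAtom))) {
                continue;
            }
            if (!consistent(q, t, map, qAtom, tAtom)) {
                continue;
            }
            map[qAtom] = tAtom;
            used[tAtom] = true;
            if (extend(q, t, map, used, qAtom + 1)) {
                return true;
            }
            map[qAtom] = -1; // undo the assignment and try the next candidate
            used[tAtom] = false;
        }
        return false;
    }

    /** Every mapped neighbor of qAtom must be bonded to tAtom in the target. */
    private static boolean consistent(SimpleGraph q, SimpleGraph t,
                                      int[] map, int qAtom, int tAtom) {
        for (int n : q.neighbors.get(qAtom)) {
            if (map[n] != -1 && !t.bonded(map[n], tAtom)) {
                return false;
            }
        }
        return true;
    }
}
```

Matching a C-O query against a C-C-O target with this sketch maps the query carbon to the second target carbon and the query oxygen to the target oxygen. A production matcher would also compare bond orders, aromaticity, and charge, and would order candidates more carefully to cut down on backtracking.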
Free to Use Anytime, Anyplace - No Strings Attached
Licenses can be a problem with nearly all open source cheminformatics toolkits. If your work is mostly done in an academic environment for free, you're likely to experience no problem at all. However, if you run a company that sells licenses to software containing code you'd rather not reveal to the world, the reciprocity provisions in licenses such as those in the GPL, Mozilla (MPL), and IBM (CPL) families lead to major problems.
The problem isn't so much the open source license itself - it's the fact that the original copyright owners either won't give their permission to dual-license their contributions, or in many cases, can't even be tracked down to ask.
This is an unacceptable position for a software distributor wanting to use open source as a cost-effective means to boost their developer productivity.
To address these issues, MX is being distributed under the extremely permissive MIT License. In a nutshell it says you are free to modify and incorporate MX into any software you distribute without any obligation to release a line of your own source code. It also says if MX doesn't do the job, you're on your own. And that's about all it says. Your only obligation is to include the original copyright notice on all copies or substantial portions of the software.
To my knowledge, only one other major cheminformatics toolkit is licensed under an academic-style open source license: RDKit, which is licensed under the New BSD License.
Conclusions
A basic cheminformatics toolkit is a vital component of most chemistry-related software. For maximal cost-effectiveness as a software distributor, a free toolkit licensed under a permissive open source license is ideal. MX is a free and lightweight cheminformatics toolkit written in Java that has been used successfully in two commercial products.
Future articles will describe the many ways MX can be used and extended.
JavaScript for Cheminformatics
Regular readers of this blog know that one of its recurring themes is the convergence of Web technologies and chemical information. Although several articles describe how Ruby and Java can be applied to cheminformatics, one language has never been featured: JavaScript. JavaScript is both the default language of the Web client, and a language of growing importance elsewhere. This article, the first in a series, introduces JavaScript as a tool for creating rich, chemically-oriented Web applications.
Surely You Jest
If your last exposure to JavaScript was in the 1990s, a lot has changed for the better. Most importantly, performance has increased considerably, making possible JavaScript software of considerable complexity. A growing collection of well-crafted libraries such as Prototype and Scriptaculous now makes it possible to focus precious developer effort at a higher level of abstraction. Although cross-browser compatibility continues to be an issue with browser-specific features, the JavaScript language itself is now remarkably stable and consistent across browsers and platforms.
JavaScript will obviously never enjoy the performance of Java or C++, and it would be a mistake to assume otherwise. The key is to focus on what JavaScript can do that no other language or platform can. With this thought in mind, it's interesting to speculate on the role JavaScript could play in developing Web-based software for chemistry.
For example, with essentially complete read-write access to the document object model (DOM) of sites on which they're active, tools based on JavaScript have great potential to enhance static content with content created either on a server or locally.
As another example, consider the combination of JavaScript and an invisible Java applet. Java-JavaScript communication is possible in all browsers through LiveConnect, which can be used to offload computation-intensive operations from JavaScript to an applet. In many ways, this approach to development on the client resembles the approach, discussed here many times, of using Java from Ruby through Ruby Java Bridge (RJB) or JRuby.
Ultimately, JavaScript is the only programming language that allows cross-browser, client-side software to be written independently of a plugin. This may not sound like a big deal, but for many developers and organizations, Flash, Java, and other plugin technologies are less than ideal.
What's Known?
There are currently only scattered examples of the use of JavaScript in chemistry, an indication either that the field is ready for this idea or that it will never work.
Perhaps the two best-known examples of JavaScript applied to chemical informatics are the WebME and PubChem chemical structure editors. Although remarkable accomplishments, both packages rely heavily on server back-ends for processing and analysis of chemical structures. The JavaScript code serves mainly to asynchronously pass low-level mouse events to the server, which then asynchronously passes a raster image back to the client. To an extent, the back and forth degrades the responsiveness of these tools.
The Blue Obelisk maintains a Wiki page discussing some of the uses of Greasemonkey scripts in chemistry. Greasemonkey is a wonderful tool for augmenting existing Websites, and these scripts do some remarkable things, but they don't fall into the category of cross-browser, installation-free, general-purpose tools.
Recently, Robert Hanson discussed on-demand JavaScript as a potential tool for building chemistry-oriented Web applications. Although the proposal contains many important points, it has to my knowledge not been followed up with the release of code.
An early compilation of resources can be found here, although all of the listed sites have since either disappeared or converted to other content.
Poking around the Web, it's possible to find many examples of one-off chemistry tools created in JavaScript. For example, this page contains a JavaScript-enabled periodic table. Useful as tools like this may be, they're not general-purpose solutions designed to create a platform for further work.
What Would Be Most Valuable?
There are many cheminformatics tools that could be built in JavaScript. But to be of the greatest value, a tool should be usable in a variety of contexts. And it should serve as a platform on which more complex software can be built. Even more importantly, the tool should fix something that really is broken.
We might begin by asking: what makes chemical information difficult to work with in the greatest number of cases?
Since the earliest days of chemistry and computers, it has been clear that one of the distinguishing characteristics of chemical information is the central role played by chemical structures and the difficulty in accurately representing and processing them on machines.
So, one answer to the question of what would be the most useful JavaScript tool for cheminformatics could be: a low-level cheminformatics toolkit that understands chemical structures and their associated graph operations.
Another possibility: the handling of NMR, IR, UV, and mass spectra. JSpecView is a good first choice, but it may be possible to build a pure JavaScript tool for interactively viewing and manipulating spectra, if a low-level toolkit for processing JCAMP-DX were available.
Would the performance and usability of such toolkits be high enough to make them a serious choice for use in chemistry-driven Web applications? What would the availability of such a toolkit enable that is currently difficult or impossible?
Conclusions
JavaScript is the default programming language of the Web client, offering a great deal to creators of chemically-oriented applications. One of the biggest barriers to using JavaScript for this purpose is the lack of key developer tools. Future articles will explore this idea.
Image Credit: johnmuk

