A Simple Vector Graphics API for Chemical Structure Output, Part 1: In Search of a Simplifying Approach for ChemPhoto
One of the main design goals of ChemPhoto, the chemical structure imaging application, was to support all Web-relevant image output formats, both vector-based and pixel-based. Like most things in software development, there are far more approaches that add complexity to this problem than there are approaches that remove it. And for some reason, the complexity-reducing methods tend to be the last to be considered. This article, the first in a series, will discuss how ChemPhoto simplifies the problem of supporting multiple chemical structure image output formats from a common representation.
The Problem in a Nutshell
ChemPhoto uses an internal representation of molecular structure based closely on the industry-standard MDL molfile format. Given this representation, ChemPhoto needs to be able to write a variety of vector- and raster-based image formats. Fortunately, the raster formats are limited to PNG and JPG, both of which the standard Java library supports directly through its javax.imageio package.
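The sketch below shows how little code the raster side needs. It draws a single line into a BufferedImage with Java2D and encodes the result with javax.imageio.ImageIO, which ships with writers for "png" and "jpg". The names RasterExport and renderLine are hypothetical. A real structure renderer would draw far more than one line.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Line2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class RasterExport {
    // Renders one bond-like line and encodes it with ImageIO.
    // formatName is an ImageIO format name such as "png" or "jpg".
    static byte[] renderLine(String formatName) throws IOException {
        // JPEG has no alpha channel, so use an opaque RGB image for both formats.
        BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, 100, 100); // white background
            g.setColor(Color.BLACK);
            g.setStroke(new BasicStroke(2f));
            g.draw(new Line2D.Double(20, 50, 80, 50));
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer is registered for the format.
        if (!ImageIO.write(image, formatName, out)) {
            throw new IOException("No ImageIO writer for " + formatName);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("PNG bytes: " + renderLine("png").length);
        System.out.println("JPG bytes: " + renderLine("jpg").length);
    }
}
```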
Vector formats, on the other hand, are more diverse and less accessible. Currently, ChemPhoto supports Scalable Vector Graphics (SVG) and Encapsulated PostScript (EPS). Complete support for Adobe Flash (SWF) output is expected soon, and a proof of concept for Microsoft's Vector Markup Language (VML) has already been demonstrated. Support for Adobe Acrobat (PDF) output through the iText library is anticipated. Last but not least is Java2D itself, for use in Swing components such as renderers and editors.
Clearly, supporting all of these formats requires rendering code that is minimally coupled to the underlying display system. But how can this be done in practice?
The Batik Approach: Extend Graphics2D
Batik is a widely used library for creating and processing SVG in Java. At its core is the SVGGraphics2D class, which extends Graphics2D and overrides many of its methods. The idea seems simple enough: write your drawing code against the Java2D API as you normally would. When you want to generate SVG, just pass in an instance of SVGGraphics2D and then read out the SVG document using its stream method.
The problem with this approach is that every new image output format to be supported needs to extend Graphics2D and essentially re-implement most of its methods. Graphics2D is a large and complex class with many associated helper classes. Just knowing when you've completely covered the API is a major challenge, aside from the even bigger challenge of implementing the overridden methods.
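To put a rough number on "large and complex," the sketch below uses reflection to count the public abstract methods that any concrete subclass of java.awt.Graphics2D must implement. The count includes methods inherited from java.awt.Graphics. The class name Graphics2DSurface is hypothetical. The count also leaves out helper types such as Paint, Stroke, Composite and GlyphVector, which a faithful implementation must interpret as well.

```java
import java.awt.Graphics2D;
import java.lang.reflect.Modifier;
import java.util.Arrays;

class Graphics2DSurface {
    // Counts the public abstract methods a concrete Graphics2D subclass must implement.
    // getMethods() includes inherited public methods, so the abstract methods
    // declared in java.awt.Graphics are counted as well.
    static long abstractMethodCount() {
        return Arrays.stream(Graphics2D.class.getMethods())
                .filter(m -> Modifier.isAbstract(m.getModifiers()))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Abstract methods to implement: " + abstractMethodCount());
    }
}
```

On a standard JDK the count runs to several dozen methods. The exact figure matters less than the fact that each one needs a correct translation into the target format.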
Fine, you might say: given that so many SVG interconverters exist, why not use SVG (created by Batik) as the universal intermediate format and let a third-party library convert it into other vector formats?
This approach is appealing in principle but fails in practice. Many SVG implementations are partial at best, and many lack the documentation that would warn you of a problem with the exact form of SVG you're using. For example, an early iteration of ChemPhoto used Batik to create SVG from some representative chemical structures. Unfortunately, none of the SVG-to-SWF converters examined fully interpreted the way Batik represented path data. The result was bumpy instead of smooth curves in atom labels, along with other unacceptable abnormalities.
Finally, after spending some time reading J. David Eisenberg's excellent book about SVG, it became clear that for drawing 2D chemical structures and even reactions and reaction schemes, only a fraction of the SVG specification was relevant.
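The sketch below gives a sense of how small the relevant subset can be. It converts any Java2D Shape (a bond line, a ring, or a glyph outline) into SVG path data using only the M, L, Q, C and Z commands. Quadratic and cubic segments stay true curves rather than being approximated, which is the property that matters for smooth atom labels. This is an illustration under those assumptions, not the path encoding that Batik or ChemPhoto actually uses. The class name SvgPathData is hypothetical.

```java
import java.awt.Shape;
import java.awt.geom.Ellipse2D;
import java.awt.geom.PathIterator;
import java.util.Locale;

class SvgPathData {
    // Converts a Java2D Shape into an SVG path "d" attribute, keeping
    // quadratic (Q) and cubic (C) segments as curves instead of flattening them.
    static String toPathData(Shape shape) {
        StringBuilder d = new StringBuilder();
        double[] c = new double[6]; // PathIterator fills up to three x,y pairs
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:  append(d, "M", c, 1); break;
                case PathIterator.SEG_LINETO:  append(d, "L", c, 1); break;
                case PathIterator.SEG_QUADTO:  append(d, "Q", c, 2); break;
                case PathIterator.SEG_CUBICTO: append(d, "C", c, 3); break;
                case PathIterator.SEG_CLOSE:   d.append("Z "); break;
                default: throw new IllegalStateException("Unknown segment type");
            }
        }
        return d.toString().trim();
    }

    // Appends a command letter followed by the given number of x,y pairs.
    private static void append(StringBuilder d, String cmd, double[] c, int pairs) {
        d.append(cmd);
        for (int i = 0; i < pairs; i++) {
            // Locale.ROOT keeps '.' as the decimal separator.
            d.append(String.format(Locale.ROOT, " %.2f,%.2f", c[2 * i], c[2 * i + 1]));
        }
        d.append(' ');
    }

    public static void main(String[] args) {
        // A circle, like the one sometimes drawn inside aromatic rings.
        System.out.println(toPathData(new Ellipse2D.Double(0, 0, 20, 20)));
    }
}
```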
In this case, Batik and its approach of extending Graphics2D were simply overkill, making the problem more complex than it needed to be.
A Better Approach: Create a Custom Vector Graphics Interface
Batik has the right idea: isolate drawing code from the specific format being generated. The problem is that the Graphics2D class wasn't really designed for this purpose. For one thing, it's a class rather than an interface. It is an abstract class that extends another abstract class, java.awt.Graphics, so every new output format must subclass the entire hierarchy. And as mentioned before, Graphics2D is a very complex class with many dependencies.
How can we create a simple vector graphics API tailored to chemical structure image creation, which is easily re-implemented, and which works with the existing Java2D API?
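ChemPhoto's actual answer is the subject of Part 2. Purely as a hypothetical sketch of the general idea, the code below defines a small interface with path, stroke and fill operations. All names are invented here: VectorCanvas, Java2DCanvas, SvgCanvas and BondRenderer. There are two implementations: Java2DCanvas delegates to an existing Graphics2D, and SvgCanvas writes SVG path elements. Drawing code such as drawBond is written once against the interface, so adding a format means implementing six methods rather than subclassing Graphics2D.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Path2D;
import java.awt.image.BufferedImage;
import java.util.Locale;

// Hypothetical minimal drawing interface: just enough for bonds, rings, and labels.
interface VectorCanvas {
    void moveTo(double x, double y);
    void lineTo(double x, double y);
    void curveTo(double x1, double y1, double x2, double y2, double x3, double y3);
    void closePath();
    void stroke(Color color, double width); // outlines the current path, then clears it
    void fill(Color color);                 // fills the current path, then clears it
}

// Java2D backend: accumulates a Path2D and hands it to an existing Graphics2D.
class Java2DCanvas implements VectorCanvas {
    private final Graphics2D g;
    private Path2D.Double path = new Path2D.Double();
    Java2DCanvas(Graphics2D g) { this.g = g; }
    public void moveTo(double x, double y) { path.moveTo(x, y); }
    public void lineTo(double x, double y) { path.lineTo(x, y); }
    public void curveTo(double x1, double y1, double x2, double y2, double x3, double y3) {
        path.curveTo(x1, y1, x2, y2, x3, y3);
    }
    public void closePath() { path.closePath(); }
    public void stroke(Color color, double width) {
        g.setColor(color);
        g.setStroke(new BasicStroke((float) width));
        g.draw(path);
        path = new Path2D.Double();
    }
    public void fill(Color color) {
        g.setColor(color);
        g.fill(path);
        path = new Path2D.Double();
    }
}

// SVG backend: emits one <path> element per stroke or fill call.
class SvgCanvas implements VectorCanvas {
    private final StringBuilder body = new StringBuilder();
    private final StringBuilder d = new StringBuilder();
    public void moveTo(double x, double y) { d.append(fmt("M %.2f %.2f ", x, y)); }
    public void lineTo(double x, double y) { d.append(fmt("L %.2f %.2f ", x, y)); }
    public void curveTo(double x1, double y1, double x2, double y2, double x3, double y3) {
        d.append(fmt("C %.2f %.2f %.2f %.2f %.2f %.2f ", x1, y1, x2, y2, x3, y3));
    }
    public void closePath() { d.append("Z "); }
    public void stroke(Color color, double width) {
        body.append(fmt("<path d=\"%s\" fill=\"none\" stroke=\"%s\" stroke-width=\"%.2f\"/>\n",
                d.toString().trim(), hex(color), width));
        d.setLength(0);
    }
    public void fill(Color color) {
        body.append(fmt("<path d=\"%s\" fill=\"%s\"/>\n", d.toString().trim(), hex(color)));
        d.setLength(0);
    }
    String toSvg(int width, int height) {
        return fmt("<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%d\" height=\"%d\">\n",
                width, height) + body + "</svg>\n";
    }
    private static String hex(Color c) {
        return fmt("#%02x%02x%02x", c.getRed(), c.getGreen(), c.getBlue());
    }
    // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
    private static String fmt(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }
}

// Format-independent drawing code, written once for every backend.
class BondRenderer {
    static void drawBond(VectorCanvas canvas, double x1, double y1, double x2, double y2) {
        canvas.moveTo(x1, y1);
        canvas.lineTo(x2, y2);
        canvas.stroke(Color.BLACK, 1.5);
    }

    public static void main(String[] args) {
        SvgCanvas svg = new SvgCanvas();
        drawBond(svg, 10, 10, 50, 10);
        System.out.print(svg.toSvg(60, 20));
        // The same drawing code renders into an image through the Java2D backend.
        BufferedImage image = new BufferedImage(60, 20, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        drawBond(new Java2DCanvas(g), 10, 10, 50, 10);
        g.dispose();
    }
}
```

The narrowness of the interface is the point of this sketch. A hypothetical EPS, SWF or VML backend would implement the same six methods, and none of them would need to know about Paint, Composite or rendering hints.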
Part 2 of this series will describe one approach.
Conclusions
Creating the ChemPhoto rendering engine has been an evolutionary process. It started with the idea of directly using the Graphics2D class in rendering code, but has since moved on to the definition of a vector graphics abstraction layer to simplify the addition of new image formats.
I'd like to thank those beta testers who have already offered valuable feedback on ChemPhoto. If you'd like an unlimited 30-day trial for yourself, please drop me a line.
Image Credit: estherase
Your Favorite Chemical Spreadsheet
On a recent article describing ChemPhoto, Oleg Ursu commented:
This looks a lot like Marvin View from ChemAxon (I am not sure about the details), which I have been using a lot lately.
Although the appearance is similar, the purpose is different. MarvinView is in a category of desktop cheminformatics software called "chemical spreadsheets." A chemical spreadsheet displays multiple chemical structures in a regular arrangement (typically grids or rows) for the purpose of manipulating them in the same ways as text and numerical data can be manipulated in spreadsheet applications like Excel (sorting, performing calculations, charting, editing, etc.).
Although ChemPhoto uses the grid metaphor to display structures, it's not a spreadsheet. It doesn't do analysis. It doesn't even allow editing of the underlying document. Its sole purpose is to display large numbers of structures so that you can adjust their appearance and create images of them. ChemPhoto assumes you've already created an SD file with your chemical spreadsheet (or Web service) of choice.
But Oleg's comment got me thinking again about chemical spreadsheets.
Back when I started out as a medicinal chemist in 1999, there were few chemical spreadsheets. Today, the category has become very crowded. Between my own personal experience and just a little bit of research, I've come up with ten different chemical spreadsheets. They are, in no particular order:
MarvinView: Cross-platform and free for evaluation.
Seurat: Cross-platform database connectivity and SAR analysis.
Bioclipse: The only open-source member of this list. Project-centric perspective.
DIVA: The first chemical spreadsheet that I actually enjoyed using. Doesn't seem to be actively marketed anymore.
Accord: An Excel plug-in that, at least in its earlier versions, was very cumbersome.
Third Dimension Explorer: The most feature-rich chemical spreadsheet I've used, with support for many forms of sophisticated visual analysis geared specifically toward drug-discovery programs. Developed in-house at Johnson & Johnson.
Spotfire: Feature-rich but difficult to learn. Cheminformatics built on top of a general-purpose analysis engine.
ICM: I don't know much about this one.
quattro/DS: The name sounds similar to the Quattro Pro spreadsheet, but as far as I can tell, that's the only connection.
ISIS: Spreadsheet functionality focused on creating and maintaining databases.
I'm sure this list is incomplete. If you know of others, please feel free to post a link. If you have a favorite (or least-favorite) chemical spreadsheet, what is it and why?
Imaging Chemical Structures with ChemPhoto: WYSIWYG Drawing Settings
Depending on the audience and medium, chemical structures can be presented in a variety of styles. Chemical structure imaging applications should make it easy to visually and/or numerically arrive at the best appearance. ChemPhoto makes it easy to get exactly the right look for your structures through what-you-see-is-what-you-get (WYSIWYG) drawing settings.
The screenshots below illustrate the three main categories of drawing settings in ChemPhoto: Atoms, Bonds, and Images. As each setting is changed, the entire view updates in real time to reflect the change. Pressing the "Cancel" button rolls back a set of changes, making it easy to undo unwanted modifications.
Turquoise Theme with Atoms Tab

Console Theme with Bonds Tab

Blueprint Theme with Images Tab

To try an alpha version of ChemPhoto for yourself, drop me a line.
ChemPhoto Screenshots: Appearance of Structures and Browsing Large Collections
Chemical structure imaging software solves the problem of automatically creating large numbers of readable chemical structure images in a variety of formats. ChemPhoto was recently introduced as what appears to be the first chemical structure imaging application. With development of the ChemPhoto user interface now in full swing, it's possible to show some screenshots.
Below are two screenshots in which around 25,000 structures from PubChem have been loaded.


If you're interested in trying an alpha version of ChemPhoto, feel free to drop me a line.

