A Simple Vector Graphics API for Chemical Structure Output Part 1: In Search of a Simplifying Approach for ChemPhoto
One of the main design goals of ChemPhoto, the chemical structure imaging application, was to support all Web-relevant image output formats, both vector-based and pixel-based. As with most things in software development, far more approaches add complexity to this problem than remove it. And for some reason, the complexity-reducing methods tend to be the last to be considered. This article, the first in a series, discusses how ChemPhoto simplifies the problem of supporting multiple chemical structure image output formats from a common representation.
The Problem in a Nutshell
ChemPhoto uses an internal representation of molecular structure based closely on the industry-standard MDL molfile format. Given this representation, ChemPhoto needs to be able to write a variety of vector- and raster-based image formats. Raster formats are fortunately limited to PNG and JPG, which are supported directly by the standard Java library.
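As a minimal sketch of the raster case, the following uses only the standard library's BufferedImage and ImageIO classes to draw a single line and encode it as PNG. The class and method names are hypothetical and are not taken from ChemPhoto; a real renderer would draw atoms and bonds from the molfile-based representation instead of one fixed line.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not ChemPhoto code: draws one line and encodes it as PNG.
class RasterOutputSketch {

    static byte[] toPng(int size) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            // White background with a single black horizontal "bond".
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, size, size);
            g.setColor(Color.BLACK);
            g.drawLine(size / 4, size / 2, 3 * size / 4, size / 2);
        } finally {
            g.dispose();
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Passing "jpg" instead selects the standard JPEG writer.
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPng(150).length + " bytes of PNG data");
    }
}
```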
Vector formats, on the other hand, are more diverse and less accessible. Currently, ChemPhoto supports Scalable Vector Graphics (SVG) and Encapsulated PostScript (EPS). Complete support for Adobe Flash (SWF) output is expected soon. Proof of concept for Microsoft's Vector Markup Language (VML) has already been demonstrated. Support for Adobe Acrobat format, through the iText library, is anticipated. Last but not least is Java2D itself, for use in Swing components such as renderers and editors.
Clearly, supporting all of these formats requires rendering code that is minimally coupled to the underlying display system. But how to do this in practice?
The Batik Approach: Extend Graphics2D
Batik is a widely used library for creating and processing SVG in Java. At its core is the SVGGraphics2D class, which extends Graphics2D, overriding many of its methods in the process. The idea seems simple enough: create your drawing code using the Java2D API as you normally would. When you want to generate SVG, just pass in an instance of SVGGraphics2D and then read out the SVG document using its stream method.
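The pattern can be sketched with the standard library alone. In the example below, the drawing method accepts any Graphics2D and has no idea what sits behind it; the BufferedImage-backed instance merely stands in for the SVGGraphics2D that Batik would supply. The class and method names are hypothetical.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Line2D;
import java.awt.image.BufferedImage;

class Graphics2DSketch {

    // Drawing code written against Graphics2D only; the output format is the caller's concern.
    static void drawBond(Graphics2D g, double x1, double y1, double x2, double y2) {
        g.setColor(Color.BLACK);
        g.setStroke(new BasicStroke(3f));
        g.draw(new Line2D.Double(x1, y1, x2, y2));
    }

    // One possible caller: render into an in-memory raster image.
    static BufferedImage render(int size) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, size, size);
            drawBond(g, 10, size / 2.0, size - 10, size / 2.0);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = render(100);
        System.out.println("Rendered " + image.getWidth() + "x" + image.getHeight() + " image");
    }
}
```

Swapping the output format then means swapping the Graphics2D subclass handed to drawBond, which is exactly where the cost of this approach lies.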
The problem with this approach is that every new image output format to be supported needs to extend Graphics2D and essentially re-implement most of its methods. Graphics2D is a large and complex class with many associated helper classes. Just knowing when you've completely covered the API is a major challenge, aside from the even bigger challenge of implementing the overridden methods.
Fine, you might say, given that so many SVG interconverters exist, why not just use SVG (created by Batik) as the universal interconversion format and get a third-party library to convert SVG into other vector formats?
This approach is appealing in principle, but fails in practice. Many SVG implementations are partial at best - and many lack the documentation that would warn that a problem might exist with the exact form of SVG you're using. For example, in an early iteration of ChemPhoto, Batik was used to create SVG from some representative chemical structures. Unfortunately, the way Batik represented path data was not fully interpreted by any of the SVG->SWF converters that were examined. The result was bumpy instead of smooth curves for atom labels, and other unacceptable abnormalities.
Finally, after spending some time reading J. David Eisenberg's excellent book about SVG, it became clear that for drawing 2D chemical structures and even reactions and reaction schemes, only a fraction of the SVG specification was relevant.
In this case, Batik, with its approach of extending Graphics2D, was simply overkill that made the problem more complex than it needed to be.
A Better Approach: Create a Custom Vector Graphics Interface
Batik has the right idea: isolate drawing code from the specific format being generated. The problem is that the Graphics2D class wasn't really designed for this purpose. For one thing, it's an abstract class rather than an interface, and it inherits from another abstract class, Graphics. And as mentioned before, Graphics2D is a very complex class with many dependencies.
How can we create a simple vector graphics API tailored to chemical structure image creation, which is easily re-implemented, and which works with the existing Java2D API?
Part 2 of this series will describe one approach.
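In the meantime, the general shape of such an interface can be sketched as follows. This is a hypothetical illustration, not the ChemPhoto API: the VectorCanvas interface, its two methods, and both implementations are assumptions made up for this example, and the SVG writer covers only a tiny subset of the specification.

```java
import java.awt.Graphics2D;
import java.awt.geom.Line2D;
import java.util.Locale;

// Hypothetical minimal drawing interface: only what a structure renderer needs.
interface VectorCanvas {
    void drawLine(double x1, double y1, double x2, double y2);
    void drawText(String text, double x, double y);
}

// One implementation writes a small subset of SVG as text.
class SvgCanvas implements VectorCanvas {
    private final StringBuilder body = new StringBuilder();
    private final int width;
    private final int height;

    SvgCanvas(int width, int height) {
        this.width = width;
        this.height = height;
    }

    public void drawLine(double x1, double y1, double x2, double y2) {
        body.append(String.format(Locale.ROOT,
            "<line x1=\"%.1f\" y1=\"%.1f\" x2=\"%.1f\" y2=\"%.1f\" stroke=\"black\"/>",
            x1, y1, x2, y2));
    }

    public void drawText(String text, double x, double y) {
        // Escape the two characters that would break the markup.
        String safe = text.replace("&", "&amp;").replace("<", "&lt;");
        body.append(String.format(Locale.ROOT,
            "<text x=\"%.1f\" y=\"%.1f\">%s</text>", x, y, safe));
    }

    String toSvg() {
        return "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"" + width
            + "\" height=\"" + height + "\">" + body + "</svg>";
    }
}

// Another implementation delegates to Java2D for use in Swing components.
class Java2DCanvas implements VectorCanvas {
    private final Graphics2D g;

    Java2DCanvas(Graphics2D g) {
        this.g = g;
    }

    public void drawLine(double x1, double y1, double x2, double y2) {
        g.draw(new Line2D.Double(x1, y1, x2, y2));
    }

    public void drawText(String text, double x, double y) {
        g.drawString(text, (float) x, (float) y);
    }
}

class CanvasSketch {
    // Rendering code depends only on VectorCanvas, never on a concrete format.
    static void drawHydroxyl(VectorCanvas canvas) {
        canvas.drawLine(10, 20, 40, 20);
        canvas.drawText("OH", 42, 24);
    }

    public static void main(String[] args) {
        SvgCanvas svg = new SvgCanvas(60, 40);
        drawHydroxyl(svg);
        System.out.println(svg.toSvg());
    }
}
```

With an interface this small, adding a new output format means implementing two methods rather than covering the whole Graphics2D API.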
Conclusions
Creating the ChemPhoto rendering engine has been an evolutionary process. It started with the idea of directly using the Graphics2D class in rendering code, but has since moved on to the definition of a vector graphics abstraction layer to simplify the addition of new image formats.
I'd like to thank those beta testers who have already offered valuable feedback on ChemPhoto. If you'd like an unlimited 30-day trial for yourself, please drop me a line.
Image Credit: estherase
Vector Markup Language for Cheminformatics
Vector Markup Language (VML, not to be confused with VRML) is the XML-based vector graphics system that has been present in Internet Explorer for the last eight years. Although not as widely documented or well known as its younger cousin, SVG, VML is a capable system with all of the features needed to create dynamic images on the Web client.
Nothing beats an example for discussing technologies. To clarify the modifications to the stylesheet and namespace required by VML, an example is given on a separate page:
Internet Explorer is the only major browser to support VML, so the image won't be visible on other clients. The image is hyperlinked to the Chempedia entry for Oxytocin.
This example's VML was generated with a development version of ChemWriter.
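For readers who can't view the example page, the stylesheet and namespace modifications can be sketched in a few lines. The Java class below, whose names are hypothetical, builds a minimal HTML page around one VML line. The namespace URN and the `behavior` rule are the declarations commonly documented for VML in Internet Explorer; the coordinates and stroke settings are arbitrary, and this is not the markup ChemWriter generates.

```java
// Hypothetical sketch: wraps VML content in the page-level declarations IE expects.
class VmlPageSketch {

    // A single line segment expressed as a VML element.
    static String line(int x1, int y1, int x2, int y2) {
        return "<v:line from=\"" + x1 + "px," + y1 + "px\" to=\""
            + x2 + "px," + y2 + "px\" strokecolor=\"black\" strokeweight=\"1px\"/>";
    }

    static String page(String vmlBody) {
        return String.join("\n",
            // Namespace modification: bind the v: prefix to VML.
            "<html xmlns:v=\"urn:schemas-microsoft-com:vml\">",
            "<head>",
            // Stylesheet modification: attach the VML behavior to v: elements.
            "<style>v\\:* { behavior: url(#default#VML); }</style>",
            "</head>",
            "<body>",
            vmlBody,
            "</body>",
            "</html>");
    }

    public static void main(String[] args) {
        System.out.println(page(line(10, 20, 40, 20)));
    }
}
```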
Adobe Flash for Cheminformatics: Fast, Scalable, and Attractive 2D Depiction of Chemical Structures with Vector Graphics
The previous article in this series discussed the use of vector graphics markup languages for cheminformatics, in particular for the display of 2D chemical structures. Although vector graphics are well-suited for creating responsive and appealing cheminformatics Web applications, the lack of universal native browser support makes both Scalable Vector Graphics (SVG) and its cousin Vector Markup Language (VML) unattractive at this time. This article highlights Adobe Flash as a 2D chemical structure renderer for Web applications, and features a fully-functional proof of concept based on the ChemWriter rendering engine.
About Adobe Flash
Although Adobe Flash is practically an industry unto itself today, at its core, Flash is a lightweight vector graphics renderer. Introduced in 1996, the Flash Player can be found on millions of Internet-enabled devices today. According to a study by Adobe, the Flash Player was running on nearly 99% of Internet-enabled desktops as of March 2008. The player has also found its way onto a host of handheld devices and phones.
Many technologies have been layered on top of the Flash Player. One of the first was the ActionScript scripting language. More recently, Adobe has introduced Flex, a full-fledged application development framework.
Unlike SVG and other vector graphics systems, Flash is ready today, proven, and about as close to universal as is possible on the Web. If you want to do vector graphics on the Web with the most convenient user and developer experience, Flash is your tool.
But what can Flash do for cheminformatics?
A Demonstration
The table below is composed of twelve cells, each of which displays a chemical structure through the Flash Player.
| zoom | zoom | zoom |
| zoom | zoom | zoom |
| zoom | zoom | zoom |
| zoom | zoom | zoom |
Several points are worth mentioning:
Each of the structures can be zoomed by clicking on its 'zoom' link.
Each cell contains a lightweight embedded SWF file (the extension originally stood for "Shockwave Flash"), and the zoomed view displays exactly the same file. No matter how the SWF file is resized, it will always be scaled proportionally to fit the smaller dimension of its container and centered.
The size of each SWF file ranges from a low of 563 bytes to a high of 8.5 KB, with an average of around 1.5 KB. The larger the molecule, the more space is required. A comparable PNG with a resolution of 150x150 pixels would require about 6-8 KB per structure on average.
Each image was generated from a molfile using a development version of the ChemWriter rendering engine via the open source Transform SWF Java toolkit.
SWF files, unlike applets, are highly optimized for multiple-instance display on all major platforms and browsers. In every case, startup will be nearly instantaneous and scrolling will be smooth. The performance of Flash should be at least as good as, if not better than, that of raster images.
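The proportional scaling and centering mentioned in the list above reduces to a short calculation. The following sketches that arithmetic only; it makes no claim about how the Flash Player implements it, and the class and method names are hypothetical.

```java
// Hypothetical sketch of "scale to fit and center" for a drawing inside a viewport.
class FitSketch {

    // Returns {scale, offsetX, offsetY}.
    static double[] fit(double contentW, double contentH, double viewW, double viewH) {
        // The tighter of the two ratios keeps the whole drawing visible.
        double scale = Math.min(viewW / contentW, viewH / contentH);
        // Split the leftover space evenly on both sides to center the drawing.
        double offsetX = (viewW - contentW * scale) / 2.0;
        double offsetY = (viewH - contentH * scale) / 2.0;
        return new double[] { scale, offsetX, offsetY };
    }

    public static void main(String[] args) {
        // A 100x50 drawing in a 300x300 cell: scale 3.0, offsets 0.0 and 75.0.
        double[] result = fit(100, 50, 300, 300);
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```

Because a vector image stores geometry instead of pixels, the same file serves both the thumbnail and the zoomed view; only the scale factor changes.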
The Right Tool for the Job (is Probably not a Raster Image)
One of the first challenges developers of cheminformatics Web applications are faced with is how to render 2D chemical structures. For an overview of the technologies now in use, see the previous article in this series. Each option has its own set of trade-offs.
The most widely used 2D structure rendering option, raster images, is both inflexible and inefficient. Unlike a vector image, a raster image by definition has only one resolution, which is fixed at creation time. If image dimensions need to change, then all structures must be re-imaged. Given the size of many of today's chemistry databases, such a system-wide re-imaging of structures can involve a non-trivial amount of processor power and bandwidth.
To compensate, many sites store relatively large images, say 300x300 pixels, and then use the HTML <img> tag to shrink them as needed. But this creates problems of its own: both storage and bandwidth requirements are far larger than they need to be, resulting in the need for more powerful server hardware and poorer application scalability. And then there are the application's users, who must wait through a 30 KB or larger download for each 2D image.
A significant number of structures in any compound collection will be so large that even a 300x300 pixel image will be insufficient to render the necessary detail. For example, a recent Depth-First article discussed a vector graphics solution to this problem within the context of Chempedia, the free chemical encyclopedia. Vector graphics simply eliminate this issue.
Many cheminformatics applications would benefit from being able to show 50 or more structures at a time, with each structure having a zoom view for closer inspection. To a non-chemist, this might seem unnecessary. But for today's chemists dealing with large chemical catalogs and high-throughput screens, it's not only possible, but a routine part of the practice of chemistry. The raster image approach makes it extremely difficult to meet this important need on the Web. Vector graphics, possibly delivered through the Flash Player, offer a much simpler and more efficient way to do it.
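A rough comparison using figures quoted in this article (50 structures per page, 30 KB per oversized raster image, about 1.5 KB per average SWF file) shows the size of the gap. The sketch below is back-of-the-envelope arithmetic only: it ignores HTTP overhead, caching, and compression, and the class and method names are hypothetical.

```java
// Hypothetical sketch: total image payload for one page of structures.
class PagePayloadSketch {

    static double pageKilobytes(int structures, double kilobytesPerImage) {
        return structures * kilobytesPerImage;
    }

    public static void main(String[] args) {
        double raster = pageKilobytes(50, 30.0); // 50 x 30 KB = 1500 KB
        double vector = pageKilobytes(50, 1.5);  // 50 x 1.5 KB = 75 KB
        System.out.println("Raster: " + raster + " KB, vector: " + vector + " KB");
        System.out.println("Ratio: " + (raster / vector));
    }
}
```

Under these assumptions, the raster page transfers twenty times as much image data as the vector page.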
2D chemical structures are vectorial in nature; using raster images to depict them is in most cases the more costly and lower quality option.
Summary
Vector graphics are a near-perfect match for the job of depicting 2D chemical structures on the Web. Although there are many vector graphics platforms to choose from, the Flash Player is by far the most universal option. This article has demonstrated a working example of multiple 2D chemical structures rendered as lightweight vector images via the Adobe Flash Player, the first and only such demonstration of which I'm aware.
The key technologies behind this demonstration are the ChemWriter rendering engine and the open source Flash developer toolkits available from Flagstone Software. If you're interested in learning more about how vector graphics and Flash can improve both the user and developer experience in your cheminformatics Web applications, I'd be happy to hear from you.
The Other Vector Graphics Markup Language
Scalable Vector Graphics (SVG) is a technology that enables the creation and publication of high quality images that can be scaled to any resolution. SVG is ideally suited for the Web, and all major browsers now support it - except Internet Explorer (IE). This poses a problem: vector graphics are by far superior to raster images for many applications, but the lack of native IE support makes SVG a non-starter for most developers. This article discusses a little known IE capability that might provide a solution.
Oh Brother, Where Art Thou?
Way back in 1998 a group of companies including Microsoft submitted a proposal for a vector graphics language called Vector Markup Language (VML) to the W3C. This set in motion a series of events that culminated in the development of what we know today as SVG. But while use of SVG quickly expanded, VML remained almost exclusively limited to Microsoft products.
Soon after, IE 5 introduced the ability to decode and display VML - a capability that exists today in IE 7.
SVG and VML are two vector graphics languages, each designed to do essentially the same thing. For basic shape rendering, their similarities outweigh their differences.
About VML
To understand why VML never caught on, you need look no further than the documentation - or the lack thereof. The original VML submission is a decade old and has not been updated.
For the most part, VML documentation is scattered and incomplete. Nevertheless, there are some useful resources. Here, in no particular order, are some of them:
Microsoft Documentation. Authoritative, but lacking in examples.
VML, SVG, and Canvas. Discusses some of the differences between VML and SVG.
Cum mortuis in lingua mortua. Good history of VML.
Examples of the Vector Markup Language. There are far too few of this kind of site.
VectorConverter. A PHP library that uses XSLT to interconvert SVG and VML. Unfortunately, the stylesheet didn't work in my hands under Xalan or Ruby/xslt - and I know almost nothing about PHP.
Julie Nabong's Master's Thesis. Julie wrote and documented an SVG/VML XSLT for interconverting the two languages.
JSDrawing: Interconverting Vector Languages on the Fly
One VML resource deserves special note - JSDrawing. This library seems to be capable of generating Flash, VML, or SVG from a common vector graphics language precursor. I'm not sure how practical this approach would be, but it does provide some food for thought.
Why It Matters
Chemistry is in a good position to take advantage of vector graphics. Chemical structures, being closely based on graph-theoretical constructs, would seem to be a perfect match for vector languages like SVG and VML, especially on the Web. So far it hasn't happened, primarily for the reasons outlined above.
Currently, if you want to display 2D chemical structures in Web pages you're faced with some tradeoffs:
Raster Images. This is by far the most common practice. This option unfortunately makes it very difficult to redesign the layout of a site or support multiple views of the same structure, especially with databases of one million plus compounds becoming commonplace. Even if images are never regenerated, they need to be stored and retrieved, adding to cost and complexity. Images could be dynamically generated, but at the expense of substantial memory and CPU requirements.
Applets. This is the approach currently taken by Chempedia, the free chemical encyclopedia, and gives complete flexibility in page layout and structure appearance. Changing the dimensions of a structure is as simple as changing the size of a div. Unfortunately, some browsers handle multiple applets better than others. Firefox on OS X is very slow at refreshing applets while scrolling, and IE requires a JavaScript trick to remove the 'click to activate' message, which causes some flashing while in progress.
Vector Graphics Through Plugins. There are at least two SVG plugins for IE (one by Adobe and the other from Examotion). Will all of your users be able to find and install them? Unless the answer is 'yes', this option is probably best left as a last resort. Another option is to render SVG on IE through the Flash or Silverlight plugins. But as far as I can tell, neither approach is ready for prime time.
Native Vector Graphics. Available on all major browsers including Internet Explorer 5/6/7, Firefox 1/2, and Opera 8/9. Combines the flexibility, lossless depiction, inlineability and low data storage/retrieval overhead of applets with the speed of images. Interactivity and other special effects can be achieved through DOM manipulation. All of this depends, of course, on the vector graphics format being compatible with the rendering engine.
In some circumstances, serving VML to IE clients and SVG to everyone else would be a viable option - if it were possible to generate VML.
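If VML generation were available, the selection step itself could be quite small. The sketch below is illustrative only: the class and method names are hypothetical, the user-agent test (looking for the `MSIE` token) is a crude heuristic that can misfire on browsers that impersonate IE, and the markup covers a single line segment. Note also that the VML fragment only renders inside a page that declares the `v:` namespace and VML behavior.

```java
// Hypothetical sketch: choose VML for Internet Explorer and SVG for everyone else.
class MarkupSelectorSketch {

    // Crude heuristic: IE of this era identifies itself with an "MSIE" token.
    static boolean isInternetExplorer(String userAgent) {
        return userAgent != null && userAgent.contains("MSIE");
    }

    static String svgLine(int x1, int y1, int x2, int y2) {
        return "<line x1=\"" + x1 + "\" y1=\"" + y1 + "\" x2=\"" + x2
            + "\" y2=\"" + y2 + "\" stroke=\"black\"/>";
    }

    static String vmlLine(int x1, int y1, int x2, int y2) {
        return "<v:line from=\"" + x1 + "px," + y1 + "px\" to=\""
            + x2 + "px," + y2 + "px\" strokecolor=\"black\"/>";
    }

    // The same geometry is serialized differently depending on the client.
    static String lineFor(String userAgent, int x1, int y1, int x2, int y2) {
        return isInternetExplorer(userAgent)
            ? vmlLine(x1, y1, x2, y2)
            : svgLine(x1, y1, x2, y2);
    }

    public static void main(String[] args) {
        String ie = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
        System.out.println(lineFor(ie, 10, 20, 40, 20));
        System.out.println(lineFor("Mozilla/5.0 Firefox/2.0", 10, 20, 40, 20));
    }
}
```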
Conclusions
Vector graphics have a lot to offer chemistry, especially when used with Web applications. The combination of VML and SVG offers a proven technology platform that's ready today, but only if you can generate VML.

