Adobe Flash for Cheminformatics: Fast, Scalable, and Attractive 2D Depiction of Chemical Structures with Vector Graphics

Posted by Rich Apodaca Tue, 10 Jun 2008 11:05:00 GMT

The previous article in this series discussed the use of vector graphics markup languages for cheminformatics, in particular for the display of 2D chemical structures. Although vector graphics are well-suited for creating responsive and appealing cheminformatics Web applications, the lack of universal native browser support makes both Scalable Vector Graphics (SVG) and its cousin Vector Markup Language (VML) unattractive at this time. This article highlights Adobe Flash as a 2D chemical structure renderer for Web applications, and features a fully-functional proof of concept based on the ChemWriter rendering engine.

About Adobe Flash

Although Adobe Flash is practically an industry unto itself today, at its core, Flash is a lightweight vector graphics renderer. Introduced in 1996, the Flash Player can be found on millions of Internet-enabled devices today. According to a study by Adobe, the Flash Player was running on nearly 99% of Internet-enabled desktops as of March 2008. The player has also found its way onto a host of handheld devices and phones.

Many technologies have been layered on top of the Flash Player. One of the first was the ActionScript scripting language. More recently, Adobe has introduced Flex, a full-fledged application development framework.

Unlike SVG and other vector graphics systems, Flash is ready today, proven, and about as close to universal as is possible on the Web. If you want to do vector graphics on the Web with the most convenient user and developer experience, Flash is your tool.

But what can Flash do for cheminformatics?

A Demonstration

The table below is composed of twelve cells, each of which displays a chemical structure through the Flash Player.

[Table: twelve Flash-rendered chemical structures, each with a 'zoom' link]

Several points are worth mentioning:

  1. Each of the structures can be zoomed by clicking on its 'zoom' link.

  2. Each cell contains a lightweight embedded "SWF" file, or "Shockwave Flash" file, and the zoomed view displays exactly the same file. No matter how the SWF file is resized, it will always be proportionally scaled to its smallest dimension and centered.

  3. The size of each SWF file ranges from a low of 563 bytes to a high of 8.5 KB, with an average of around 1.5 KB. The larger the molecule, the more space is required. A comparable PNG with a resolution of 150x150 pixels would require about 6-8 KB per structure on average.

  4. Each image was generated from a molfile using a development version of the ChemWriter rendering engine via the open source Transform SWF Java toolkit.

  5. SWF files, unlike applets, are highly optimized for multiple-instance display on all major platforms and browsers. In every case, startup will be nearly instantaneous and scrolling will be smooth. The performance of Flash should be at least as good as, if not better than, raster images.
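
The proportional scale-and-center behavior described in point 2 can be sketched in a few lines. This is only an illustration of the geometry, not the Flash Player's actual code; the function name fit_and_center and its parameters are hypothetical.

```python
def fit_and_center(content_w, content_h, box_w, box_h):
    """Return (scale, dx, dy) that fits content inside a box without distortion.

    Hypothetical helper: illustrates proportional scaling to the tighter
    dimension followed by centering, as described for resized SWF files.
    """
    if content_w <= 0 or content_h <= 0:
        raise ValueError("content dimensions must be positive")
    # The tighter dimension limits the uniform scale factor.
    scale = min(box_w / content_w, box_h / content_h)
    # Center the scaled content in whatever space is left over.
    dx = (box_w - content_w * scale) / 2
    dy = (box_h - content_h * scale) / 2
    return scale, dx, dy


# A 100x50 drawing placed in a 300x300 cell is limited by its width.
scale, dx, dy = fit_and_center(100, 50, 300, 300)
```

Because the same scale factor is applied to both axes, bond angles and ring shapes are preserved at any cell size.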

The Right Tool for the Job (is Probably not a Raster Image)

One of the first challenges developers of cheminformatics Web applications are faced with is how to render 2D chemical structures. For an overview of the technologies now in use, see the previous article in this series. Each option has its own set of trade-offs.

The most widely-used 2D structure rendering option, raster images, is both inflexible and inefficient. Unlike a vector image, a raster image by definition has only one resolution, which is fixed at creation time. If image dimensions need to change, then all structures must be re-imaged. Given the size of many of today's chemistry databases, such a system-wide re-imaging of structures can involve a non-trivial amount of processor power and bandwidth.

To compensate, many sites store relatively large images, say 300x300 pixels, and then use the HTML <img> tag to shrink them as needed. But this creates problems of its own: both storage and bandwidth requirements are far larger than they need to be, resulting in the need for more powerful server hardware and poorer application scalability. And then there are the application's users, who must wait through a 30 KB or higher download for each 2D image.

A significant number of structures in any compound collection will be so large that even a 300x300 pixel image will be insufficient to render the necessary detail. For example, a recent Depth-First article discussed a vector graphics solution to this problem within the context of Chempedia, the free chemical encyclopedia. Vector graphics simply eliminate this issue.

Many cheminformatics applications would benefit from being able to show 50 or more structures at a time, with each structure having a zoom view for closer inspection. To a non-chemist, this might seem unnecessary. But for today's chemists dealing with large chemical catalogs and high-throughput screens, it's not only possible, but a routine part of the practice of chemistry. The raster image approach makes it extremely difficult to meet this important need on the Web. Vector graphics, possibly delivered through the Flash Player, offer a much simpler and more efficient way to do it.
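
To put rough numbers on this, here is a back-of-the-envelope estimate using figures quoted in this article: 50 structures per page, about 30 KB for an oversized 300x300 raster image, and about 1.5 KB for an average SWF file. The function name page_payload_kb is made up for illustration, and real payloads will vary with the molecules involved.

```python
def page_payload_kb(structures_per_page, kb_per_structure):
    """Total structure-image payload for one results page, in KB."""
    return structures_per_page * kb_per_structure


raster_kb = page_payload_kb(50, 30)    # 300x300 raster images, ~30 KB each
vector_kb = page_payload_kb(50, 1.5)   # average SWF size quoted above

print(raster_kb)              # 1500
print(vector_kb)              # 75.0
print(raster_kb / vector_kb)  # 20.0
```

Under these assumptions a 50-structure page drops from roughly 1.5 MB of images to about 75 KB, and the zoomed view costs nothing extra because it reuses the same file.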

2D chemical structures are vectorial in nature; using raster images to depict them is in most cases the more costly and lower quality option.

Summary

Vector graphics are a near-perfect match for the job of depicting 2D chemical structures on the Web. Although there are many vector graphics platforms to choose from, the Flash Player is by far the most universal option. This article has demonstrated a working example of multiple 2D chemical structures rendered as lightweight vector images via the Adobe Flash Player, the first and only such demonstration of which I'm aware.

The key technologies behind this demonstration are the ChemWriter rendering engine and the open source Flash developer toolkits available from Flagstone Software. If you're interested in learning more about how vector graphics and Flash can improve both the user and developer experience in your cheminformatics Web applications, I'd be happy to hear from you.

The Other Vector Graphics Markup Language

Posted by Rich Apodaca Fri, 06 Jun 2008 14:39:00 GMT

Scalable Vector Graphics (SVG) is a technology that enables the creation and publication of high quality images that can be scaled to any resolution. SVG is ideally suited for the Web, and all major browsers now support it - except Internet Explorer (IE). This poses a problem: vector graphics are by far superior to raster images for many applications, but the lack of native IE support makes SVG a non-starter for most developers. This article discusses a little known IE capability that might provide a solution.

Oh Brother, Where Art Thou?

Way back in 1998 a group of companies including Microsoft submitted a proposal for a vector graphics language called Vector Markup Language (VML) to the W3C. This set in motion a series of events that culminated in the development of what we know today as SVG. But while use of SVG quickly expanded, VML remained almost exclusively limited to Microsoft products.

Soon after, IE 5 introduced the ability to decode and display VML - a capability that exists today in IE 7.

SVG and VML are two vector graphics languages, each designed to do essentially the same thing. For basic shape rendering, their similarities outweigh their differences.

About VML

To understand why VML never caught on, you need look no further than the documentation - or the lack thereof. The original VML submission is a decade old and has not been updated.

For the most part, VML documentation is scattered and incomplete. Nevertheless, there are some useful resources. Here, in no particular order, are some of them:

JSDrawing: Interconverting Vector Languages on the Fly

One VML resource deserves special note - JSDrawing. This library seems to be capable of generating Flash, VML, or SVG from a common vector graphics language precursor. I'm not sure how practical this approach would be, but it does provide some food for thought.

Why It Matters

Chemistry is in a good position to take advantage of vector graphics. Chemical structures, being closely based on graph theoretical constructs, would seem to be a perfect match for vector languages like SVG and VML, especially on the Web. So far it hasn't happened, primarily for the reasons outlined above.

Currently, if you want to display 2D chemical structures in Web pages you're faced with some tradeoffs:

  1. Raster Images. This is by far the most common practice. This option unfortunately makes it very difficult to redesign the layout of a site or support multiple views of the same structure, especially with databases of one million plus compounds becoming commonplace. Even if images are never regenerated, they need to be stored and retrieved, adding to cost and complexity. Images could be dynamically generated, but at the expense of substantial memory and CPU requirements.

  2. Applets. This is the approach currently taken by Chempedia, the free chemical encyclopedia, and gives complete flexibility in page layout and structure appearance. Changing the dimensions of a structure is as simple as changing the size of a div. Unfortunately, some browsers handle multiple applets better than others. Firefox on OS X is very slow at refreshing applets while scrolling, and IE requires a JavaScript trick to remove the 'click to activate' message that causes some flashing when in progress.

  3. Vector Graphics Through Plugins. There are at least two SVG plugins for IE (one by Adobe and the other from Examotion). Will all of your users be able to find and install them? Unless the answer is 'yes', this option is probably best left as a last resort. Another option is to render SVG on IE through the Flash or Silverlight plugins. But as far as I can tell, neither approach is ready for prime time.

  4. Native Vector Graphics. Available on all major browsers including Internet Explorer 5/6/7, Firefox 1/2, and Opera 8/9. Combines the flexibility, lossless depiction, inlineability and low data storage/retrieval overhead of applets with the speed of images. Interactivity and other special effects can be achieved through DOM manipulation. All of this depends, of course, on the vector graphics format being compatible with the rendering engine.

In some circumstances, serving VML to IE clients and SVG to everyone else would be a viable option - if it were possible to generate VML.
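
As a rough sketch of what "VML to IE, SVG to everyone else" could look like on the server, the function below picks a format from the User-Agent request header. The function name choose_vector_format is hypothetical, and user-agent sniffing is a fragile heuristic; the sketch assumes only that IE identifies itself with an "MSIE" token and that Opera, when masquerading as IE, also includes its own "Opera" token.

```python
def choose_vector_format(user_agent):
    """Return 'vml' for Internet Explorer user agents, 'svg' otherwise.

    Hypothetical helper; a heuristic, not a complete browser detector.
    """
    ua = user_agent or ""
    # IE identifies itself with an "MSIE <version>" token. Opera can
    # masquerade as IE but still includes its own "Opera" token.
    if "MSIE" in ua and "Opera" not in ua:
        return "vml"
    return "svg"


ie7 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
firefox2 = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) "
            "Gecko/20080404 Firefox/2.0.0.14")

print(choose_vector_format(ie7))       # vml
print(choose_vector_format(firefox2))  # svg
```

The hard part, as noted above, is not choosing the format but having a renderer that can emit VML in the first place.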

Conclusions

Vector graphics have a lot to offer chemistry, especially when used with Web applications. The combination of VML and SVG offers a proven technology platform that's ready today, but only if you can generate VML.

Molecular Style Sheets: Combining SVG and CSS

Posted by Rich Apodaca Fri, 20 Oct 2006 14:02:00 GMT

Cascading Style Sheets (CSS) are used by Web developers to modify the appearance of an HTML document without requiring changes to the document itself. This approach has become so popular because of the power it offers: developers can achieve a consistent and re-usable look by simply editing and/or copying a single document.

2-D molecular structures are like text documents in that context determines the best presentation style. For example, the way that a 2-D structure appears on a Web page, complete with atom color coding and anti-aliasing, may not be the best way for it to appear on a handheld device. Consider these use cases:

  • An online publisher may want to achieve a consistent "look" for their 2-D molecular graphics, regardless of who generated them. For portability, they want to avoid hard-coding the styling information into the software they use.

  • You want to be able to re-use the 2-D structures you've downloaded from a blog in your presentation. The appearance of these structures needs to match those you already have.

  • An online substructure query may return results to a user that have been highlighted to indicate the atoms and bonds where the query hit. The user may want to set his or her own highlight color, or disable the feature altogether.

Users of ChemDraw and software like it are probably familiar with its concept of styles. This is the right idea, although limited in practice. The main limitation is that these products are aimed at single users on desktop machines who are willing to do a great deal of manual work to achieve consistency. Something far more general and automated is going to be needed, and to my knowledge it does not yet exist.

Could the style sheet concept be applied to 2-D structure diagrams? It turns out that SVG may offer a solution. Just as the appearance of an HTML document can be styled with CSS, so too can the appearance of an SVG document.

A Demonstration: Highlighting Bonds

As a demonstration, we'll see how a style sheet can be used to highlight one of naphthalene's rings, possibly as a result of it being hit by a substructure search.

Consider the above 2-D structure of naphthalene, which was generated by Structure-CDK, the latest version of which can be downloaded here. The SVG that generated this image is shown below. I have edited the commented lines.

<?xml version="1.0" encoding="UTF-8"?>

<!-- Edit: a stylesheet -->
<?xml-stylesheet href="style_normal.css" type="text/css"?>
<!DOCTYPE svg PUBLIC '-//W3C//DTD SVG 1.0//EN' 'http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd'>

<svg fill-opacity="1" xmlns:xlink="http://www.w3.org/1999/xlink" color-rendering="auto" color-interpolation="auto" text-rendering="auto" stroke="black" stroke-linecap="square" stroke-miterlimit="10" shape-rendering="auto" stroke-opacity="1" fill="black" stroke-dasharray="none" font-weight="normal" stroke-width="1" xmlns="http://www.w3.org/2000/svg" font-family="&apos;Dialog&apos;" font-style="normal" stroke-linejoin="miter" font-size="12" stroke-dashoffset="0" image-rendering="auto">

  <defs id="genericDefs" />
  <g>
    <g text-rendering="optimizeLegibility" stroke-width="0.098" transform="scale(47.2947,47.2947) translate(0.049,3.1127)" stroke-linecap="round" stroke-linejoin="round">
      <line y2="-0.7" fill="none" x1="4.8497" x2="4.8497" y1="-2.1" />
      <path fill="none" d="M4.8497 -2.1 L3.6373 -2.8 M4.5581 -1.945 L3.6488 -2.47" />
      <path fill="none" d="M0 -2.1 L0 -0.7 M0.28 -1.925 L0.28 -0.875" class="hit" /> <!-- Edit: hit -->
      <line y2="-2.8" fill="none" x1="0" x2="1.2124" y1="-2.1" class="hit" /> <!-- Edit: hit -->
      <path fill="none" d="M4.8497 -0.7 L3.6373 0 M4.5581 -0.855 L3.6488 -0.33" />
      <line y2="0" fill="none" x1="0" x2="1.2124" y1="-0.7" class="hit" /> <!-- Edit: hit -->
      <line y2="-2.1" fill="none" x1="3.6373" x2="2.4249" y1="-2.8" />
      <path fill="none" d="M1.2124 -2.8 L2.4249 -2.1 M1.224 -2.47 L2.1333 -1.945" class="hit" /> <!-- Edit: hit -->
      <line y2="-0.7" fill="none" x1="3.6373" x2="2.4249" y1="0" />
      <path fill="none" d="M1.2124 0 L2.4249 -0.7 M1.224 -0.33 L2.1333 -0.855" class="hit" /> <!-- Edit: hit -->
      <line y2="-0.7" fill="none" x1="2.4249" x2="2.4249" y1="-2.1" class="hit" /> <!-- Edit: hit -->
    </g>
  </g>
</svg>

In the image of naphthalene rendered above, the stylesheet I used was blank. However, by applying the following one-line style sheet, I can significantly change the appearance of this image:

*.hit { stroke: red }

This line is a CSS rule that uses a "class selector." A CSS-aware SVG renderer (such as Firefox), after loading this style sheet, will apply the red stroke styling to all elements whose class attribute is hit. The output, shown below, highlights one of the rings of naphthalene in red.

Interestingly, the SVG document itself says nothing about what color the hit class should be - that's left for the style sheet to determine. So by changing one line in the style sheet, I can change the appearance of the hit highlighting to purple, green, or aquamarine. And this applies not only to colors, but to line thickness, line shape, anti-aliasing, and a variety of other properties.
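
In the demonstration the class="hit" attributes were added by hand. A substructure search service could add them programmatically; the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function mark_hits is hypothetical, and it assumes that each line or path element corresponds to one bond and that the caller already knows which positions were hit. Neither assumption is guaranteed by Structure-CDK.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without ns0: prefixes


def mark_hits(svg_text, hit_indices, class_name="hit"):
    """Add class_name to the bond elements at the given positions.

    Hypothetical helper: treats every <line> and <path>, in document
    order, as one bond.
    """
    root = ET.fromstring(svg_text)
    bond_tags = (f"{{{SVG_NS}}}line", f"{{{SVG_NS}}}path")
    bonds = [el for el in root.iter() if el.tag in bond_tags]
    for i in hit_indices:
        bonds[i].set("class", class_name)
    return ET.tostring(root, encoding="unicode")


# Three bonds borrowed from the naphthalene listing above.
SAMPLE = (
    f'<svg xmlns="{SVG_NS}"><g>'
    '<line x1="0" y1="-2.1" x2="1.2124" y2="-2.8" fill="none" />'
    '<path d="M0 -2.1 L0 -0.7 M0.28 -1.925 L0.28 -0.875" fill="none" />'
    '<line x1="3.6373" y1="-2.8" x2="2.4249" y2="-2.1" fill="none" />'
    '</g></svg>'
)

highlighted = mark_hits(SAMPLE, [0, 1])
```

The geometry is untouched; only the class attribute changes, so the same document can be restyled later without regenerating coordinates.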

Another Demonstration: Global Line Thickness

It's also possible to globally affect the appearance of naphthalene with a simple style sheet. For example, the following style sheet changes the line thickness and all line colors of naphthalene:

* { stroke-width: 0.25; stroke: green; }

When the naphthalene SVG is rendered with this style sheet, the image shown below is produced. The "*" selector is a wildcard, applying to all elements in the SVG document. This version of the style sheet ignores our "hit" styling from the example above. The hit styling could easily be added back in by adding the appropriate class selector line to the CSS.
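
Since a molecular style sheet is just a handful of rules, it can be generated from data. The sketch below is one hypothetical way to do that; render_style_sheet is not part of any library mentioned here. It combines the global rule with the hit rule from the earlier example. A class selector is more specific than the bare "*" selector, so the hit stroke color wins regardless of rule order.

```python
def render_style_sheet(rules):
    """Render {selector: {property: value}} as CSS text, one rule per line."""
    lines = []
    for selector, declarations in rules.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        lines.append(f"{selector} {{ {body} }}")
    return "\n".join(lines)


css = render_style_sheet({
    "*": {"stroke-width": "0.25", "stroke": "green"},
    "*.hit": {"stroke": "red"},
})
print(css)
# * { stroke-width: 0.25; stroke: green; }
# *.hit { stroke: red; }
```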

You may notice in the image above that the leftmost vertical line appears clipped on its left side. This is because the SVG output from Structure-CDK exactly aligns the left line border with the leftmost side of the image area. By thickening the lines with a style sheet, we've overrun the image area. This could be fixed by moving the SVG viewport to the left. But that's a subject for another time.
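
The size of the overrun can be worked out from the listing above. The group transform scales by 47.2947 after translating by 0.049 in x, which is exactly half of the original stroke width of 0.098, so the leftmost bond at x = 0 just touches the image edge. The sketch below repeats that arithmetic for other stroke widths; the function name left_overrun_px is made up for this illustration.

```python
SCALE = 47.2947      # from transform="scale(47.2947,47.2947) ..."
TRANSLATE_X = 0.049  # from "... translate(0.049,3.1127)"


def left_overrun_px(stroke_width, line_x=0.0):
    """Pixels by which a vertical line at line_x extends past the left edge."""
    # A stroke extends half its width to either side of the line.
    left_edge_user = line_x - stroke_width / 2 + TRANSLATE_X
    return max(0.0, -left_edge_user * SCALE)


original = left_overrun_px(0.098)  # stroke width in the generated SVG
thickened = left_overrun_px(0.25)  # stroke width set by the style sheet
```

With the original stroke width there is no overrun, while the 0.25 width pushes the line about 3.6 pixels past the left edge, which matches the clipping seen in the image.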

A Limitation

It will probably never be possible to modify the distance between parallel lines, as for example in multiple bonds, with the CSS approach. These distances are set in the coordinate attributes of the line and path elements, and are independent of styling.

Conclusions

Of course, we're just scratching the surface of what's possible with CSS and 2-D molecular structures. For example, the same principles outlined here could be used for atom coloring schemes and a variety of line and drawing properties. Various forms of interactive animation are even possible. Despite their limitations, SVG and CSS make a powerful combination for developing next-generation molecular rendering platforms.

The Chemically-Aware Web: Are We There Yet?

Posted by Rich Apodaca Wed, 13 Sep 2006 13:25:00 GMT

Recently, I wrote a tutorial on embedding 2-D molecular renderings into webpages as Scalable Vector Graphics (SVG). This tutorial also contained a small experiment on the current chemical informatics capabilities of the Web.

Here is a scenario from the near future: Joe is writing a review on Cephalosporin C that he wants to publish the modern way - directly to the Web. An entirely new concept in scientific publishing has started to take hold. Rather than submitting scientific articles to publishers, who then make hamburger out of them and strip authors of their rights to reproduce their own work, a new system in which journals simply aggregate content already on the Web is gaining momentum. Some journals specialize in only including the very best scientific Web content available, and so enjoy a prestige factor. It's still a peer review system, but with inversion of control. The trick for scientists is getting their work indexed, and so noticed, in the first place.

Joe just downloaded a new 2-D structure editor, FooChemPaint, that he heard can make the structure drawings in his review structure-searchable. Every chemist he knows is talking about a new free search engine called Haystac (Haystac Ain't Chmoogle) that lets them substructure-search the web. For some reason, you need to create your structures using FooChemPaint if you want your own documents to be included in the search results.

After Joe finishes drawing Cephalosporin C with FooChemPaint, he chooses the File->Save As... menu item. Instead of saving as a JPG or PNG like he's done with other software, he saves the image as SVG. He then embeds the SVG into his review using a procedure similar to the one I outlined previously.

From Joe's perspective, he hasn't done anything very new. But unknown to Joe, FooChemPaint has automatically inserted the InChI identifier of Cephalosporin C as metadata into his SVG document. This enables ordinary search engines such as Google to associate the InChI with his SVG. The best part is that the entire process is essentially invisible to Joe.

Haystac is a web application that presents users with an online structure editor for preparing molecular queries. When a structure query is submitted, Haystac searches its molecular database for matches. This database, in turn, was built by a web spider specifically designed to look for InChI identifiers, maybe with the help of Google's Web API. One of Haystac's records for the structure of Cephalosporin C points to Joe's review article.

Science fiction? Maybe. This is where the experiment comes in. Before I submitted the article on SVG, I manually annotated the SVG of Alprazolam with the corresponding InChI. The XML source can be viewed in Firefox by right-clicking on the SVG image and choosing This Frame->View Frame Source, or alternatively here. Below is a fragment of the XML:

<svg ...>
  <rdf:RDF
    xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc = "http://purl.org/dc/elements/1.1/" >
    <rdf:Description about="http://depth-first.com"
      dc:title="InChI=1/C17H13ClN4/c1-11-20-21-16-10-19-17(12-5-3-2-4-6-12)14-9-13(18)7-8-15(14)22(11)16/h2-9H,10H2,1H3"
      dc:format="image/svg+xml"
      dc:language="en" >
      <dc:creator>
        <rdf:Bag>
          <rdf:li>Richard L. Apodaca</rdf:li>
        </rdf:Bag>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>

  <!-- etc. -->
</svg>
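
For the spider side of the scenario, pulling the InChI back out of such a document is straightforward. The following sketch uses Python's standard xml.etree.ElementTree module; extract_inchis is a hypothetical function, and it assumes the InChI lives in a dc:title attribute on an rdf:Description element, exactly as in the fragment above. A real spider would likely need to handle other placements as well.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_inchis(svg_text):
    """Return every dc:title value on an rdf:Description that looks like an InChI."""
    root = ET.fromstring(svg_text)
    found = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Namespaced attributes are keyed as "{namespace-uri}localname".
        title = desc.get(f"{{{DC_NS}}}title", "")
        if title.startswith("InChI="):
            found.append(title)
    return found


# A trimmed, self-contained version of the fragment above.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description about="http://depth-first.com"
      dc:title="InChI=1/C17H13ClN4/c1-11-20-21-16-10-19-17(12-5-3-2-4-6-12)14-9-13(18)7-8-15(14)22(11)16/h2-9H,10H2,1H3"
      dc:format="image/svg+xml" />
  </rdf:RDF>
</svg>"""

inchis = extract_inchis(SAMPLE)
```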

Today I searched for the title of my article in Google and found it. I then searched for the InChI in the SVG metadata and did not find it. Currently, a search of this InChI shows only one hit from the DrugBank database.

The experiment failed in its stated goal of getting the InChI of Alprazolam indexed by Google via the metadata in its SVG rendering. Was it the formatting of my RDF tags? Is metadata just indexed more slowly than other content? Does Google just ignore metadata to avoid keyword stuffing by Search Engine Optimization tricksters? Are embedded SVG documents ignored by Google altogether? Whatever the reason, the technical barriers to a system like this working today are very low and dropping rapidly.

Generating and Serving 2-D Molecular SVGs

Posted by Rich Apodaca Sat, 09 Sep 2006 14:45:00 GMT

A previous article showed some examples of 2-D molecular rendering using Scalable Vector Graphics (SVG) embedded in a web page. This article will outline some simple steps for generating these images and publishing them on the Web.

Prerequisites

This tutorial uses Structure-CDK, a CDK add-on library written in Java. You'll need to install Sun's JDK 1.4.2 or later (or an open source alternative). Although not required, Ant makes it easy to use Structure-CDK. You'll want to make sure that your browser is SVG-enabled.

Creating a 2-D Molecular SVG File

Methylenedioxymethamphetamine (MDMA)

An SVG image like the one shown above can be created with this sequence of steps:

  1. Download and unzip the current release of Structure-CDK.
  2. Move into the unzipped Structure-CDK directory and run the Structure Visual Testing Framework:
    $ cd structure-cdk-0.1.2
    $ ant vis
    
  3. From the File menu, choose Open... and use the file dialog to open a molfile. The molfiles directory contains some samples.
  4. Resize the image to taste and choose Save as SVG... from the File menu. This writes the SVG image to a directory and filename of your choice.

Viewing the SVG File

You now have several options for viewing the SVG file. One of the simplest is to open it with the Firefox browser. Another option is to open it with the excellent, free SVG editor Inkscape. From Inkscape, you can edit your image, apply any number of special effects from the mundane to the remarkable, and save the result to disk.

Deploying the SVG File on the Web

After uploading your SVG file to a blog or other site, you may have some additional configuration to do. Because the SVG MIME type is not configured by default on all servers, you may need to do so yourself.

After uploading my first set of SVG files to my server, I tried to view them in Firefox. Instead of seeing the expected image in the browser window, I got a dialog asking if I wanted to open it with Inkscape or save it to disk.

With the help of some documentation, I was able to track the problem down to my server, which was using the MIME type "image/svg-xml" instead of "image/svg+xml". The former is the obsolete SVG MIME type, which Firefox rejects. Internet Explorer equipped with Adobe's SVG plugin, on the other hand, accepts the obsolete MIME type, rendering SVG without presenting a dialog. Web-Sniffer, which decodes header information from HTTP responses, may be useful for debugging your server's MIME type configuration.

Having configured your server's SVG MIME type as "image/svg+xml", pointing your browser to your SVG file's URL will let you view it in its full, W3C-compliant glory.
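
If you'd rather script the check than read headers by eye, the comparison itself is tiny. The sketch below classifies a Content-Type header value against the two MIME types discussed above; classify_svg_content_type is a hypothetical name, and fetching the header is left to whatever HTTP client you use.

```python
CORRECT_SVG_MIME = "image/svg+xml"
OBSOLETE_SVG_MIME = "image/svg-xml"


def classify_svg_content_type(header_value):
    """Classify a Content-Type header value as 'ok', 'obsolete', or 'wrong'."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type == CORRECT_SVG_MIME:
        return "ok"
    if media_type == OBSOLETE_SVG_MIME:
        return "obsolete"
    return "wrong"


print(classify_svg_content_type("image/svg+xml; charset=utf-8"))  # ok
print(classify_svg_content_type("image/svg-xml"))                 # obsolete
print(classify_svg_content_type("text/plain"))                    # wrong
```

With Python's standard library, the header value could be obtained with urllib.request.urlopen(url).headers.get("Content-Type").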

Embedding the SVG File in HTML

There are a few options for embedding an SVG image in HTML. The most universally-applicable mechanism is the <embed> tag:

<html>
  <head></head>
  <body>

  <!-- document body -->

  <embed src="url-to-svg-file.svg" type="image/svg+xml" width="400" height="400" />

  </body>
</html>

Embedding SVG into HTML carries some limitations. For example, you can't interact with the SVG DOM the way you can if the SVG is inlined, or placed directly into the HTML document itself. But that's a subject for another time.

Creating and deploying 2-D molecular images as SVG documents is a straightforward process, provided that some details are taken care of. Future articles in this series will show how SVG's advanced features make it a compelling choice as a chemical informatics rendering platform.

Note: if you're viewing this article in a feed aggregator, the SVG images may have been stripped out. If so, please see the original article.
