Smarter Cheminformatics: From SD File to Image Collection with ChemPhoto

Posted by Rich Apodaca Mon, 08 Sep 2008 19:04:00 GMT

The old adage says time is money. Unfortunately, working chemists are often forced to spend a remarkable amount of valuable time and mental effort on menial chemical information processing tasks. These are things that could be done faster and with better quality by the right software, if it were available. Most importantly, these tasks take resources away from much more valuable work that can't be automated.

The Problem in a Nutshell

As a case in point, consider the creation of 2D chemical structure images. If you maintain a compound collection of any kind, sooner or later you may end up asking yourself how you can create a set of images depicting the chemical structures in your collection.

A Specific Example: Chemical Suppliers

For example, you might work for a chemical supplier that maintains a Web-based eCommerce site, one or more PDF catalogs, or printed brochures. Your customers are chemists and they expect to see chemical structures in your product listings. How can you make this happen?

If you look around for software that automates this job, you'll more likely than not come up empty-handed. The software that solves this problem well simply doesn't exist yet.

Doing it the Hard Way

In the absence of software to solve the problem, the only way to get the job done is to buckle down and do it manually. Most chemical structure editors allow you to save output as a raster image. Provided that this output matches your requirements, your system might consist of the following steps:

(1) For every product in your catalog, create a single molfile or its machine-readable equivalent.

(2) Load one file into your editor.

(3) Save the file as a raster image, being careful to make sure all of the drawing settings and image size parameters are identical to the rest of your images.

(4) Repeat Steps (2)-(3) until you have all of your images.

There are many problems with this approach. For example, if your images ever need to be made larger (or smaller), you'll have to create every one of them over again, and a catalog can easily contain thousands. Similarly, if you want to change the appearance of the images, such as the background, atom label coloring, or line thicknesses, you'll be forced into a lot of manual work. Finally, this system requires you to keep track of which structures have been imaged and which haven't, which can itself be nontrivial and error-prone, especially for thousands of products.

With the right software, this problem would disappear.
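One way software could take over the bookkeeping described above is to fingerprint each structure together with the drawing settings, and to re-render only when that fingerprint changes. The Python sketch below is a minimal illustration under that assumption; `render_key`, `needs_render`, `mark_rendered`, and the manifest layout are hypothetical names, not part of any existing tool.

```python
import hashlib
import json


def render_key(molfile: str, settings: dict) -> str:
    """Fingerprint one structure together with the drawing settings."""
    payload = molfile + "\n" + json.dumps(settings, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_render(product_id: str, molfile: str, settings: dict, manifest: dict) -> bool:
    """Return True when no up-to-date image is recorded for this product."""
    return manifest.get(product_id) != render_key(molfile, settings)


def mark_rendered(product_id: str, molfile: str, settings: dict, manifest: dict) -> None:
    """Record that an image now exists for this structure and these settings."""
    manifest[product_id] = render_key(molfile, settings)


# Usage: the manifest is a plain dict here; it could be saved to disk as JSON.
manifest = {}
settings = {"width": 150, "height": 150}
first_check = needs_render("P1", "molfile text", settings, manifest)
mark_rendered("P1", "molfile text", settings, manifest)
second_check = needs_render("P1", "molfile text", settings, manifest)
after_resize = needs_render("P1", "molfile text", {"width": 300, "height": 300}, manifest)
```

With this approach, changing the image size or any drawing setting changes every key, so the whole collection is flagged for re-rendering without anyone having to track it by hand.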

One Solution: Customized Imaging Service

My company, Metamolecular, has recently provided custom imaging services to a few chemical suppliers wanting thousands of good-looking structure images rendered automatically. The service made use of the versatile ChemWriter rendering engine together with some custom code written in Ruby.

Although the imaging service works very well as a one-off solution, it's less than optimal in the longer term. Any changes to the image collection must be processed by Metamolecular, and then sent back to the client. A cheaper and faster solution would be to offer software that implements the functionality of the service.

A Better Solution: Chemical Structure Imaging Software

Wouldn't it be great if easy-to-use software existed that could automatically generate thousands of chemical structure images with the press of a button?

In particular, the software should:

  • Run on any modern platform (Windows, Mac OS X, Linux).

  • Read industry-standard Structure Data Files (SD Files).

  • Be capable of working with tens of thousands of chemical structures at a time even on older machines.

  • Store fully-customizable drawing settings in a format that could be used over and over again for a consistent and professional look.

  • Allow the output to be previewed exactly as it will appear in the generated images ("what you see is what you get").

  • Output to a variety of image formats, including Portable Network Graphics (PNG), JPEG (JPG), Flash (SWF), Scalable Vector Graphics (SVG), and Encapsulated PostScript (EPS).
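As an illustration of how the SD File and large-collection requirements above might fit together, the Python sketch below reads an SD File one record at a time instead of loading it whole. It relies only on the fact that each record in an SD File ends with a line containing `$$$$`. The function name `iter_sd_records` is hypothetical, and this is not a description of how ChemPhoto reads files.

```python
import io
from typing import Iterator, TextIO


def iter_sd_records(handle: TextIO) -> Iterator[str]:
    """Yield one record at a time from an SD File.

    Each record ends with a line containing "$$$$". Reading line by line
    keeps memory use roughly constant no matter how large the file is.
    """
    lines = []
    for line in handle:
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)
    if any(text.strip() for text in lines):
        # Tolerate a final record that lacks its terminator line.
        yield "".join(lines)


# Usage with an in-memory stand-in for a real file opened with open(path).
sample = io.StringIO("first\n  record\n$$$$\nsecond\n$$$$\n")
records = list(iter_sd_records(sample))
```

Because the function is a generator, a renderer could process each record as it arrives and discard it, which is one way to stay within the memory of an older machine.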

Introducing ChemPhoto

ChemPhoto is designed to solve the problem of consistently creating large numbers of high-quality 2D chemical structure images. It is currently in development, and the first versions will be available for review within the next two weeks.

ChemPhoto consists of a lightweight and intuitive user interface layer built on top of the ChemWriter rendering engine. ChemPhoto focuses on doing one thing very well, so it wouldn't be useful for creating or editing SD Files (a task for which many tools already exist). The software is specifically designed to work well with large SD Files, such as the 25,000-structure sets that can be obtained from PubChem. Written in Java, ChemPhoto runs on Windows, Mac OS X, and Linux. Future articles will discuss ChemPhoto's design and implementation.

If you're interested in evaluating ChemPhoto, feel free to drop me a line.

Adobe Flash for Cheminformatics: Chemul, a 3D Structure Viewer

Posted by Rich Apodaca Thu, 14 Aug 2008 16:27:00 GMT


Previous articles have discussed the use of Adobe Flash for cheminformatics. Tetsuya Hoshi has created a 3D structure viewer (embedded above) called Chemul that can be used with the Flash Player and which is written in ActionScript 3.0. Although the documentation is written in Japanese, it appears that Chemul supports multiple display options, as evidenced here.

Chemistry, The Web, and Netflix

Posted by Rich Apodaca Wed, 11 Jun 2008 21:03:00 GMT

If you've ever rented movies from Netflix, you've probably noticed the information box that pops up when you hover over a movie image. If you just want a quick peek at what a movie is all about, this simple feature can save a great deal of time and effort in mousing around, clicking, and general navigation annoyance. It turns out that chemical compounds have a lot in common with movies in that they both can be referred to through one or more identifiers and they both have a lot of interesting metadata linked to them. This article shows that what works for Netflix can also work for chemistry.

The Problem

Interpreting IUPAC nomenclature and references to compound numbers is a major chore when working with chemistry experimental sections. When paper documents are used, this typically involves flipping pages back and forth many times between the narrative and the experimental section. With Web documents, this is usually either impossible or very inconvenient, and so the PDF is printed to paper.

A Demonstration

The following text is an edited and re-formatted passage taken from the experimental section of a paper published in Beilstein Journal of Organic Chemistry. If you hover over any hyperlink for half a second or more, a balloon will pop up showing you the chemical structure of the substance being referred to. Mousing away from the link hides the balloon.

1-[(1R)-1-(2-{[tert-Butyl(dimethyl)silyl]oxy}ethylhexyl]-2-piperidinone (34)

5-Bromopentanoyl chloride (1.84 g, 9.25 mmol) was added to a stirred solution of primary amine 32 (2.00 g, 7.71 mmol) in dry 1,2-dichloroethane (30 cm3), followed by anhydrous NaHCO3 (0.78 g, 9.25 mmol). The reaction mixture was left to stir at room temperature for 16 h. The resulting mixture was filtered through a pad of celite, which was then washed with CH2Cl2. The combined filtrate and washings were then evaporated in vacuo to yield a crude orange oil (4.06 g), which was purified by column chromatography on silica gel with hexane-EtOAc (7:3) as eluent to give the 5-bromo-N-[(1R)-1-(2-{[tert-butyl(dimethyl)silyl]oxy}ethyl)hexyl pentanamide 33 as an orange oil (2.92 g, 89%).

A portion of the bromoamide 33 (0.20 g, 0.47 mmol) was dissolved in dry THF (3 cm3) containing a suspension of potassium tert-butoxide (587 mg, 0.52 mmol), and the mixture was stirred at room temperature for 25 min before being diluted with EtOAc (10 cm3). The mixture was then washed with saturated aqueous sodium chloride solution (5 x 2 cm3). The combined organic extracts were dried (MgSO4), filtered and evaporated in vacuo to yield a crude yellow oil (0.16 g), which was purified by column chromatography on silica gel with hexane-EtOAc (85:15) as eluent to give 1-[(1R)-1-(2-{[tert-butyl(dimethyl)silyl]oxy}ethylhexyl]-2-piperidinone 34 as a pale yellow oil (0.13 g, 81%).

-Michael, Accone, Koning, and Westhuyzen, Beilstein J. Org. Chem. 2008, 4, 5

This demo has been tested on Internet Explorer 6/7, Firefox 2, and Safari 3.

Technologies

Although this demonstration is built on numerous Web technologies, two are at the top of the stack: the vector graphics rendering engine of ChemWriter and the open source JavaScript library Balloon.js.

Chemical structures are displayed as lightweight Adobe Flash SWF files, as described in a previous Depth-First article. Software based on ChemWriter converts a molecular connection table into vector graphics commands for the Flash runtime with the help of the open source Transform SWF library.
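To make the idea of turning a connection table into vector graphics commands more concrete, here is a minimal Python sketch. It reads atom coordinates and bonds from the fixed-width fields of a V2000 molfile and emits one line segment per bond as SVG-style path data. This is not ChemWriter's method and it does not use the Transform SWF API; the function names are hypothetical, the three-atom molfile is a made-up example, and atom labels, bond orders, and stereochemistry are ignored.

```python
def molfile_to_segments(molfile: str):
    """Return one ((x1, y1), (x2, y2)) pair per bond in a V2000 molfile."""
    lines = molfile.splitlines()
    counts = lines[3]                      # counts line follows 3 header lines
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # x and y occupy the first two 10-character fields of each atom line.
    atoms = [(float(a[0:10]), float(a[10:20])) for a in atom_lines]
    segments = []
    for b in bond_lines:
        first = int(b[0:3]) - 1            # atom numbers are 1-based
        second = int(b[3:6]) - 1
        segments.append((atoms[first], atoms[second]))
    return segments


def segments_to_path(segments, scale=40.0):
    """Build SVG-style path data, flipping y so that 'up' stays up on screen."""
    max_y = max(y for seg in segments for (_, y) in seg)
    parts = []
    for (x1, y1), (x2, y2) in segments:
        parts.append(
            f"M{x1 * scale:.1f},{(max_y - y1) * scale:.1f} "
            f"L{x2 * scale:.1f},{(max_y - y2) * scale:.1f}"
        )
    return " ".join(parts)


# Hypothetical three-atom example (C-C-O skeleton).
EXAMPLE = "\n".join([
    "example",
    "  hypothetical",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

segments = molfile_to_segments(EXAMPLE)
path_data = segments_to_path(segments)
```

A real renderer has far more to do (labels, double bonds, wedges, clipping lines around labels), but the core step is the same: coordinates in, drawing commands out.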

Playing to the Web's Strengths

The Web is a new medium with a completely different set of rules compared to print media. One of its biggest strengths is interactivity: the ability to see something of interest and to immediately be able to find out more about it. One of its biggest weaknesses, even today, is technology standards. It's not enough to create interactivity; that interactivity must also fit within the technical constraints imposed by a medium that is still a work in progress.

As journal publishers and others grapple with how to approach the inevitable transition to purely Web-based scientific communication, it's important to keep both the strengths and limitations of the Web in mind. To date, nearly all attempts to create Web-based versions of chemistry journals have simply tried to duplicate the form of the print medium. This has resulted, if anything, in an even greater reliance on paper, leaving valuable information used well below its full potential.

Conclusions

This article has demonstrated a simple labor-saving technique in which chemical structures can be visualized by hovering the cursor over specially-designated chemical identifiers. There's quite a bit more that can be done with chemical vector graphics, chemical information, and Web technologies commonly used in consumer services like Netflix. Future articles will discuss some possibilities.

Adobe Flash for Cheminformatics: Fast, Scalable, and Attractive 2D Depiction of Chemical Structures with Vector Graphics

Posted by Rich Apodaca Tue, 10 Jun 2008 11:05:00 GMT

The previous article in this series discussed the use of vector graphics markup languages for cheminformatics, in particular for the display of 2D chemical structures. Although vector graphics are well-suited for creating responsive and appealing cheminformatics Web applications, the lack of universal native browser support makes both Scalable Vector Graphics (SVG) and its cousin Vector Markup Language (VML) unattractive at this time. This article highlights Adobe Flash as a 2D chemical structure renderer for Web applications, and features a fully-functional proof of concept based on the ChemWriter rendering engine.

About Adobe Flash

Although Adobe Flash is practically an industry unto itself today, at its core, Flash is a lightweight vector graphics renderer. Introduced in 1996, the Flash Player can be found on millions of Internet-enabled devices today. According to a study by Adobe, the Flash Player was running on nearly 99% of Internet-enabled desktops as of March 2008. The player has also found its way onto a host of handheld devices and phones.

Many technologies have been layered on top of the Flash Player. One of the first was the ActionScript scripting language. More recently, Adobe has introduced Flex, a full-fledged application development framework.

Unlike SVG and other vector graphics systems, Flash is ready today, proven, and about as close to universal as is possible on the Web. If you want to do vector graphics on the Web with the most convenient user and developer experience, Flash is your tool.

But what can Flash do for cheminformatics?

A Demonstration

The table below is composed of twelve cells, each of which displays a chemical structure through the Flash Player.

[Table: twelve embedded Flash structure images in four rows of three, each with a 'zoom' link]

Several points are worth mentioning:

  1. Each of the structures can be zoomed by clicking on its 'zoom' link.

  2. Each cell contains a lightweight embedded "SWF" file, or "ShockWave Flash" file, and the zoomed view displays exactly the same file. No matter how the SWF file is resized, it will always be scaled proportionally to fit the smaller dimension of its container and centered.

  3. The size of each SWF file ranges from a low of 563 bytes to a high of 8.5 KB, with an average of around 1.5 KB. The larger the molecule, the more space is required. A comparable 150x150 pixel PNG would require about 6-8 KB per structure on average.

  4. Each image was generated from a molfile using a development version of the ChemWriter rendering engine via the open source Transform SWF Java toolkit.

  5. SWF Files, unlike applets, are highly optimized for multiple instance display on all major platforms and browsers. In every case, startup will be nearly instantaneous and scrolling will be smooth. The performance of Flash should be at least as good as, if not better than, raster images.
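Point 2 in the list above describes an image that is scaled proportionally and centered however its container is resized. A common way to compute such a fit is sketched below in Python. It illustrates the geometry only and makes no claim about how the Flash Player implements it; `fit_and_center` is a hypothetical name.

```python
def fit_and_center(content_w, content_h, box_w, box_h):
    """Return (scale, dx, dy) that fits content inside a box.

    The same scale is used on both axes so the aspect ratio is preserved,
    and the more restrictive dimension of the box decides that scale.
    dx and dy are the offsets that center the scaled content.
    """
    scale = min(box_w / content_w, box_h / content_h)
    dx = (box_w - content_w * scale) / 2
    dy = (box_h - content_h * scale) / 2
    return scale, dx, dy


# A wide 100x50 drawing shown in a small cell and in a larger zoom view.
thumbnail = fit_and_center(100, 50, 150, 150)
zoomed = fit_and_center(100, 50, 300, 300)
```

Because only the scale and offsets change between the two views, the same vector file can serve both the table cell and the zoomed display.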

The Right Tool for the Job (is Probably not a Raster Image)

One of the first challenges developers of cheminformatics Web applications are faced with is how to render 2D chemical structures. For an overview of the technologies now in use, see the previous article in this series. Each option has its own set of trade-offs.

The most widely-used 2D structure rendering option, raster images, is both inflexible and inefficient. Unlike a vector image, a raster image by definition has only one resolution, which is fixed at creation time. If image dimensions need to change, then all structures must be re-imaged. Given the size of many of today's chemistry databases, such a system-wide re-imaging of structures can involve a non-trivial amount of processor power and bandwidth.

To compensate, many sites store relatively large images, say 300x300 pixels, and then use the HTML <img> tag to shrink them as needed. But this creates problems of its own: both storage and bandwidth requirements are far larger than they need to be, resulting in the need for more powerful server hardware and poorer application scalability. And then there are the application's users, who must wait through a download of 30 KB or more for each 2D image.

A significant number of structures in any compound collection will be so large that even a 300x300 pixel image will be insufficient to render the necessary detail. For example, a recent Depth-First article discussed a vector graphics solution to this problem within the context of Chempedia, the free chemical encyclopedia. Vector graphics simply eliminate this issue.

Many cheminformatics applications would benefit from being able to show 50 or more structures at a time, with each structure having a zoom view for closer inspection. To a non-chemist, this might seem unnecessary. But for today's chemists dealing with large chemical catalogs and high-throughput screens, it's not only possible, but a routine part of the practice of chemistry. The raster image approach makes it extremely difficult to meet this important need on the Web. Vector graphics, possibly delivered through the Flash Player, offer a much simpler and more efficient way to do it.
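To put rough numbers on a page of 50 structures, the short Python calculation below uses the per-image sizes quoted earlier in this article: about 1.5 KB for an average SWF file, 6-8 KB for a 150x150 pixel PNG, and 30 KB for a 300x300 pixel raster image. The 7 KB figure is simply the midpoint of the quoted range and is an assumption, as are the names used; actual sizes depend on the structures involved.

```python
STRUCTURES_PER_PAGE = 50

# Approximate size of one image in KB, taken from the figures quoted above.
KB_PER_IMAGE = {
    "SWF vector image": 1.5,
    "150x150 PNG": 7.0,       # assumed midpoint of the 6-8 KB range
    "300x300 raster": 30.0,
}


def page_weight_kb(structures: int, kb_per_image: float) -> float:
    """Total download for one page of structure images, in KB."""
    return structures * kb_per_image


page_weights = {
    name: page_weight_kb(STRUCTURES_PER_PAGE, size)
    for name, size in KB_PER_IMAGE.items()
}

for name, total in page_weights.items():
    print(f"{name}: {total:.0f} KB per page of {STRUCTURES_PER_PAGE}")
```

Under these assumptions the vector version of the page is several times lighter than the small PNG version and many times lighter than the version built from large, browser-shrunk images.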

2D chemical structures are vectorial in nature; using raster images to depict them is in most cases the more costly and lower quality option.

Summary

Vector graphics are a near-perfect match for the job of depicting 2D chemical structures on the Web. Although there are many vector graphics platforms to choose from, the Flash Player is by far the most universal option. This article has demonstrated a working example of multiple 2D chemical structures rendered as lightweight vector images via the Adobe Flash Player, the first and only such demonstration of which I'm aware.

The key technologies behind this demonstration are the ChemWriter rendering engine and the open source Flash developer toolkits available from Flagstone Software. If you're interested in learning more about how vector graphics and Flash can improve both the user and developer experience in your cheminformatics Web applications, I'd be happy to hear from you.