Never Draw the Same Molecule Twice: Viewing Image Metadata

Posted by Rich Apodaca Wed, 08 Aug 2007 07:40:00 GMT

Chemists are accustomed to embedding live molecular objects in their documents with Microsoft Word/ChemDraw. These objects can then be reprocessed and embedded into other documents, such as PowerPoint presentations, saving enormous amounts of time. What if the same feature were available with Web documents?

A recent D-F article proposed a method to encode molecular structure data within commonly used Web image formats such as PNG. That article contained an embedded image of GlaxoSmithKline's diabetes treatment rosiglitazone (Avandia) encoded by a rendering toolkit built with Firefly. I claimed that this image contained the complete connection table and atom coordinates as embedded metadata. In this article, I'll show a simple method to read this metadata.

Metadata is a standard part of the PNG specification; reading it requires nothing more than software capable of recognizing it. I recently found a Web-based, cross-platform method for doing so. The Image Metadata Viewer by FileFormat.info accepts an uploaded image file and returns that image's metadata. Let's try it with the image of rosiglitazone.

After saving the image to my hard drive, uploading it to FileFormat.info and pressing start, I can see that the image contains metadata:

The metadata can be viewed either as XML or as plain text. Choosing plain text (second option) gives me the complete molfile, stored as a key/value hash (molfile=[molfile]).
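For readers who would rather not upload a file, the same key/value pairs can be pulled out locally. What follows is only a minimal sketch in Python using the standard library. It assumes the molfile sits in an uncompressed tEXt chunk under the keyword "molfile"; text stored in the compressed zTXt or international iTXt chunk types would need extra handling. The function name read_text_chunks and the filename in the usage note are my own inventions for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(png_bytes):
    """Return a dict of keyword -> text for every tEXt chunk in a PNG.

    Sketch only: chunk CRCs are not verified, and zTXt/iTXt chunks
    are skipped.
    """
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")

    result = {}
    pos = 8  # first chunk starts right after the 8-byte signature
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt data is: keyword, a null separator, then Latin-1 text
            keyword, _, text = data.partition(b"\x00")
            result[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return result
```

With the image saved locally (the filename here is hypothetical), usage would look something like this:

```python
with open("rosiglitazone.png", "rb") as f:
    metadata = read_text_chunks(f.read())
print(metadata.get("molfile"))
```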

Clearly, reading metadata is not a problem given the right software. But this leaves the question of how metadata is encoded in the first place - especially in a programming language such as Java. Like everything else, it's not difficult when you know how. Stay tuned for the answer.
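In the meantime, here is a rough idea of what the encoding step involves at the byte level. This is a sketch in Python rather than Java, using only the standard library, and it is not the method the rendering toolkit uses. It assumes an uncompressed tEXt chunk is an acceptable home for the molfile, and the names make_chunk and add_text_chunk are hypothetical helpers of my own.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return (struct.pack(">I", len(data)) + chunk_type + data
            + struct.pack(">I", crc))


def add_text_chunk(png_bytes, keyword, text):
    """Return a copy of png_bytes with a tEXt chunk inserted after IHDR."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    key = keyword.encode("latin-1")
    if not 1 <= len(key) <= 79 or b"\x00" in key:
        raise ValueError("keyword must be 1-79 Latin-1 bytes with no nulls")

    # IHDR is always the first chunk and always carries 13 data bytes:
    # 8 (signature) + 4 (length) + 4 (type) + 13 (data) + 4 (CRC) = 33
    ihdr_end = 33
    payload = key + b"\x00" + text.encode("latin-1")
    return (png_bytes[:ihdr_end] + make_chunk(b"tEXt", payload)
            + png_bytes[ihdr_end:])
```

Calling add_text_chunk(png_bytes, "molfile", molfile_text) on the bytes of an existing PNG would return new bytes that can be written straight back to disk; the pixel data is untouched because the new chunk is simply spliced in between existing ones.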

Comments

  1. baoilleach Fri, 10 Aug 2007 15:20:40 GMT

    I've posted a follow up of sorts at: http://baoilleach.blogspot.com/2007/08/access-embedded-molecular-information.html

    (Any chance of enabling trackback?)

  2. Rich Apodaca Fri, 10 Aug 2007 17:34:10 GMT

    Noel,

    Very nice. I wonder how you write image metadata in Python/PIL...

    I'd like to enable trackback, but so far I haven't been able to get it to work right with the spam protection on my site.

  3. baoilleach Sat, 11 Aug 2007 09:04:06 GMT

    Regarding writing image metadata, PIL doesn't seem to allow it. This makes me suspect that perhaps storing this information in the 'info' field is an abuse of the PNG spec. I'm too lazy to look it up, but have you considered using EXIF instead? (I think Joerg has suggested this too.) There are sure to be well-established libraries in every language for reading/writing this data.

  4. Rich Apodaca Sun, 12 Aug 2007 09:21:42 GMT

    Exif is an interesting approach for JPG images. Reading the overview, it's clear this is an evolving area of standardization with some challenges ahead.

    About the 'info' field - if the spec says any text metadata are valid, then IMO any text metadata are valid. I'm not sure I see what you mean by "abuse."

  5. baoilleach Mon, 13 Aug 2007 07:22:46 GMT

    Well, Exif's been around for years though, hasn't it?

    I take back the "abuse". :-)

    I think I was just unwilling to admit that PIL just doesn't support this feature, which is a bit disappointing really. It seems that if you read in a PNG file and write it back out, the 'info' field is wiped.