Crazy Idea #6,349: JavaScript in PDF

Posted by Rich Apodaca Thu, 18 Sep 2008 18:04:00 GMT

In reading through Bruno Lowagie's excellent book iText in Action, I noticed several references to JavaScript. I knew it was possible to do a lot of interesting things with PDF to promote interactivity. But JavaScript?

It turns out Adobe has even published about 1,000 pages' worth of documentation on the subject, in the form of the Acrobat JavaScript Scripting Guide and the Acrobat JavaScript Scripting Reference.

iText, the premier Java library for creating and manipulating PDF documents, supports JavaScript out of the box. The extent of this support is not clear to me yet, but the API does exist.

JavaScript has a lot of untapped potential for enhancing interactivity in cheminformatics. Similarly, PDF has a lot of untapped potential, which will be the topic of future articles.

What kinds of things could you do in cheminformatics by combining PDF and JavaScript?

Chrome and a V8: JavaScript Takes a Giant Leap Forward?

Posted by Rich Apodaca Fri, 05 Sep 2008 01:42:00 GMT

Back when I started writing Java software in 1997, the Java Virtual Machine was slow. It was so slow that for years, many developers abandoned all hope of using the language for "serious" work once it became clear how much slower it was than C and C++. Eleven years of Moore's Law compounding, and countless JVM optimizations later, Java is so fast today that relative speed is rarely even considered when developing client and server applications.

Today, JavaScript occupies a similar position to that held by Java in 1997: a ubiquitous language with a basically good design that has significant performance issues.

The Next Big Thing? JavaScript Virtual Machines

This situation may be about to change radically. Several groups are going to great lengths to improve the performance of JavaScript by creating JavaScript Virtual Machines. The most recent entry into this increasingly crowded field is Google Chrome. Among Chrome's many innovations is the introduction of one of the first JavaScript Virtual Machines (V8) into a production browser. Rather than interpreting JavaScript, V8 compiles it directly to native machine code. This works fundamentally differently from traditional JavaScript interpreters and has the potential for large speed increases and greatly reduced memory requirements.

Put a V8 in Your Browser

How fast is Chrome's V8 engine? We can get an idea by running some benchmarks.

The chart below shows the results of running Google's V8 Benchmark Suite (bigger bars mean faster execution):

As you can see, Chrome leaves both Firefox 3 and IE 7 in the dust, at least according to this benchmark. Another popular benchmark is SunSpider, where the results are qualitatively similar: Chrome outperforms IE 7 by more than two orders of magnitude.

My system consisted of an Ubuntu Linux machine running a clean install of Windows XP on Sun's excellent virtualization product, VirtualBox. Your mileage may vary. Note: it's important to disable Internet Explorer's warning prompt that reads "This page contains a script which is taking an unusually long time to finish. To end this script now, click Cancel." (the presence of which is telling in itself). This can be done by following the instructions here.
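For a rough comparison of your own without a formal suite, you can paste a small timing harness like the one below into each browser. It is not part of the V8 Benchmark Suite or SunSpider. The function names and the square-root workload are hypothetical stand-ins. Single measurements like this are noisy, so treat the results as indicative only.

```javascript
// Rough timing harness: runs fn `iterations` times and returns the average
// wall-clock milliseconds per run. new Date().getTime() is used because
// Date.now() is missing from older browsers such as IE 7.
function timeIt(fn, iterations) {
  var start = new Date().getTime();
  for (var i = 0; i < iterations; i++) {
    fn();
  }
  var elapsed = new Date().getTime() - start;
  return elapsed / iterations;
}

// Hypothetical workload: sums square roots. Adding the result to a global
// keeps the loop from being obviously dead code.
var sink = 0;
function workload() {
  var total = 0;
  for (var i = 0; i < 100000; i++) {
    total += Math.sqrt(i);
  }
  sink += total;
}

var msPerRun = timeIt(workload, 20);
```

Run the same snippet in each browser and display msPerRun with alert() or a console. Only the relative numbers are meaningful.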

Conclusions

JavaScript is in the middle of a textbook market disruption. Just four years ago, few even thought about the language. Today it's the centerpiece of Web interactivity. Perhaps the biggest issue remaining, performance, is now the focus of intense research that is beginning to bear fruit. Many of the key technologies now starting to appear, such as V8, are modular and open source; other browser vendors can adapt them for use in their own products. It's an offer few can afford to refuse.

Sooner than many might have thought possible, JavaScript may stop being viewed as the slow language. Then what?

JavaScript for Cheminformatics: Atom Typing with Prototype and Iterators

Posted by Rich Apodaca Thu, 28 Aug 2008 19:08:00 GMT

The previous article in this series discussed the use of Prototype to build a simple cheminformatics model. But there's much more to Prototype than an improved class-like syntax. This article discusses one specific way that Prototype enhances JavaScript collections through iterators.

The Problem

Let's say we have an instance of Molecule, as defined in the previous article, and we'd like to group the carbon atoms separately from the heteroatoms. In many languages, including Java, we'd have to write a for-loop complete with logic for comparing atoms and then placing them into bins. Prototype makes possible a more modular approach with iterators.

Functional Programming and Iterators

JavaScript is a multi-paradigm programming language that offers tools for both object-oriented and functional programming. In practical terms, this means that functions in JavaScript are themselves objects: they can be created dynamically, stored in variables, and passed as parameters. Prototype takes advantage of this to extend collection instances such as Arrays with built-in iterators analogous to those found in languages such as Ruby.

A Simple Test

The JavaScript below builds a pyridine molecule:

function createPyridine(){
  var pyridine = new Molecule();
  var c1 = pyridine.addAtom("C");
  var c2 = pyridine.addAtom("C");
  var c3 = pyridine.addAtom("C");
  var c4 = pyridine.addAtom("C");
  var c5 = pyridine.addAtom("C");
  var n6 = pyridine.addAtom("N");

  pyridine.connect(c1, c2, 1);
  pyridine.connect(c2, c3, 2);
  pyridine.connect(c3, c4, 1);
  pyridine.connect(c4, c5, 2);
  pyridine.connect(c5, n6, 1);
  pyridine.connect(n6, c1, 2);

  return pyridine;
}

After saving this code in a file called molecules.js, we can test it in Firefox with Firebug using the following HTML (chem.js is the library from the previous article):

<html>
  <head>
    <script type="text/javascript" src="prototype.js"></script>
    <script type="text/javascript" src="chem.js"></script>
    <script type="text/javascript" src="molecules.js"></script>
  </head>
  <body></body>
</html>

In the Firebug console, we create a pyridine Molecule and then use Prototype's partition iterator to separate the carbons from the heteroatoms:

>>> m = createPyridine();
Object atoms=[6] bonds=[6]
>>> m.atoms.partition(function(atom){return atom.label == "C"});
[[Object label=C neighbors=[2] bonds=[2], Object label=C neighbors=[2] bonds=[2], Object label=C neighbors=[2] bonds=[2], 2 more...], [Object label=N neighbors=[2] bonds=[2]]]

The partition iterator accepts a function as a parameter and returns an array containing two sub-arrays: the first contains the elements for which the function evaluated to true (carbons) and the second contains the elements for which the function evaluated to false (heteroatoms).
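To make that behavior concrete, here is a minimal plain-JavaScript sketch of what partition does, written without Prototype. The standalone function name partition and the stand-in atoms are assumptions for illustration. The atoms are simple { label: ... } literals rather than instances of the Atom class. Prototype's own implementation differs in detail; for example, it also accepts an optional context argument.

```javascript
// Plain-JavaScript stand-in for Prototype's partition iterator: walks the
// array once, putting each item in the first bin if predicate(item) is
// truthy and in the second bin otherwise.
function partition(items, predicate) {
  var trues = [];
  var falses = [];
  for (var i = 0; i < items.length; i++) {
    if (predicate(items[i])) {
      trues.push(items[i]);
    } else {
      falses.push(items[i]);
    }
  }
  return [trues, falses];
}

// Stand-in atoms carrying the same labels as the pyridine example.
var atoms = [
  { label: "C" }, { label: "C" }, { label: "C" },
  { label: "C" }, { label: "C" }, { label: "N" }
];

var groups = partition(atoms, function (atom) { return atom.label == "C"; });
// groups[0] holds the five carbons; groups[1] holds the single nitrogen.
```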

Conclusions

Although the example shown here is rather simple, the general principle extends to more complex atom typing systems. By creating a single function that evaluates atom type, we could pass it as a parameter to any number of collection iterator functions.
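As a sketch of that idea, the hypothetical atomType function below is the one place that assigns types. A predicate built on it is then handed to several different iterators. The example uses the native ES5 array methods filter, some, and reduce, which are roughly analogous to Prototype's findAll, any, and inject (note that inject takes the initial value as its first argument). The type names and the { label: ... } stand-in atoms are assumptions.

```javascript
// Hypothetical atom-typing function: the single place that decides a type.
function atomType(atom) {
  if (atom.label === "C") return "carbon";
  if (atom.label === "H") return "hydrogen";
  return "heteroatom";
}

// A predicate derived from atomType, ready to pass to any iterator.
function isHeteroatom(atom) {
  return atomType(atom) === "heteroatom";
}

var atoms = ["C", "C", "C", "C", "C", "N"].map(function (label) {
  return { label: label };
});

var heteroatoms = atoms.filter(isHeteroatom);  // cf. Prototype's findAll
var hasHeteroatom = atoms.some(isHeteroatom);  // cf. Prototype's any

// Count atoms per type (cf. Prototype's inject).
var countsByType = atoms.reduce(function (counts, atom) {
  var type = atomType(atom);
  counts[type] = (counts[type] || 0) + 1;
  return counts;
}, {});
// countsByType is { carbon: 5, heteroatom: 1 }
```

Changing the typing rules then means editing atomType alone. Every iterator call that uses it picks up the change.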

JavaScript for Cheminformatics: Using the Prototype Framework

Posted by Rich Apodaca Tue, 26 Aug 2008 14:01:00 GMT

If you want to do the kind of cheminformatics that involves manipulating atoms, bonds, and molecules, object-oriented programming works well as a development paradigm. Although perhaps not readily apparent, JavaScript offers everything needed to create object-oriented models just as intricate as those written in languages like C++ and Java. This article discusses one approach that makes use of the Prototype Framework.

About Prototype

Prototype is a set of extensions to the JavaScript language that make developing in it less painful. Some of the extensions relate to DOM manipulation. Others have to do with the way Strings and Arrays behave. For the purposes of this article, we'll be using Prototype's syntax support for classes and inheritance.

Atoms, Bonds, and Molecules

To start, we'll need classes that define the basic behavior of atoms, bonds, and molecules. Although we may ultimately need to consider issues such as multicentered bonding, for now, we'll stick to a simplified view of chemistry that has bonds connecting two and only two atoms.

Creating the Classes

We could use JavaScript's built-in method for creating objects from class-like structures, but Prototype's approach is cleaner.

In the following library, we define a Molecule as a collection of Atoms and Bonds with useful relationships:

var Molecule = Class.create({
  initialize: function(){
    this.atoms = [];
    this.bonds = [];
  },

  addAtom: function(label){
    var atom = new Atom(label);

    this.atoms.push(atom);

    return atom;
  },

  connect: function(sourceAtom, targetAtom, bondType){
    var bond = new Bond(sourceAtom, targetAtom, bondType);

    sourceAtom.neighbors.push(targetAtom);
    sourceAtom.bonds.push(bond);
    targetAtom.neighbors.push(sourceAtom);
    targetAtom.bonds.push(bond);

    this.bonds.push(bond);

    return bond;
  }
});

var Atom = Class.create({
  initialize: function(label){
    this.label = label;
    this.neighbors = [];
    this.bonds = [];
  }
});

var Bond = Class.create({
  initialize: function(source, target, type){
    this.source = source;
    this.target = target;
    this.type = type;
  },

  getMate: function(atom){
    if (atom == this.source) return this.target;
    if (atom == this.target) return this.source;

    return null;
  },

  contains: function(atom){
    return (atom == this.source || atom == this.target);
  }
});
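For comparison, here is a sketch of the same three classes written with JavaScript's built-in mechanism instead of Prototype's Class.create. That mechanism is constructor functions plus methods attached to their prototype objects. The sketch is meant to behave the same way. It uses === where the library uses ==, which makes no difference when comparing object references.

```javascript
// Plain-JavaScript version of the model: constructor functions plus
// methods attached to each constructor's prototype object.
function Atom(label) {
  this.label = label;
  this.neighbors = [];
  this.bonds = [];
}

function Bond(source, target, type) {
  this.source = source;
  this.target = target;
  this.type = type;
}

// Returns the atom at the other end of this bond, or null.
Bond.prototype.getMate = function (atom) {
  if (atom === this.source) return this.target;
  if (atom === this.target) return this.source;
  return null;
};

Bond.prototype.contains = function (atom) {
  return atom === this.source || atom === this.target;
};

function Molecule() {
  this.atoms = [];
  this.bonds = [];
}

Molecule.prototype.addAtom = function (label) {
  var atom = new Atom(label);
  this.atoms.push(atom);
  return atom;
};

// Creates a bond and records it on both atoms and on the molecule.
Molecule.prototype.connect = function (sourceAtom, targetAtom, bondType) {
  var bond = new Bond(sourceAtom, targetAtom, bondType);
  sourceAtom.neighbors.push(targetAtom);
  sourceAtom.bonds.push(bond);
  targetAtom.neighbors.push(sourceAtom);
  targetAtom.bonds.push(bond);
  this.bonds.push(bond);
  return bond;
};

// Usage: a carbon triple-bonded to a nitrogen.
var m = new Molecule();
var c = m.addAtom("C");
var n = m.addAtom("N");
var b = m.connect(c, n, 3);
```

The Prototype version keeps each class's constructor and methods together in one object literal. That grouping is much of what makes it read more cleanly.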

Testing the Library

We can test the library interactively by saving it in a file called chem.js and creating some simple HTML:

<html>
  <head>
    <script type="text/javascript" src="prototype.js"></script>
    <script type="text/javascript" src="chem.js"></script>
  </head>
  <body></body>
</html>

We can then use the Firebug console to test the library interactively:

>>> m = new Molecule();
Object atoms=[0] bonds=[0]
>>> c = m.addAtom("C");
Object label=C neighbors=[0] bonds=[0]
>>> n = m.addAtom("N");
Object label=N neighbors=[0] bonds=[0]
>>> m.connect(c, n, 3);
>>> c.neighbors.size()
1

Conclusions

Although the cheminformatics library discussed here is far from being useful, it's not difficult to see how to extend it. Prototype offers several possibilities for doing so.

Building WebSpex: Putting Custom Data Types In Their Place

Posted by Rich Apodaca Thu, 24 Jul 2008 16:40:00 GMT

The previous article in this series introduced WebSpex, a spectroscopic data visualization tool being designed especially for use in a Web browser. Previously, the platform on which the user interface would be built was discussed. This article will discuss the question of where to put the spectroscopy data that WebSpex will display.

Tag Soup

We've decided to target WebSpex for use on the Web, which means that spectroscopy data would need to be referenced or embedded in a Web page. How should we do this? The answer, it turns out, is far from obvious.

If we knew that WebSpex were going to be created as a Java or Flash applet, which is not the current plan, we might be tempted to pass a reference to the data (or the data itself) as a parameter in the <object> tag. For an applet, this might look something like:

<object type="application/x-java-applet;version=1.4.2" width="520" height="350">
  <param name="code" value="com/metamolecular/webspex/applet/FullApplet.class">
  <param name="archive" value="http://metamolecular.com/applets/webspex.jar">
  <param name="jcamp" value="http://base-url/spectrum.jdx">
</object>

In the example above, the parameter jcamp would encode the path to a JCAMP-DX file for WebSpex to load.

Alternatively, if we were going to develop WebSpex as a Flash applet, we might use an object tag like this:

<object type="application/x-shockwave-flash" width="520" height="350">
  <param name="movie" value="webspex.swf">
  <param name="FlashVars" value="filename=http://base-url/spectrum.jdx">
</object>

In this example, we use FlashVars to associate the parameter filename with the value "http://base-url/spectrum.jdx".

This works well enough, but what if we need to load a custom data type in a Web page without a plugin?

Some Options

There are a few options for including custom data in an HTML document:

  • Invent Our Own Tag. Browsers are designed to ignore tags they don't understand, so we could simply hack together our own tag; let's call it <spectrum>. But for a variety of reasons, this is a bad idea. Most importantly, we'd be breaking with conventions used worldwide, which is never a good idea without a very good reason. For another, any developer tools we used would probably complain about a malformed HTML document. Browsers might also parse our invented tag in unpredictable ways, and search engines might not index our content properly.

  • Use XHTML. We could try inventing a tag the right way: with XHTML. This might be a worthwhile option if our data type (JCAMP-DX) were XML-based, but it's not. At best we'd expend a lot of effort learning about namespaces, schemas, and HTTP response headers only to end up with an amorphous, flat <spectrum> tag containing freeform text.

  • Use JSON. We could encode our JCAMP-DX files as JSON. Like XML, JSON is a text format for structured data, but unlike XML it can be evaluated directly by the JavaScript interpreter. This has the advantage that either a filename or the actual data could be encoded. We could, in fact, create the entire object model for our spectrum, ready to be displayed, if we had software that could convert JCAMP-DX to JSON. The disadvantage of this approach is that it could require significant amounts of JavaScript code to be mixed in with our HTML, a less than ideal solution.

  • Use the Object Tag. Given that none of the three options above is especially appealing, we might ask ourselves whether we've really tried everything possible to encode our data in plain old HTML. More specifically, what if we were to use the object tag itself, without actually having a plugin?
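The JSON option depends on software that converts JCAMP-DX to JSON. Here is a minimal sketch of that conversion, under some stated assumptions. It handles only the simple single-line labeled records of the form ##LABEL=value. The function name jcampHeaderToJson is hypothetical. Numeric data lines that follow a record such as ##XYDATA= are skipped rather than parsed. Label spellings are not normalized, although the JCAMP-DX standard is lenient about them.

```javascript
// Converts JCAMP-DX labeled records ("##LABEL=value") into a plain object
// and serializes it as JSON. Lines that don't start with "##" (such as the
// numeric data table) are ignored in this sketch.
function jcampHeaderToJson(text) {
  var record = {};
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var match = /^##([^=]+)=(.*)$/.exec(lines[i]);
    if (match) {
      record[match[1].trim()] = match[2].trim();
    }
  }
  return JSON.stringify(record);
}

var sample = [
  "##TITLE=Example spectrum",
  "##JCAMP-DX=4.24",
  "##DATA TYPE=INFRARED SPECTRUM",
  "##END="
].join("\n");

var json = jcampHeaderToJson(sample);
// JSON.parse reads the result without executing it as code, unlike eval.
var spectrum = JSON.parse(json);
```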

Encoding Custom Data Types With The Object Tag

The HTML 4 specification has this to say about the object tag:

Most user agents have built-in mechanisms for rendering common data types such as text, GIF images, colors, fonts, and a handful of graphic elements. To render data types they don't support natively, user agents generally run external applications. The OBJECT element allows authors to control whether data should be rendered externally or by some program, specified by the author, that renders the data within the user agent.

In the most general case, an author may need to specify three types of information:

  • The implementation of the included object. For instance, if the included object is a clock applet, the author must indicate the location of the applet's executable code.
  • The data to be rendered. For instance, if the included object is a program that renders font data, the author must indicate the location of that data.
  • Additional values required by the object at run-time. For example, some applets may require initial values for parameters.

The OBJECT element allows authors to specify all three types of data, but authors may not have to specify all three at once. For example, some objects may not require data (e.g., a self-contained applet that performs a small animation). Others may not require run-time initialization. Still others may not require additional implementation information, i.e., the user agent itself may already know how to render that type of data (e.g., GIF images).

In other words, we could place a reference to a spectrum object in an HTML page with code like this:

<object width="520" height="350">
  <param name="data" value="http://base-url/spectrum.jdx">
</object>

Once the document has loaded, WebSpex could walk the DOM looking for object tags that should be replaced with an instance of the viewer. That instance could actually be placed inside the original object tag like this:

<object width="520" height="350">
  <param name="data" value="http://base-url/spectrum.jdx">
  <div class="webspex">
    <!-- WebSpex visual presentation -->
  </div>
</object>

The HTML 4 specification states that when a user agent cannot render an object, it must try to render the object's contents instead (fallback content). An object tag like the one above names no data or implementation, so the browser has nothing to render except its contents. Dynamically inserting the div as shown therefore gives the browser something to display in place of the object tag.
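A minimal sketch of that DOM walk might look like the code below. It makes two hypothetical assumptions: a spectrum is marked by a param named data, and the viewer lives in a div with class webspex. The function names are also hypothetical. The sketch uses only standard DOM calls (getElementsByTagName, getAttribute, createElement, appendChild) and leaves out the actual drawing of the spectrum.

```javascript
// Returns the value of the object's <param name="data">, or null if none.
function findSpectrumUrl(objectEl) {
  var params = objectEl.getElementsByTagName("param");
  for (var i = 0; i < params.length; i++) {
    if (params[i].getAttribute("name") === "data") {
      return params[i].getAttribute("value");
    }
  }
  return null;
}

// Walks every <object> in the document and, for each one carrying a data
// param, appends a <div class="webspex"> as fallback content for the viewer.
// Returns the spectrum URLs that were found.
function attachViewers(doc) {
  var objects = doc.getElementsByTagName("object");
  var found = [];
  for (var i = 0; i < objects.length; i++) {
    var url = findSpectrumUrl(objects[i]);
    if (url === null) {
      continue; // not a spectrum object: leave it untouched
    }
    var container = doc.createElement("div");
    container.className = "webspex";
    // A real viewer would fetch or decode `url` and draw the spectrum here.
    objects[i].appendChild(container);
    found.push(url);
  }
  return found;
}
```

In a page, this would be called as attachViewers(document) after the document has loaded. Object tags without a data param, such as Flash movies, are skipped.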

Advantages of Using This Approach

This approach has several advantages worth mentioning:

  • It's fully compliant with the HTML 4 specification.

  • It provides a natural anchor point to attach both the custom data and the visual presentation of that data.

  • It's pure HTML, requiring minimal mixing in of JavaScript content.

  • Web spiders can be taught a single method to associate a spectrum with a URL, regardless of how the viewer is implemented.

  • It's technology-agnostic. This approach lets us implement WebSpex as a Java or Flash applet (or some other plugin technology) just as easily as a pure JavaScript UI. To change our viewer implementation, we just change a JavaScript file.

  • It allows spectra to be inlined, or placed directly into the HTML. Using a Data URI, we could replace "http://base-url/spectrum.jdx" with something like "data:chemical/x-jcamp-dx;base64,iVBORw0KGgoAA...". This would be important in situations where a public URL for a JCAMP-DX file is not feasible or not desirable. It could also speed up the rendering of multiple spectra on the same page by eliminating the need for a separate HTTP request for each file.
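Producing such a Data URI from JCAMP-DX text takes only a few lines. Here is a sketch that uses the browser's btoa and atob functions, which are also available in Node.js 16 and later but missing from older versions of Internet Explorer. The function names are hypothetical. The sketch assumes the JCAMP-DX text is plain ASCII, because btoa accepts only Latin-1 characters.

```javascript
// Builds a data URI that inlines JCAMP-DX text, suitable as the value of a
// <param name="data">. Assumes ASCII input (btoa rejects wider characters).
function toJcampDataUri(jcampText) {
  return "data:chemical/x-jcamp-dx;base64," + btoa(jcampText);
}

// Reverses the encoding: extracts and decodes the base64 payload that
// follows the first comma.
function fromJcampDataUri(uri) {
  var comma = uri.indexOf(",");
  return atob(uri.substring(comma + 1));
}

var text = "##TITLE=Example spectrum\n##JCAMP-DX=4.24\n##END=\n";
var uri = toJcampDataUri(text);
```

A viewer script could call fromJcampDataUri on the param value whenever it starts with "data:", and fetch the URL otherwise.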

The method carries an important limitation: if a user has disabled JavaScript, they may see nothing to indicate a problem. We could address this issue by always placing fallback content in the object tag, which the JavaScript code would then overwrite.

Implementation Detail

This approach relies on Unobtrusive JavaScript techniques to keep JavaScript as separate from HTML as possible. One way to implement such a scheme would be to include a single reference to the relevant JavaScript somewhere in the document, probably within the <head> tag or right after the opening <body> tag:

  <script type="text/javascript" src="webspex.js"></script>

The file webspex.js would then execute code to place a function into the document's onLoad queue that would scan for object tags containing JCAMP-DX content and insert the needed viewer.
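One common way to add to that queue without clobbering handlers installed by other scripts is the widely circulated addLoadEvent chaining pattern. The sketch below is a variant that takes the target object as a parameter; in a page, that would be window. The wiring shown for webspex.js is a hypothetical assumption.

```javascript
// Adds fn to target.onload while preserving any handler already installed,
// so several scripts can each queue work for the load event.
function addLoadEvent(target, fn) {
  var previous = target.onload;
  if (typeof previous !== "function") {
    target.onload = fn;
  } else {
    target.onload = function () {
      previous.apply(this, arguments);
      fn.apply(this, arguments);
    };
  }
}

// Hypothetical wiring inside webspex.js; the guard lets the file load
// outside a browser (for example, in a test runner) without errors.
if (typeof window !== "undefined") {
  addLoadEvent(window, function () {
    // Scan for <object> tags holding JCAMP-DX content and insert the viewer.
  });
}
```

Browsers that implement the W3C event model also provide window.addEventListener("load", fn, false), which supports multiple listeners natively. Internet Explorer versions of this era use window.attachEvent("onload", fn) instead.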

Previous Uses

I'm unaware of any previous applications of this technique, although it seems like something that may have been used before.

Conclusions

Encoding and displaying custom data types in HTML is possible by using the HTML 4 object tag coupled with client-side JavaScript to rewrite the DOM. It offers the potential to create HTML documents that are both human- and machine-readable. Although the approach described here was developed for the special case of spectroscopy data, it could in principle be used for any data type requiring a visual presentation.

Image Credit: mdezemery
