The Daily Molecule: The Wonders of Chemistry - One Molecule at a Time

Posted by Rich Apodaca Wed, 14 May 2008 15:58:00 GMT

Chemistry is a big field judged by any standard, including the proliferation of American Chemical Society (ACS) divisions. Each subdiscipline in chemistry is in turn so big that once a chemist becomes 'differentiated' it's easy to lose touch even with neighboring subdisciplines. It doesn't have to be that way. This article introduces a new service, The Daily Molecule, designed to make it just a little bit easier (and hopefully fun) to stay in the chemical loop.

What Is It?

The idea is simple: every weekday, a new molecule will be featured on The Daily Molecule with a short write-up and some leading references. Although molecules in the news will get first priority, any molecule is fair game.

The material for The Daily Molecule will be drawn from Chempedia, which in turn gets some of its content from Wikipedia. In other words, the entries on The Daily Molecule will be largely written by my fellow chemists.

Creating a Daily Molecule entry is not time-consuming, and much of what is now done manually could be automated in the future. The technology platform lends itself well to many forms of chemistry-specific modification (see below).

I hesitate to use the term 'blog' to describe The Daily Molecule, but the description may be helpful to an extent.

The Daily Molecule is unlike a blog in that most content will be generated by others, selected by some criteria, reformatted for consistency, and published. In that sense, The Daily Molecule is something like a mini scientific journal, but it turns the process of acquiring content on its head.

If chemistry ever evolves beyond the current model of publication, which seems inevitable at this point, the journals of the future may resemble The Daily Molecule in one or more ways.

Technology

The software running The Daily Molecule is a modified version of SimpleLog, a Web application based on Ruby on Rails. Unlike most blogging engines, SimpleLog focuses on implementing only the most basic publication features, and doing them to perfection. If you know a little Ruby and can work with Rails, you can do a lot with SimpleLog.

One of the first items of business will be to implement reCAPTCHA support and activate comments on articles.

Some ideas for chemically-enabling The Daily Molecule include a graphical abstract sidebar and (sub)structure search. Currently, the 2D chemical structure images posted to The Daily Molecule have complete connection tables embedded as metadata, a feature with some interesting possibilities.

The Molecule of the Day/Week/Month

The basic idea behind The Daily Molecule is not new. Many other services have sprung up over the last ten years that operate, at least on the surface, similarly. Some examples:

Quite a few others don't appear on this list.

What sets The Daily Molecule apart is the idea that chemical content already exists on the Web in machine-readable format with licenses that permit its re-use; all that's needed is a way to aggregate, format, and package that information in a form suitable for once-daily scanning and cheminformatics manipulation.

Conclusions

Like no other medium, the Web blurs artificial distinctions: between work and play; between private and public; between on-topic and off-topic; between fame and obscurity; between mine and yours; between big and small; and between profit and non-profit. Chemistry may be late to the party, but is not immune to its call.

Building Chempedia: Start Simple, Then Iterate

Posted by Rich Apodaca Tue, 13 May 2008 15:38:00 GMT

As a medium for building software, the Web offers unparalleled adaptability. With nothing to download or install, users of Web applications automatically see the newest version - always. This may sound like a small thing, and technically it is. But it dramatically increases the effectiveness with which software can be created. The previous article in this series introduced Chempedia, the free chemical encyclopedia and cheminformatics Web application. This article will discuss the process by which Chempedia will become a better service over time.

Iterative Web Application Development

Chempedia, like all actively-developed software, is a work in progress. It will be built in stages starting with the addition of new features, followed by a round of user feedback, bug fixing, and stabilization. This will then be followed by the next major iteration, and so on.

This iterative design style is ideally suited for Web applications. Because the barrier to pushing out new versions is essentially non-existent, a Web application can evolve at a much more rapid rate than other kinds of software. Indeed, the first version of a Web application need only work well enough to prove a point.

One of the keys to iterative Web development is a technology framework designed to facilitate it. Chempedia is being developed with Ruby on Rails, a tool that enables Web developers to take full advantage of the iterative development style the Web makes possible.

Another key element of iterative Web development is users willing to explore the system and offer criticism. Evolution succeeds only when the environment stresses an ecosystem; the same is true in Web application development.

Chempedia will take full advantage of the evolutionary nature of Web application development. As features are added and (hopefully) use of the service grows, Chempedia will evolve in ways that are impossible to predict today.

What's Wrong With Chempedia?

If you happened to take a look at Chempedia last week (that version is now no longer visible), you probably noticed many, many things that needed improvement. Some concerns were in the areas of:

  • Navigation. Navigation works best when the right granularity of options is achieved. Chempedia's navigation system grouped both closely-related and dissimilar actions at the same level.

  • Metaphor. The initial idea behind Chempedia was to see what happened when PubChem's chemical structures were mashed up with Wikipedia articles, using CAS numbers as the common link. The site design reflected this, with no clear organizing principle other than mashup. However, after the initial demonstration of the success of this approach, it became clear that Chempedia was strikingly similar in both form and function to the Merck Index. Perhaps this should be used as a clue in deriving a better organizing principle.

  • Wikipedia integration. The old Chempedia site didn't make it nearly as convenient as it should be to create or edit compound monographs. Because Chempedia serves as a chemically-aware front-end for Wikipedia, the easier it is to get to Wikipedia from Chempedia, the better.

What Changed?

During the process of trying to fix Chempedia's problems, it became clear that a major redesign was in order. This consisted of:

  • Creating a landing page oriented toward search. Using the Merck Index as a metaphor suggested that Chempedia's landing page should be designed around search rather than browsing, which was the focus of the original design.

  • Emphasizing compound monographs, not compounds. Chempedia's central organizing principle is now the Compound Monograph. One way this is seen is in the new URL structure, which makes it very easy to see where a Chempedia link is about to take you. For example, consider the URL for benzene. Another way this can be seen is in the inclusion of Compound Monographs lacking a chemical structure.

  • Designing a streamlined menu system. The main menu system has been broken down into just three main categories: Search; Browse; and Create. These headings refer to actions on Compound Monographs, again in line with their importance as an organizing principle.

  • Promoting better integration with Wikipedia. After experimenting with a few implementation possibilities, it is now possible to edit Wikipedia articles directly from the Chempedia site, thanks to the use of an inline frame. Once again, this capability is tied to the Compound Monograph, from which editing and updating links are accessible.

  • Striving for comprehensive Wikipedia coverage. Wikipedia had far more compound monographs than could be found on Chempedia, 6,411 of them, to be precise. Chempedia now contains all of them, regardless of whether a chemical structure can be found based on a CAS number in PubChem. This includes inorganics, organometallics, polymers, mixtures, and polypeptides.

Miles to Go Yet

Chempedia is far from being finished. For example, you'll notice many instances in which a Compound Monograph is truncated. This arises from difficulties in parsing Wikipedia's Wikitext format (more on this later).

Ultimately, the full text of each Wikipedia article will be present on Chempedia rather than just the first introductory paragraph. But it will take a significant amount of work to ensure that each article's Wikitext entry can be parsed faithfully.

Chempedia allows search by CAS number, PubChem CID and exact title. Full-text searching is not yet implemented, nor is autocomplete search, both of which would greatly enhance the usability of the service.
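One way a single search box could tell these three query types apart is sketched below. This is only an illustration, not a description of how Chempedia routes queries: the function names are hypothetical, and the sketch assumes that anything failing the CAS check-digit test and not purely numeric should be treated as a title.

```python
import re

# CAS Registry Numbers have the form NNNNNNN-NN-N (2 to 7 digits, 2 digits, 1 check digit).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(text):
    """Return True if text is a well-formed CAS number with a correct check digit."""
    match = CAS_PATTERN.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from right to left; the sum modulo 10
    # must equal the final check digit.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_query(query):
    """Classify a query as 'cas', 'cid', or 'title' (hypothetical helper)."""
    text = query.strip()
    if is_valid_cas(text):
        return "cas"
    if text.isascii() and text.isdigit():
        return "cid"
    return "title"
```

For example, `classify_query("71-43-2")` is treated as a CAS number, while a string with a wrong check digit falls through to a title search.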

Exact structure searching is made possible by the ChemWriter editor in combination with SHA-1 hashed InChIs. Substructure search and query atom search will ultimately be added, but for an encyclopedia containing relatively few molecules, most of which have trivial names, this isn't yet seen as being critical.
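The hashing idea can be illustrated with a minimal sketch. It assumes the InChI string has already been generated elsewhere (for example, from a structure drawn in an editor), and it uses a plain dictionary as a stand-in for a database column; the function names and the tiny index are hypothetical and not Chempedia's actual code.

```python
import hashlib


def inchi_key(inchi):
    """Return a fixed-length lookup key: the SHA-1 hex digest of an InChI string."""
    return hashlib.sha1(inchi.strip().encode("utf-8")).hexdigest()


# Illustrative sample data only.
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# Hypothetical index mapping hashed InChIs to monograph titles.
index = {
    inchi_key(BENZENE): "Benzene",
    inchi_key(ETHANOL): "Ethanol",
}


def exact_structure_search(query_inchi):
    """Return the matching monograph title, or None if no exact match exists."""
    return index.get(inchi_key(query_inchi))
```

Hashing turns InChIs of arbitrary length into short, uniform keys that are easy to index, at the cost of supporting only exact matches.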

You'll notice many Monographs on Chempedia that have no structure information. Behind the scenes, Chempedia uses the 350,000+ CAS numbers now contained in the PubChem database to associate a chemical structure with a Wikipedia article. In the future, these associations will be made by Chempedia and Wikipedia users, which will allow every Chempedia small-molecule Monograph to have a structure associated with it. (It will also create a rather large, publicly-curated, open database of CAS numbers linked to chemical structures, but that's a story for another time.)
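The CAS-number association amounts to a simple join, sketched here under stated assumptions: the record layouts, field names, and the `build_monographs` function are hypothetical, and the handful of records are sample data rather than anything taken from Chempedia.

```python
# Hypothetical structure records keyed by CAS number (sample data only).
structures_by_cas = {
    "71-43-2": {"cid": 241, "formula": "C6H6"},   # benzene
    "64-17-5": {"cid": 702, "formula": "C2H6O"},  # ethanol
}

# Hypothetical article records, each carrying the CAS number found in the article.
articles = [
    {"title": "Benzene", "cas": "71-43-2"},
    {"title": "Ethanol", "cas": "64-17-5"},
    {"title": "Polyethylene", "cas": "9002-88-4"},  # no structure in the sample table
]


def build_monographs(articles, structures_by_cas):
    """Attach a structure to each article when its CAS number has a match."""
    monographs = []
    for article in articles:
        structure = structures_by_cas.get(article["cas"])
        # Articles without a match are kept, with structure set to None.
        monographs.append({**article, "structure": structure})
    return monographs


monographs = build_monographs(articles, structures_by_cas)
```

Keeping unmatched articles, rather than dropping them, mirrors the decision to include Monographs that lack a chemical structure.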

Your Feedback is Essential

Finally, many of the changes made in this iteration were the result of conversations with chemists and developers. If you see something on Chempedia that just doesn't work for you, please don't be shy about saying so. Feedback is an essential ingredient in making Chempedia the best service it can be.

Chempedia.net: Mashing Up PubChem and Wikipedia

Posted by Rich Apodaca Fri, 04 Apr 2008 14:06:00 GMT

PubChem and Wikipedia represent two of the largest open repositories of chemical information in the world. And they complement each other very nicely. PubChem contains mainly low-level chemical structure information whereas Wikipedia contains free-text descriptions of chemical compounds in the form of compound monographs.

Both services offer permission and access to copy and reuse their contents. But neither service is, by itself, nearly as useful as it could be.

Why not mash them up?

To explore that question, my company, Metamolecular, LLC, has launched Chempedia.

To my knowledge, Chempedia represents the first publicly-facing database of compounds to incorporate Wikipedia's collection of organic compound monographs. And it's one of the few cheminformatics services to make use of free-text descriptions generated by individual chemists.

Chempedia has been somewhat selective about the compounds it includes. To date, it has spidered over 2,500 monographs, combining them with over 300,000 of the most interesting compounds from PubChem. Not every Chempedia.net molecule has a monograph, but now there's a tool that can actually make that absence apparent.

Chempedia is both an experiment and a service. It's immediately useful for anyone in the business of making or doing things with organic molecules. It's created several unexpected moments of "Oh, that's actually a useful molecule!" It also will serve as a platform to test some of the ideas discussed in Depth-First over the last year or so on the advantages of the Web for collaboration in chemistry.

Stay tuned for more details about how Chempedia was created and some of its applications in chemistry.

NetBeans 6, Ruby, and Rails: A Surprisingly Effective Combination

Posted by Rich Apodaca Thu, 27 Mar 2008 17:46:00 GMT

For far too long Ruby has lacked a development environment that supported important features developers in other languages now take for granted: code completion; refactoring; platform-independence; and speed. Although NetBeans may not spring to mind when thinking of Rails IDEs, it should be at the top of the list for anyone interested in the subject.

Getting started with Ruby, Rails and NetBeans is as easy as downloading the installer and running it. If you later decide to add Java support to your installation (which is also excellent), that can be done by downloading and running the Java installer. You'll end up with a single IDE that supports both languages.

Code Completion

Although other IDEs support some form of Ruby code completion, NetBeans takes it to another level. Can't remember the exact name of the method you're looking for? Type the period and let NetBeans look up both the name and documentation for you:

Hitting return enters the method and creates a template for parameters and any needed blocks.

Refactoring

One of the things that makes Java such a powerful language for large projects is the refactoring support offered by most IDEs. NetBeans brings this power to Ruby. Need to rename a class, method, or variable? Let NetBeans do it for you:

Conclusions

There's much more to NetBeans 6 and Ruby/Rails than what's been shown here, including formatting/highlighting for JavaScript and CSS, a user-definable Ruby/JRuby interpreter, and menu-based script execution. Whether you're looking for a way to get started with Ruby and Rails or a way to become more efficient at it, NetBeans 6 is well worth the time.

Paginated Archives in Radiant CMS: The Power of Minimal But Extendable Systems

Posted by Rich Apodaca Wed, 07 Nov 2007 14:40:00 GMT

If you've ever needed to build a Website hosting mostly static content, you've probably tried out a few Content Management Systems. The problem is not finding them - there must be hundreds. The problem is finding one that successfully walks the fine line between being minimal (so that you can do things your way) and powerful (so that it can grow with your needs).

Radiant CMS is one of those systems. As an added bonus, it's written in Ruby and built on Rails. Radiant succeeds by focusing on the management of pages while providing a powerful extension mechanism.

The Website for my company, Metamolecular, will consist of content produced infrequently (product descriptions and documentation) intermingled with more frequently created blog-like content (updates, tutorials, responses to user questions). Traditionally, the CMS has handled the former, with blogging software handling the latter. But we needed a system that handled both well.

One of the distinguishing characteristics of blogs, as opposed to other kinds of websites, is the unusually large number of similar pages. Handling this kind of content requires pagination - the ability to break an archive up into a series of pages containing a smaller subset of the archive.
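Stripped of any CMS details, pagination comes down to slicing a list. The sketch below is a generic illustration in Python, independent of Radiant and of any particular pagination library; the `paginate` function and the sample post names are hypothetical.

```python
import math


def paginate(items, per_page, page):
    """Return (items on the given 1-based page, total number of pages)."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    # An empty archive still has one (empty) page.
    page_count = max(1, math.ceil(len(items) / per_page))
    if not 1 <= page <= page_count:
        raise ValueError("page out of range")
    start = (page - 1) * per_page
    return items[start:start + per_page], page_count


# Seven sample posts, three per page: pages of 3, 3, and 1.
posts = [f"post-{n}" for n in range(1, 8)]
first_page, total_pages = paginate(posts, per_page=3, page=1)
last_page, _ = paginate(posts, per_page=3, page=3)
```

The page count returned alongside each slice is what a template would need in order to render previous and next links.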

Although Radiant doesn't have the ability to paginate its content, it does have a wonderful system for creating extensions. I thought I'd give it a try.

The result is the Paginated Archive extension. It works as a drop-in replacement for Radiant's existing Archive Page. After placing the extension into your PROJECT_HOME/vendor/extensions directory, you'll be able to create and configure Paginated Archives for use with blogs and other kinds of sites generating large numbers of pages. The extension requires Bruce Williams' excellent Paginator gem.

You can get started by downloading the extension here.
