Casual Saturdays: Cynical Dreamer

Posted by Rich Apodaca Sat, 17 Nov 2007 16:45:00 GMT

Why Web Development is Hard

Posted by Rich Apodaca Fri, 16 Nov 2007 14:00:00 GMT

The very thing you'd like most to do as a developer is the thing your users can't stand.

PerlMol: A Case Study in Open Source Cheminformatics Software

Posted by Rich Apodaca Thu, 15 Nov 2007 14:49:00 GMT

How does open source software happen? Although many factors come into play, the majority of answers seem to revolve around a simple theme: developers building solutions to fill their own needs. Yet only a fraction of these solutions end up becoming open source software. And only a fraction of those end up being used by a wider audience. What's the key ingredient? There's still a lot to learn from studying individual cases.

Readable discussions about the origins of specific open source projects are pretty rare, and those dealing with open source cheminformatics software are rarer still. So it was with great interest that I came across Ivan Tubert-Brohman's account of how PerlMol was created.

PerlMol is an open source "collection of Perl modules for cheminformatics and computational chemistry." Many software packages fit into this category, and some of them are open source, so why write another? For Tubert-Brohman, the deciding factor was being able to work in his preferred environment, Perl:

I was surprised that CPAN [The Comprehensive Perl Archive Network] was sorely lacking in terms of modules for chemistry. The only available modules were Chemistry::Element, which allows you to convert between atomic number, element symbol, and element name and store other elemental information; and Chemistry::MolecularMass, which calculates the mass from the molecular formula. There were no modules that actually dealt with the structure of molecules. While some of the options in other languages are not bad, I was looking for something with the simplicity and conciseness of Perl that could allow me to write "chemical one-liners" to solve small problems very quickly, without having to compile anything. Hence, PerlMol was born.

Eliminating the need to compile and using a relaxed syntax that promotes succinct code are two of the biggest reasons to try a cheminformatics scripting environment.

There's a lot of great software still to be written in cheminformatics, and some of it will be open source. Although open sourcing that side project you've been working on may not be the best option for your career or your company, case studies like PerlMol's give plenty of food for thought.

Making the Case: OpenSMILES

Posted by Rich Apodaca Wed, 14 Nov 2007 14:42:00 GMT

SMILES is one of the most widely used line notations in cheminformatics. Yet until very recently, there has been no concerted attempt to develop open SMILES encoding standards.

OpenSMILES aims to change that. By providing a forum in which concerns from the SMILES user community can be voiced, peer-reviewed, and addressed, OpenSMILES offers a new way to improve the SMILES language.

A draft OpenSMILES specification is now available for review. For now, the best way to raise issues and otherwise get involved is through the OpenSMILES mailing list.

Create Your Own PubChem Datasets: Exporting Results As SD Files

Posted by Rich Apodaca Tue, 13 Nov 2007 21:43:00 GMT

Recently, I needed to create a subset of the PubChem database in Structure Data File (SD File) format. Although it's far from obvious how to do this, the capability does exist. In this article, I'll give a step-by-step procedure for creating custom datasets in SD File format from arbitrary PubChem structure queries.

Create and Execute the Query

Let's say we want to create a dataset in SD File format containing all N-Boc-protected piperidines registered in PubChem.

From the main PubChem site, choose the Structure Search link. Then click the "Sketch" button.

Next, draw your molecule in the 2D structure editor:

Then click the "Done" button.

Before starting the query (by clicking the "Search" button), be sure to select the "Substructure" option under "Search Type."

Exporting the Results

You should now be looking at a screen containing the first few hits of a hit set of more than 7,700 structures. But how do we export these results in SD File format?

Next to a field labeled "Display", you'll see a drop-down box containing several different options. Choose the one labeled "PubChem Download."

You'll be redirected to a download page from which you can select an output format, including SDF (SD File). You can also select a compression type (datasets of even 2000 records can be quite large uncompressed). For this example, we'll select SDF format with GZip compression.

Clicking the "Download" button takes you to a status page that reports when the download has been processed. You should then get a "Save File" dialog or something similar. If not, you should see a link to the compressed SD file.

Downloading the results file completes the process.
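
As a quick sanity check on the download, it can help to count the records in the compressed file and compare that number with the hit count PubChem reported. The following is a minimal Python sketch, not part of the PubChem workflow itself. It relies only on the fact that each record in an SD file ends with a line containing `$$$$`. The function names are my own, and the UTF-8 decoding is an assumption.

```python
import gzip
from typing import Iterator, TextIO


def iter_sd_records(handle: TextIO) -> Iterator[str]:
    """Yield each SD file record as one string, without its '$$$$' line."""
    lines = []
    for line in handle:
        # A line consisting of '$$$$' closes the current record.
        if line.rstrip() == "$$$$":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)
    # Tolerate a final record that lacks the closing delimiter.
    if any(text.strip() for text in lines):
        yield "".join(lines)


def count_sd_records(path: str) -> int:
    """Count the records in a GZip-compressed SD file."""
    # Assumption: the file decodes as UTF-8; undecodable bytes are replaced.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in iter_sd_records(handle))
```

Calling `count_sd_records` with the path of the file you saved should return a number matching the size of the hit set. Because the file is read one line at a time, it never has to be fully decompressed in memory.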
