The Power of Simple

Posted by Rich Apodaca Fri, 23 Feb 2007 14:40:00 GMT

I think I was also surprised by the success of something so simple. That's a mantra for many people in the technology world - simplicity. But what we built wasn't that amazing. It was the idea of putting a couple of things together and being able to establish a lead by doing something really, really simple. How far you can get on a simple idea is amazing. I have a tendency to add more and more - the ideas always get too big to implement before they even get off the ground. Simplicity is powerful.

-Evan Williams, cofounder of Blogger.com, in Founders at Work: Stories of Startups' Early Days

Evan Williams didn't set out to build Blogger.com - the original product idea for the company he cofounded was an advanced web-based project management tool. Blogger was created on the side as a way for Evan and his cofounder to update their own weblog. The system was eventually made available to the public. Even as use of the new blogging service took off, Evan remained reluctant to abandon the original project management tool idea.

Like all disruptive innovations, the collection of scripts that would become Blogger didn't represent any great technological leap. Rather, the software made it an order of magnitude more convenient to do something that people had already been doing for some time.

Whether they're aware of it or not, most people are wired to reject simple ideas by default, regardless of their merit. After all, the thinking goes, difficult problems call for complex solutions. And simple ideas definitely aren't sexy. Fields that have experienced prolonged periods of stagnation are especially vulnerable to this mode of thinking.

Cheminformatics is poised to experience a similar phenomenon as old technologies are put to use in new ways to solve longstanding problems. Some of these solutions will seem absurdly simple - even trivial. Watch them closely.

Twelve Free Chemistry Databases

Posted by Rich Apodaca Tue, 07 Nov 2006 19:16:00 GMT

Just two years ago, trying to find free online chemistry databases was an exercise in futility. Now, they're sprouting up all over the Web like wildflowers after a wet spring. What follows is a far-from-complete roundup of some of the more interesting places to start your chemical search.

  1. PubChem- The granddaddy of all free chemistry databases. Search over 8 million compounds by a variety of criteria. Although some PubChem records are linked into the primary literature through MeSH, most are not. But this doesn't seem to be PubChem's true calling. Instead, PubChem may well evolve into the world's largest online collection of molecular data sheets. Increasingly, the other databases in this list are cross-referencing their entries into PubChem. PubChem's entire database can be downloaded by FTP. CAS is right to see PubChem as the first real competition its Registry has faced in decades.

  2. ZINC- A free database of commercially available compounds for virtual screening. Search over 4.6 million compounds by structure, IUPAC name, InChI, and a host of calculated properties.

  3. eMolecules- Google for molecules. With a simple interface and super fast search engine, eMolecules augments PubChem with other information sources, including specialty chemical catalogs. Although eMolecules' emphasis seems to be on commercially available compounds, a direct link into a supplier's online catalog is available for only a limited number of molecules. Most of the links are to PubChem records. For this reason, I don't find eMolecules very useful in its current form. If you remember something called "Chmoogle", this is the same service (moral: don't mess with Google).

  4. ChEBI- "A freely available dictionary of molecular entities focused on ‘small’ chemical compounds." ChEBI draws its information from two main sources: the Integrated Relational Enzyme Database of the EBI and the Kyoto Encyclopedia of Genes and Genomes. Find out which proteins a molecule has been associated with and in what context. ChEBI also provides cross-links to CAS, Beilstein, and Gmelin registry numbers.

  5. NIST Chemistry WebBook- Physical data (thermochemical, thermophysical, and ion energetics) for mostly organic compounds. Search by formula, structure, CAS number, and IUPAC name.

  6. BioCyc- A collection of about 3,500 compounds involved as enzyme substrates, products, inhibitors, and activators. After accepting a license agreement, users can freely download the entire database in Chemical Markup Language format.

  7. ChemExper- Find a supplier for your specialty chemical needs. Search by structure, name, molecular formula, and CAS number. After finding your compound, get an offer from one or more suppliers. I can't vouch for how this works in practice, but it sounds like a good idea.

  8. Compendium of Pesticide Common Names- More than 1,100 commonly used pesticides. Compounds are located by browsing indexed lists (IUPAC name, CAS number, and trade name) rather than by searching. Among other information, each entry lists a chemical structure and sub-classifications (repellents, antifeedants, synergists, etc.).

  9. NMRShiftDB- Organic structures and their nuclear magnetic resonance (NMR) chemical shifts. NMRShiftDB contains chemical shift data for over 22,000 organic compounds and 19,000 spectra. Records can be searched by structure, chemical shift, and nucleus. NMRShiftDB is truly open: it can be accessed programmatically, and the source code for the software that runs the online database can be freely downloaded. Individual users can submit their own chemical shift data for peer review and subsequent inclusion in the database.

  10. Chemical Structure Lookup Service (CSLS)- An address book for chemical structures. If you've ever used Metacrawler, then you'll recognize the idea behind CSLS, which is to aggregate several free chemistry databases. Search over 27 million molecules by IUPAC name, InChI, structure, SMILES, and a variety of molecular identifiers. Your result set will contain links into the specific databases that host the molecules you find. The user interface isn't just unfriendly - it's downright antisocial. But if you can get past this, CSLS may well be one of the most useful services in this list.

  11. DrugBank- Combines detailed drug data with comprehensive drug target information. Search over 4,300 drugs by trade name, SMILES, and InChI. Each record describes the drug's target of action, therapeutic indication, trade names, and the medications in which it is an ingredient. Searches can be limited to approved drugs or to experimental drugs. Both the concept and the interface of this service are well thought out.

  12. Wikipedia- Wikipedia? Yes, Wikipedia. Wikipedia offers several kinds of chemical information produced by a knowledgeable, all-volunteer army. Looking for information on organic compounds? Consider Wikipedia's datasheet on morphine as an example. For those interested in synthesis, Wikipedia is increasingly being used to collaboratively author short reviews on the topic. Search capabilities are currently limited to text and don't appear to work with IUPAC names or CAS numbers. Where this quintessential disruptive technology and its offspring will end up taking chemical publishing is unclear, but the ride will be spectacular.

Disruptive Innovation in Scientific Publishing: Free Journal Management Systems

Posted by Rich Apodaca Thu, 19 Oct 2006 17:56:00 GMT

As with everything else in information technology, the costs of setting up and maintaining a scientific journal are rapidly approaching zero. A growing assortment of Open Source journal management systems is available today. Recently, Egon Willighagen introduced me to one of these packages as part of my involvement with CDK News.

Open Journal Systems

Open Journal Systems (OJS) automates the process of manuscript submission, peer review, editorial review, article release, and article indexing. All of these elements are, of course, cited as major costs by established publishers intent on maintaining their current business models.

OJS appears to work in much the same way as automated systems being run by major publishers. In fact, OJS is already in use by more than 800 journals written in ten languages worldwide.

Did I mention that OJS is free software - as in speech? The developers of OJS have licensed their work under the GPL, giving publishers the ability to control every aspect of how their journal management system operates. Standing out from the crowd will no doubt be an essential component of staying competitive in a world in which almost anyone can start their own journal.

Alternatives

And there's even better news: OJS has competition. Publishers can select from no fewer than seven other open source journal management systems: DPubs, OpenACS, GAP, HyperJournal, SciX, Living Reviews ePubTk, and TOPAZ.

The Last Word

Open Source tools like Open Journal Systems have the potential to radically change the rules of the scientific publication game. By slashing the costs of both success and failure in scientific publication to almost zero, these systems are set to unleash an unprecedented wave of disruptive innovation - and not a moment too soon. What are the true costs of producing a quality Open Access scientific publication - and who pays? Will the idea of starting your own Open Access journal to address deficiencies in existing offerings catch on, especially in chemistry, chemical informatics, and computational chemistry? Before long, we will have answers to these questions.