Yet Another Free Chemistry Database: Pherobase

Posted by Rich Apodaca Tue, 15 Apr 2008 18:56:00 GMT

The creation of free chemical databases continues unabated. Today's entry is Pherobase, a service dedicated to documenting the relationship between chemical structures and the insect world.

Users can search Pherobase by text, or browse a large number of precompiled categories: alphabetical by genus; alphabetical by species; and compounds by genus or species. Each compound data sheet contains a wealth of data, all linked to the primary literature: mass spectrum; NMR; synthesis; and behavioral function. There's even an interactive Jmol model for each entry.

Pherobase is clearly designed to be useful to farmers and others involved in agriculture who are interested in using pheromones in pest control. Are insects eating your olive tree? Let Pherobase help. Need help with fire ants? Pherobase can help there, too. Wonder what else besides gypsy moths might be affected by disparlure? Pherobase has the answer. And nearly all of this information is backed by references to the primary literature.

Pherobase clearly demonstrates the value of building comprehensive, focused chemical databases around a limited subject of high practical utility. After all, chemistry's most enduring contribution is in the production of useful properties, not the production of compounds.

Pherobase is also noteworthy for the way it's being used by its creator, Ashraf El-Sayed. Rather than standing on its own, Pherobase is designed to direct users to suppliers of pheromones and related pest control products by educating them about what might be possible. In this sense, Pherobase's approach offers another intriguing example of an Open Access business model that can actually work.

The Fundamental Cheminformatics Toolset

Posted by Rich Apodaca Tue, 08 Jan 2008 09:37:00 GMT

Reference: W.J. Howe and T.R. Hogadone, J. Chem. Inf. Model.

Imagine you need to create a cheminformatics system that's useful to chemists in their daily work. What tools would you absolutely need, regardless of the specific system you're building?

The answer to this question is hardly academic. If you're looking for ways to disproportionately improve the state of cheminformatics, improving the performance of one or more of its fundamental tools would seem to be a logical path.

Here, in no particular order, are my picks for the five fundamental cheminformatics tools:

  • 2D Structure Editor. Ubiquitous yet mostly-ignored, the 2D structure editor is the last mile connecting cheminformaticians with laboratory chemists. Take away the structure editor for data entry and building queries, and most cheminformatics systems become useless to the average chemist.

  • 2D Structure Renderer. Chemists expect their cheminformatics systems to communicate with them the way that other chemists do - through 2D chemical structures. Rendering software makes this possible. Like the 2D structure editor, structure renderers are a widely-ignored yet critical link between producers and consumers of cheminformatics software. Although the 2D renderer and editor need not necessarily be related, the two technologies are so similar that most 2D editors are based on a related 2D rendering engine.

  • Structure Query System. The purpose of the vast majority of cheminformatics systems is to produce a set of chemical structure results based on a structure query. The structure query system makes this possible. As the datasets that chemists deal with become ever larger, the ability to specify query structures at a high level of detail, and retrieve the results efficiently, becomes increasingly important. This is an area ripe for big improvements.

  • Low-Level Cheminformatics Toolkit. Most cheminformatics systems involve one or more elements specific to their problem domain. For example, predictive tools may use molecular descriptors. A robust and versatile low-level cheminformatics toolkit makes it possible to build problem-specific cheminformatics libraries. This toolkit may or may not be used in the 2D structure editor and renderer, depending on whether an adequate text-based molecular language is available (see below).

  • Text-Based Molecular Language. Cheminformatics systems are frequently built from components developed independently by multiple groups. These systems may be developed in different programming languages, may even run on different operating systems, and may need to communicate over a network connection. A well-specified, open, text-based molecular language makes it possible for these systems to interoperate. Two widely-used examples include MDL's molfile format and Daylight's SMILES, both of which have significant limitations.
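To make the last item a little more concrete, here is a minimal sketch in Python of what reading a text-based molecular language can look like. It pulls only the atom symbols and bonds out of a V2000 molfile using that format's fixed-width columns. The function name read_molfile and the ETHANOL sample are my own illustrations, not part of any toolkit, and the sketch ignores charges, stereochemistry, and property lines, so treat it as a toy rather than a real parser.

```python
# Toy reader for the connection table of a V2000 molfile (atoms and bonds only).
# ETHANOL is hand-written sample data (heavy atoms only), not taken from a database.
# The same structure written as SMILES would simply be "CCO".
ETHANOL = "\n".join([
    "ethanol",                                  # line 1: molecule name
    "  hand-written example",                   # line 2: program/timestamp line
    "",                                         # line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",  # line 4: counts line
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.5000    0.0000    0.0000 C   0  0",
    "    2.2500    1.3000    0.0000 O   0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])


def read_molfile(text):
    """Return (atom_symbols, bonds) from a V2000 molfile string.

    Bonds are (first_atom, second_atom, bond_order) tuples using the
    1-based atom numbering of the file itself.
    """
    lines = text.split("\n")
    counts = lines[3]
    # The counts line is fixed-width: three characters each for atoms and bonds.
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Three 10-character coordinate fields, one space, then the atom symbol.
    atoms = [line[31:34].strip() for line in atom_lines]
    # Each bond line starts with three 3-character integer fields.
    bonds = [(int(b[0:3]), int(b[3:6]), int(b[6:9])) for b in bond_lines]
    return atoms, bonds


atoms, bonds = read_molfile(ETHANOL)
print(atoms)
print(bonds)
```

Even this toy shows one of the limitations: the fixed-width layout is easy to break by hand, and nine lines of molfile carry the same connectivity as the three-character SMILES string.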

One of the reasons I consider this set of cheminformatics tools in particular to be fundamental is the perennial need to use and improve them. Elements of each of these tools can be seen, for example, in the COUSIN system developed by Howe and Hogadone at Upjohn over 25 years ago. Comparison of this system with PubChem shows just how little the basic problems change, despite major changes in underlying technology.

What are your fundamental cheminformatics tools and which of them are you working to improve?

Yet Another Free Chemistry Database: Heterocycles Web Edition

Posted by Rich Apodaca Fri, 06 Jul 2007 09:57:00 GMT

Yet another free chemistry database comes in the form of a service run by the journal Heterocycles. The Heterocycles Web Edition offers two ways to search for heterocyclic ring systems: by structure or by synthesis.

You may assume that these services would only search the contents of Heterocycles. It would then be a pleasant surprise to find a number of highly-regarded journals being covered. Here are some of the titles:

  • Angew. Chem. Int. Ed. Engl.
  • Chem. Eur. J.
  • Eur. J. Org. Chem.
  • Heterocycles
  • J. Am. Chem. Soc.
  • J. Med. Chem.
  • J. Nat. Prod.
  • J. Org. Chem.
  • Org. Lett.
  • Synlett
  • Tetrahedron
  • Tetrahedron Lett.

The current query interface supports text only, although a number of important criteria can be used. I haven't searched for many heterocycles, but my results for indolizidine give a flavor for what you might expect (the actual number of hits was 115):

It would be interesting to know how Heterocycles populated its database. Is it text-mining, manual curation, both, or something else? Regardless of how it's done, Heterocycles Web Edition is definitely worth looking at.

Yet Another Free Chemical Database: Reaction Searching with CMLD-BU

Posted by Rich Apodaca Mon, 18 Jun 2007 09:08:00 GMT

As chemical informatics continues its climb out of a decades-long stagnation, the number of free chemical databases continues to grow. But despite all the activity, reaction databases are notably under-represented. For this reason, I was delighted to stumble onto Boston University's Center for Chemical Methodology and Library Development Reaction Database (CMLD-BU).

According to their website, CMLD-BU:

...is a new center funded by the National Institute of General Medical Sciences (NIGMS) focused on the discovery of new methodologies to produce novel chemical libraries of unprecedented complexity for biological screening. The goal of the CMLD-BU is to explore and expand the diversity of small-molecule libraries by creating general, useful protocols for stereocontrolled synthesis. ... A major objective of the CMLD-BU is also to provide information and chemistry protocols to the public on parallel and chemical library synthesis. ...

Use this link to begin exploring their service. To date the CMLD-BU has deposited just over 1,600 Substances with PubChem and their site shows 125 reaction protocols.

Although CMLD-BU's user interface could use some tweaking, their content is right on the money: real examples of preparative reactions with links to the primary literature and even spectral data.

Are we at the end of this process or at the beginning? Only time will tell. But the nearly infinite shelf life and ubiquity of chemical information, coupled with the inexorable approach of virtually zero-cost computer services, leave only one of those two possibilities worthy of serious consideration.

Free Chemistry Databases on the Web: Creating a Comprehensive Guide

Posted by Rich Apodaca Mon, 07 May 2007 09:32:00 GMT

One of Depth-First's more popular articles is a summary of free databases titled Thirty-Two Free Chemistry Databases. Clearly there is a need to link the producers of free chemical databases (developers) with the potential users of these services (chemists). Chemistry is slowly emerging from a decades-long period of over-reliance on a single supplier of information. As new players enter, they'll need some way to have their message heard.

The Problem

As evidence of this need, I'm getting more requests to list additional services on the Thirty-Two Databases article - or to provide an updated review of a service already there. This is wonderful!

One approach would be for me to simply research and write an updated article reviewing the new additions myself. The problem is that thirty-two is already a very large number to deal with. My guess is that there must now be well over sixty or seventy free chemistry databases. That's far too many for one person to research properly on their own.

On the other hand, the Web is all about collaboration, so why not try to use it that way?

An Idea

Here's the idea: if you run a free database or other online chemistry service and would like to promote it, post a comment to this article containing a link and brief description of what makes your service different/useful. If you've used a free chemistry database, feel free to provide your thoughts on it. If there's a free database you wish existed but doesn't yet, feel free to write about that. Unlike the other articles on this site for which comments are closed after two weeks, this article's comments will remain open indefinitely.

After some period of time, I'll use these comments to write a new article highlighting the new material.

Notice the use of the word "free". A free database can be used by any member of the general public without fees or a lengthy registration process. This includes both free speech and free beer services. There are more restrictive definitions that could be applied, but let's not worry about those just yet. Free beer is better than no beer at all.

Links can either be in HTML or Markdown. Here's one example of each:

<a href="http://megamolecules.com">MegaMolecules</a> (HTML)

[MegaMolecules](http://megamolecules.com) (Markdown)

The Outcome

I have no idea what kind of response this experiment will generate. But if past experience is any guide, large numbers of chemists are keenly interested in free chemistry databases. All they need is a link.

Image Credit: Kate and Dave Hugh
