Yet Another Free Chemistry Database: Pherobase

Posted by Rich Apodaca Tue, 15 Apr 2008 22:56:00 GMT

The creation of free chemical databases continues unabated. Today's entry is Pherobase, a service dedicated to documenting the relationship between chemical structures and the insect world.

Users can search Pherobase by text, or browse a large number of precompiled categories: alphabetical by genus; alphabetical by species; and compounds by genus or species. Each compound data sheet contains a wealth of data, all linked to the primary literature: mass spectrum; NMR; synthesis; and behavioral function. There's even an interactive Jmol model for each entry.

Pherobase is clearly designed to be useful to farmers and others involved in agriculture who are interested in using pheromones in pest control. Are insects eating your olive tree? Let Pherobase help. Need help with fire ants? Pherobase can help there, too. Wonder what else besides gypsy moths might be affected by disparlure? Pherobase has the answer. And nearly all of this information is backed by references to the primary literature.

Pherobase clearly demonstrates the value of building comprehensive, focused chemical databases around a limited subject of high practical utility. After all, chemistry's most enduring contribution is in the production of useful properties, not the production of compounds.

Pherobase is also noteworthy for the way it's being used by its creator, Ashraf El-Sayed. Rather than standing on its own, Pherobase is designed to direct users to suppliers of pheromones and related pest control products by educating them about what might be possible. In this sense, Pherobase's approach offers another intriguing example of an Open Access business model that can actually work.

Comments

  1. Wolfgang Robien Thu, 17 Apr 2008 00:25:35 GMT

    There is a database available holding approx. 3 billion structure-spectrum correlations. The purpose of this database is to get structure proposals for a query peaklist (from C-NMR). Further applications of the technology behind this server have been summarized on my weblog. Access to this database is free.

  2. Rich Apodaca Thu, 17 Apr 2008 22:55:53 GMT

    Wolfgang, that's a very cool way to use PubChem. I'd still like to be able to leave off the carbon-proton coupling multiplicities, though - any chance that that could happen?

  3. Wolfgang Robien Fri, 18 Apr 2008 00:12:14 GMT

    Rich, in this implementation as it is now, the multiplicity from APT-spectra (=odd,even) is used. I have already had the opposite request, to use full multiplicity information, e.g. from a DEPT-series. Both requested alternate approaches might be useful - omitting multiplicity would decrease the necessary disk-space by approx. 25GB, the time for searching is slightly higher, but the selectivity of the result would be dramatically(!) lower (in other words, the resulting hitlist would be MUCH larger). Using full multiplicity information (and using the same coding-scheme) would increase disk requirements by approx. 80GB, increase speed slightly and make the result more selective. All 3 variants together would need approx. 800GB of disk space. I am willing to set up all 3 servers in parallel; it simply depends on the usage. In case of frequent use (and positive feedback) it's worth the work!

  4. Rich Apodaca Mon, 21 Apr 2008 15:38:25 GMT

    ...omitting multiplicity would decrease the necessary disk-space by approx. 25GB, the time for searching is slightly higher, but the selectivity of the result would be dramatically(!) lower (in other words, the resulting hitlist would be MUCH larger).

    Wolfgang, how much larger would the average hitlist become? How big of a problem would this really pose?

  5. wolfgang.robien@univie.ac.at Thu, 24 Apr 2008 12:44:14 GMT

    I would expect that the hitlist becomes MUCH larger (by a factor of 2 to 5) when omitting multiplicity information - the factor severely depends on the spectrum! Behaviour is similar to the Tanimoto index, where you get bad results for featureless molecules and good results for molecules with a large number of functional groups. The same is true for CNMR-spectra, when the number of lines becomes larger and the peaks are scattered over the total shift range...
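To make the Tanimoto comparison concrete, here is a minimal Python sketch of the index computed over sets of structural features. The feature labels and the four example molecules are hypothetical, chosen only to illustrate the point; real fingerprints are much larger bit vectors.

```python
def tanimoto(features_a, features_b):
    """Tanimoto index of two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty feature sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical feature sets. Two different "featureless" alkanes expose the
# same few features, so the index cannot tell them apart.
alkane_1 = {"CH3", "CH2"}
alkane_2 = {"CH3", "CH2"}

# Two functionalized molecules expose many features, so differences show up.
functionalized_1 = {"CH3", "C=O", "OH", "aromatic", "NO2"}
functionalized_2 = {"CH3", "C=O", "NH2", "aromatic", "Cl"}

featureless_score = tanimoto(alkane_1, alkane_2)
functionalized_score = tanimoto(functionalized_1, functionalized_2)
```

With few features, distinct molecules collapse to the same score; with many features, the score separates them. By the analogy drawn above, a 13C spectrum with many lines spread over the shift range behaves like the feature-rich case.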

  6. Rich Apodaca Thu, 24 Apr 2008 16:06:36 GMT

    Wolfgang, thanks for taking the time to quantify this. So if I'm starting with a hitlist of 3 molecules, I might expect a hitlist of up to 15 if I leave out multiplicity information.

    Still, for some applications, that might be good enough.

    Here's an application for this system that I've been kicking around:

    A Web-based electronic lab notebook (ELN) system allows users to draw their reaction schemes. One of the things it does is to allow users to check to see if any of their analytical data are inconsistent with their proposed product structure.

    This happens without the user actually needing to do anything - much like syntax highlighting and error checking happen automatically in most modern IDEs.

    Because this system already has a structure in mind, all it needs to do is check, with your system, whether the proposed structure is consistent with the 13C spectrum. If not, the entry could be flagged, a popup could be generated, or something similar could happen to let the scientist know that a problem has been found.

    Anyway, maybe this is science fiction, but it does seem like the necessary technologies are already available.

    There are other possibilities as well, of course...
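A rough Python sketch of the kind of automatic check described in this comment. It assumes that predicted 13C shifts for the proposed structure are already available from some prediction service; the function name, the 2 ppm tolerance, and all shift values are hypothetical and only illustrate the flagging logic, not any existing system.

```python
def unexplained_peaks(observed_ppm, predicted_ppm, tolerance_ppm=2.0):
    """Return observed peaks with no predicted shift within the tolerance."""
    return [
        peak
        for peak in observed_ppm
        if not any(abs(peak - shift) <= tolerance_ppm for shift in predicted_ppm)
    ]


# Illustrative numbers only: shifts predicted for the proposed product,
# and the peak list measured for the isolated material.
predicted = [14.2, 21.0, 60.5, 171.1]
observed = [14.1, 60.4, 171.0, 205.3]

problems = unexplained_peaks(observed, predicted)
entry_flagged = bool(problems)  # an ELN could highlight the entry when True
```

Checking only that each observed peak has some nearby predicted shift keeps the sketch tolerant of equivalent carbons, which give fewer peaks than atoms. A production check would also need to handle solvent peaks, impurities, and prediction error that varies by carbon type.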

  7. Wolfgang Robien Sun, 27 Apr 2008 14:11:18 GMT

    For this application I recommend running an NMRPredict-Server in the background - this gives predicted spectra and allows ranking of hitlists.

    The application I was talking about is designed for structure elucidation or 'compound class elucidation', but not for ranking of structural proposals.

  8. Ryan Sasaki Thu, 01 May 2008 16:42:09 GMT

    Hi Rich,

    That's a very interesting application you suggest within an ELN environment.

    In fact, Phil Keyes is doing something like that in his lab except instead of doing it within the ELN, they do it within the registration system.

    Check out my post on this and a link to his presentation here:

    http://acdlabs.typepad.com/my_weblog/2007/10/applications-of.html

  9. Rich Apodaca Sun, 04 May 2008 14:14:08 GMT

    Ryan, sounds similar to what I was describing. I can see how a "smart" compound registration system that uses analytical data as a check on structure would appeal to many organizations. I found the presentation from Lexicon especially interesting.
