Yet Another Free Chemistry Database: Pherobase

Posted by Rich Apodaca Tue, 15 Apr 2008 18:56:00 GMT

The creation of free chemical databases continues unabated. Today's entry is Pherobase, a service dedicated to documenting the relationship between chemical structures and the insect world.

Users can search Pherobase by text, or browse a large number of precompiled categories: alphabetical by genus; alphabetical by species; and compounds by genus or species. Each compound data sheet contains a wealth of data, all linked to the primary literature: mass spectrum; NMR; synthesis; and behavioral function. There's even an interactive Jmol model for each entry.

Pherobase is clearly designed to be useful to farmers and others involved in agriculture who are interested in using pheromones in pest control. Are insects eating your olive tree? Let Pherobase help. Need help with fire ants? Pherobase can help there, too. Wonder what else besides gypsy moths might be affected by disparlure? Pherobase has the answer. And nearly all of this information is backed by references to the primary literature.

Pherobase clearly demonstrates the value of building comprehensive, focused chemical databases around a limited subject of high practical utility. After all, chemistry's most enduring contribution is in the production of useful properties, not the production of compounds.

Pherobase is also noteworthy for the way it's being used by its creator, Ashraf El-Sayed. Rather than standing on its own, Pherobase is designed to direct users to suppliers of pheromones and related pest control products by educating them about what might be possible. In this sense, Pherobase's approach offers another intriguing example of an Open Access business model that can actually work.

Yet Another Free Chemistry Database: Sigma-Aldrich Reaction Search

Posted by Rich Apodaca Tue, 16 Oct 2007 09:45:00 GMT

Yet another free chemistry database comes in the form of Sigma-Aldrich's Reaction Search. Draw your reactant and product, and let the database do the rest. Both exact and substructure matching can be used. Search results contain bibliographical references to the primary literature. And if a result's product is available from Aldrich, you'll get a link to the product summary page from where you can purchase it.

Like Sigma-Aldrich's ChemBlogs, Reaction Search matches the needs of a for-profit company to attract customers with the needs of chemists for unrestricted free access to research tools. Could this be the shape of things to come?

The parallels to Aldrich's Handbook are striking. Few bench chemists today regularly use a CRC Handbook, yet nearly all of them have a copy of the Aldrich catalog at their desks. Aldrich seems to get this simple idea like no other company. And they're quietly transferring this understanding to the Web.

Yet Another Free Chemistry Database: Heterocycles Web Edition

Posted by Rich Apodaca Fri, 06 Jul 2007 09:57:00 GMT

Yet another free chemistry database comes in the form of a service run by the journal Heterocycles. The Heterocycles Web Edition offers two ways to search for heterocyclic ring systems: by structure or by synthesis.

You may assume that these services would only search the contents of Heterocycles. It would then be a pleasant surprise to find a number of highly regarded journals being covered. Here are some of the titles:

  • Angew. Chem. Int. Ed. Engl.
  • Chem. Eur. J.
  • Eur. J. Org. Chem.
  • Heterocycles
  • J. Am. Chem. Soc.
  • J. Med. Chem.
  • J. Nat. Prod.
  • J. Org. Chem.
  • Org. Lett.
  • Synlett
  • Tetrahedron
  • Tetrahedron Lett.

The current query interface supports text only, although a number of important criteria can be used. I haven't searched for many heterocycles, but my results for indolizidine give a flavor of what you might expect (the actual number of hits was 115).

It would be interesting to know how Heterocycles populated its database. Is it text-mining, manual curation, both, or something else? Regardless of how it's done, Heterocycles Web Edition is definitely worth looking at.

Yet Another Free Chemical Database: Reaction Searching with CMLD-BU

Posted by Rich Apodaca Mon, 18 Jun 2007 09:08:00 GMT

As chemical informatics continues its climb out of a decades-long stagnation, the number of free chemical databases continues to grow. But despite all the activity, reaction databases are notably under-represented. For this reason, I was delighted to stumble onto Boston University's Center for Chemical Methodology and Library Development Reaction Database (CMLD-BU).

According to their website, CMLD-BU:

...is a new center funded by the National Institute of General Medical Sciences (NIGMS) focused on the discovery of new methodologies to produce novel chemical libraries of unprecedented complexity for biological screening. The goal of the CMLD-BU is to explore and expand the diversity of small-molecule libraries by creating general, useful protocols for stereocontrolled synthesis. ... A major objective of the CMLD-BU is also to provide information and chemistry protocols to the public on parallel and chemical library synthesis. ...

Use this link to begin exploring their service. To date the CMLD-BU has deposited just over 1,600 Substances with PubChem and their site shows 125 reaction protocols.

Although CMLD-BU's user interface could use some tweaking, their content is right on the money: real examples of preparative reactions with links to the primary literature and even spectral data.

Are we at the end of this process or at the beginning? Only time will tell. But the nearly infinite shelf life and ubiquity of chemical information, coupled with the inexorable approach of virtually zero-cost computer services, leave only one of those two possibilities worthy of serious consideration.

Hacking PubChem: Learning to Speak PUG

Posted by Rich Apodaca Mon, 11 Jun 2007 09:04:00 GMT

A previous article introduced PubChem's Power User Gateway (PUG), an XML-based communication channel. Although NIH kindly supplies a commented schema for PUG queries and responses, there's nothing like seeing real examples when learning a new language. This article will describe one method for conveniently generating PUG XML queries.

Let PubChem Build Your Query

One of the options on the PubChem search page is "Save Query." As it turns out, PubChem saves queries in PUG XML (I'll just call it PUGML). In other words, preparing a query using the PubChem search page and saving it gives a simple method for creating PUGML queries. Let's try it.

Using the "Sketch" button, draw the structure of benzimidazole. Under "Search Type", select "Substructure." Now click "Save Query", and you'll download a substructure query for benzimidazole in PUGML:

<?xml version="1.0"?>
<!DOCTYPE PCT-Data PUBLIC "-//NCBI//NCBI PCTools/EN" "http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd">
<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_css>
                <PCT-QueryCompoundCS>
                  <PCT-QueryCompoundCS_query>
                    <PCT-QueryCompoundCS_query_data>C1=CC=CC2=C1N=C[N]2</PCT-QueryCompoundCS_query_data>
                  </PCT-QueryCompoundCS_query>
                  <PCT-QueryCompoundCS_type>
                    <PCT-QueryCompoundCS_type_subss>
                      <PCT-CSStructure>
                        <PCT-CSStructure_bonds value="true"/>
                      </PCT-CSStructure>
                    </PCT-QueryCompoundCS_type_subss>
                  </PCT-QueryCompoundCS_type>
                  <PCT-QueryCompoundCS_results>2000000</PCT-QueryCompoundCS_results>
                </PCT-QueryCompoundCS>
              </PCT-QueryType_css>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>

The PCT-QueryCompoundCS_type_subss element will tell PUG to look for substructures.
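
The same query can also be assembled in code rather than saved from the search page. What follows is only a minimal Python sketch: the function name build_substructure_query and the QUERY_PATH list are hypothetical, the element names are copied from the saved query above, and the output omits the XML declaration and DOCTYPE, so whether PUG accepts it in that form is an assumption.

```python
import xml.etree.ElementTree as ET

# Nesting of wrapper elements, copied from the saved benzimidazole query.
QUERY_PATH = [
    "PCT-Data_input", "PCT-InputData", "PCT-InputData_query", "PCT-Query",
    "PCT-Query_type", "PCT-QueryType", "PCT-QueryType_css",
    "PCT-QueryCompoundCS",
]


def build_substructure_query(smiles, max_results=2000000):
    """Return a PUGML substructure query (hypothetical helper) as a string."""
    root = ET.Element("PCT-Data")
    node = root
    for tag in QUERY_PATH:
        node = ET.SubElement(node, tag)

    # The structure to search for, given as a SMILES string.
    query = ET.SubElement(node, "PCT-QueryCompoundCS_query")
    ET.SubElement(query, "PCT-QueryCompoundCS_query_data").text = smiles

    # The search type: the _subss element asks for a substructure search.
    search_type = ET.SubElement(node, "PCT-QueryCompoundCS_type")
    subss = ET.SubElement(search_type, "PCT-QueryCompoundCS_type_subss")
    structure = ET.SubElement(subss, "PCT-CSStructure")
    ET.SubElement(structure, "PCT-CSStructure_bonds", value="true")

    # Upper limit on the number of results, as in the saved query.
    ET.SubElement(node, "PCT-QueryCompoundCS_results").text = str(max_results)
    return ET.tostring(root, encoding="unicode")


xml_query = build_substructure_query("C1=CC=CC2=C1N=C[N]2")
print(xml_query)
```

Swapping in a different SMILES string is then a one-line change, which is the main attraction over editing the saved file by hand.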

Using the Saved Query with PUG

Saving this file as benzimidazole_sss.xml lets us feed it to PUG:

$ curl -d @benzimidazole_sss.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"

and get the following PUGML response:

<?xml version="1.0"?>
<!DOCTYPE PCT-Data PUBLIC "-//NCBI//NCBI PCTools/EN" "http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd">
<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="queued"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>62668946396085905</PCT-Waiting_reqid>
            <PCT-Waiting_message>Structure search job was submitted</PCT-Waiting_message>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

We can then check on the status of our query by saving the following as status.xml:

<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_request>
        <PCT-Request>
          <PCT-Request_reqid>62668946396085905</PCT-Request_reqid>
          <PCT-Request_type value="status"/>
        </PCT-Request>
      </PCT-InputData_request>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>
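
Copying the request ID by hand from one XML file into another gets old quickly. As a rough Python sketch (extract_request_id and build_status_request are hypothetical names, and the sample response is a trimmed copy of the "queued" response shown earlier), the ID can be read from the response and dropped into a status request like this:

```python
import xml.etree.ElementTree as ET


def extract_request_id(response_xml):
    """Return the text of PCT-Waiting_reqid, or None if it is absent."""
    root = ET.fromstring(response_xml)
    return root.findtext(".//PCT-Waiting_reqid")


def build_status_request(reqid):
    """Return a PUGML status request for the given request ID."""
    root = ET.Element("PCT-Data")
    node = root
    for tag in ("PCT-Data_input", "PCT-InputData",
                "PCT-InputData_request", "PCT-Request"):
        node = ET.SubElement(node, tag)
    ET.SubElement(node, "PCT-Request_reqid").text = str(reqid)
    ET.SubElement(node, "PCT-Request_type", value="status")
    return ET.tostring(root, encoding="unicode")


# Trimmed copy of the "queued" response, for illustration only.
queued = """<PCT-Data><PCT-Data_output><PCT-OutputData>
  <PCT-OutputData_output><PCT-OutputData_output_waiting><PCT-Waiting>
    <PCT-Waiting_reqid>62668946396085905</PCT-Waiting_reqid>
    <PCT-Waiting_message>Structure search job was submitted</PCT-Waiting_message>
  </PCT-Waiting></PCT-OutputData_output_waiting></PCT-OutputData_output>
</PCT-OutputData></PCT-Data_output></PCT-Data>"""

reqid = extract_request_id(queued)
status_xml = build_status_request(reqid)
print(status_xml)
```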

POSTing this to PUG:

$ curl -d @status.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"

gives us the following PUGML:

<?xml version="1.0"?>
<!DOCTYPE PCT-Data PUBLIC "-//NCBI//NCBI PCTools/EN" "http://pubchem.ncbi.nlm.nih.gov/pug/pug.dtd">
<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="success"/>
          </PCT-Status-Message_status>
          <PCT-Status-Message_message>Your search has already been completed successfully!.</PCT-Status-Message_message>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_entrez>
          <PCT-Entrez>
            <PCT-Entrez_db>pccompound</PCT-Entrez_db>
            <PCT-Entrez_query-key>1</PCT-Entrez_query-key>
            <PCT-Entrez_webenv>0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib@1EDD43FA66AE1BE0_0001SID</PCT-Entrez_webenv>
          </PCT-Entrez>
        </PCT-OutputData_output_entrez>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>
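
A response like this one is easy to pick apart in code. The Python sketch below is only an illustration under a few assumptions: parse_pug_response is a hypothetical helper, the sample is a trimmed copy of the response above, and only the elements seen in this article are handled.

```python
import xml.etree.ElementTree as ET


def parse_pug_response(response_xml):
    """Return (status, entrez); entrez is a dict, or None if not present."""
    root = ET.fromstring(response_xml)

    # The status is carried in the value attribute of PCT-Status.
    status_el = root.find(".//PCT-Status")
    status = status_el.get("value") if status_el is not None else None

    entrez_el = root.find(".//PCT-Entrez")
    if entrez_el is None:
        return status, None
    entrez = {
        "db": entrez_el.findtext("PCT-Entrez_db"),
        "query_key": entrez_el.findtext("PCT-Entrez_query-key"),
        "webenv": entrez_el.findtext("PCT-Entrez_webenv"),
    }
    return status, entrez


# Trimmed copy of the "success" response, for illustration only.
sample = """<PCT-Data><PCT-Data_output><PCT-OutputData>
  <PCT-OutputData_status><PCT-Status-Message><PCT-Status-Message_status>
    <PCT-Status value="success"/>
  </PCT-Status-Message_status></PCT-Status-Message></PCT-OutputData_status>
  <PCT-OutputData_output><PCT-OutputData_output_entrez><PCT-Entrez>
    <PCT-Entrez_db>pccompound</PCT-Entrez_db>
    <PCT-Entrez_query-key>1</PCT-Entrez_query-key>
    <PCT-Entrez_webenv>0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib@1EDD43FA66AE1BE0_0001SID</PCT-Entrez_webenv>
  </PCT-Entrez></PCT-OutputData_output_entrez></PCT-OutputData_output>
</PCT-OutputData></PCT-Data_output></PCT-Data>"""

status, entrez = parse_pug_response(sample)
print(status, entrez)
```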

Last time, we got a URL to download a gzipped SD File. This time, our query specified results to be returned as an Entrez Key through the PCT-Entrez_webenv element. We can construct a URL that will let us view these results:

http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=HistorySearch&WebEnvRq=1&db=pccompound&query_key=1&WebEnv=0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib%401EDD43FA66AE1BE0_0001SID
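
Note that the "@" in the WebEnv value has to be percent-encoded as %40. A small Python sketch (entrez_history_url is a hypothetical helper; the parameter names are simply those in the URL above) lets the standard library take care of the escaping:

```python
from urllib.parse import urlencode

ENTREZ_URL = "http://www.ncbi.nlm.nih.gov/sites/entrez"


def entrez_history_url(db, query_key, webenv):
    """Build a URL for viewing saved Entrez results in a browser."""
    params = [
        ("cmd", "HistorySearch"),
        ("WebEnvRq", "1"),
        ("db", db),
        ("query_key", query_key),
        ("WebEnv", webenv),  # urlencode escapes "@" as %40
    ]
    return ENTREZ_URL + "?" + urlencode(params)


url = entrez_history_url(
    "pccompound",
    "1",
    "0CPrI_peUmUtWDooyjxpJ1XAXPcOl-ESZZxj8sJV9ZDR8musMjh1oBTib@1EDD43FA66AE1BE0_0001SID",
)
print(url)
```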

Where to Next?

If we wanted to get a gzipped SD File instead, we'd need to edit our original query. But manually editing XML is a lot like mowing a lawn with scissors. What we'd really like is a simple API in a language like Ruby that will let us build sophisticated PUG queries, process the results, and pipe them into other queries with little effort. But that's a story for another time.

Image Credit: sutterbabe68
