Simple CAS Number Lookup with PubChem

Posted by Rich Apodaca Mon, 21 May 2007 15:46:00 GMT

CAS Registry Numbers simplify the thorny problem of referring to chemical substances. These short numerical sequences are arguably the most widely used form of molecular identifier, appearing on reagent bottles, in publications, in patents and patent applications, and on material safety data sheets (MSDS).

During my time as a synthetic organic chemist, I would sometimes run into the problem of finding the structure of a molecule represented by a CAS number. A common case was when an ambiguous, incomprehensible, or blurred IUPAC name was printed on a reagent bottle along with a CAS number. By looking up the CAS number, I could confirm the bottle's contents.

Your first impulse when looking up a CAS number might be to fire up SciFinder. For years this was the only option. Those days are quickly starting to seem as quaint as when people actually wrote on pieces of paper and dropped them in mailboxes (dropping DVDs in a mailbox is a different matter).

A little-publicized feature of PubChem makes it an ideal way to quickly find the structure associated with a CAS Number. To use it, you need nothing more than a computer, a browser, and an internet connection.

Browse over to the PubChem welcome page. At the top you'll find a search box. Enter your CAS number and press "Go." For this example, I'm using the CAS number for 2,5-pyrazinedicarboxylic acid dihydrate.

If all goes well, you should see a results screen containing the structure of your compound and a link to its summary page.

Does this seem a little too good to be true? Try it for yourself. Pick up a copy of the Aldrich catalog, Merck index, or anything else that lists lots of CAS numbers. Choose several structures at random and see how PubChem performs.
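If you have more than a handful of numbers to try, the same lookup can be scripted. What follows is only a rough sketch, and it rests on assumptions that are not part of this post: it uses PubChem's PUG REST web service (which came along after this was written) rather than the search box, it treats the CAS number as a compound name/synonym, and the function names are made up for illustration. It also checks the CAS check digit before sending anything, which catches most typing mistakes.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# Assumption: PubChem's PUG REST service, not the web search box described above.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# CAS numbers look like NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def cids_url(cas):
    """Build the URL asking PubChem for compound IDs matching a name/synonym."""
    return f"{PUG_REST}/compound/name/{urllib.parse.quote(cas)}/cids/JSON"


def parse_cids(payload):
    """Extract the list of compound IDs (CIDs) from a JSON response body."""
    data = json.loads(payload)
    return data.get("IdentifierList", {}).get("CID", [])


def lookup(cas, timeout=30):
    """Return the PubChem CIDs for a CAS number, or [] if nothing is found."""
    if not is_valid_cas(cas):
        raise ValueError(f"not a valid CAS number: {cas!r}")
    try:
        with urllib.request.urlopen(cids_url(cas), timeout=timeout) as resp:
            return parse_cids(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the service reports "no match" as HTTP 404
            return []
        raise


if __name__ == "__main__":
    # 7732-18-5 (water) is my own example, not the compound used in this post.
    print(lookup("7732-18-5"))
```

Each CID returned corresponds to a compound summary page like the one you reach through the search box. An empty list is what you should expect for the cases PubChem doesn't cover.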

There are limitations to this method. PubChem generally doesn't index large molecules such as polymers and peptides, so they won't be found by this method. Similarly, if a CAS number doesn't point to a distinct molecular entity (e.g. "mineral oil"), PubChem won't find it either. But these are hardly limitations in the vast majority of cases.

With the recent addition of Sigma-Aldrich as a PubChem compound supplier, it won't be long before smaller companies begin following suit. What we're seeing with PubChem is a classic example of a network effect. The end result should come as a surprise to nobody.

Comments

  1. Antony Williams Tue, 22 May 2007 15:45:12 GMT

    Rich, the PubChem database is a mine of information for sure. I like the capability to search based on CAS-RN but wonder about the long-term availability. Specifically, I note the information at the CAS website regarding the legal policy: http://www.cas.org/legal/infopolicy.html

    "A User or Organization may include, without a license and without paying a fee, up to 10,000 CAS Registry Numbers or CASRNs in a catalog, website, or other product for which there is no charge. The following attribution should be referenced or appear with the use of each CASRN: CAS Registry Number® is a Registered Trademark of the American Chemical Society. CAS recommends the verification of the CASRNs through CAS Client Services℠."

    There are likely more than 10,000 CAS RNs on PubChem...

  2. Chris Southan Mon, 28 May 2007 07:17:33 GMT

    1) Anyone know the exact count of CAS RNs in PubChem? 2) Anyone have a tip-sheet on converting SciFinder entries to SMILES?