Twelve Free Chemistry Databases
October 12, 2011: An updated version of this post is available at Sixty-Four Free Chemistry Databases.
Just two years ago, trying to find free online chemistry databases was an exercise in futility. Now, they're sprouting up all over the Web like wildflowers after a wet spring. What follows is a far-from-complete roundup of some of the more interesting places to start your chemical search.
- PubChem- The granddaddy of all free chemistry databases. Search over 8 million compounds by a variety of criteria. Although some PubChem records are linked into the primary literature through MeSH, most are not. But literature linking doesn't seem to be PubChem's true calling. Instead, PubChem may well evolve into the world's largest online collection of molecular data sheets. Increasingly, the other databases in this list are cross-referencing their entries into PubChem. PubChem's entire database can be downloaded by FTP. The people behind the CAS Registry are right to see PubChem as the first real competition they've had in decades.
- ZINC- A free database of commercially-available compounds for virtual screening. Search over 4.6 million compounds by structure, IUPAC name, InChI, and a host of calculated properties.
- eMolecules- Google for molecules. With a simple interface and a super-fast search engine, eMolecules augments PubChem with other information sources, including specialty chemical catalogs. Although eMolecules' emphasis seems to be on commercially-available compounds, a direct link into a supplier's online catalog is available for only a limited number of molecules. Most of the links are to PubChem records. For this reason, I don't find eMolecules very useful in its current form. If you remember something called "Chmoogle", this is the same service (moral: don't mess with Google).
- ChEBI- "A freely available dictionary of molecular entities focused on ‘small’ chemical compounds." ChEBI draws its information from two main sources: the Integrated Relational Enzyme Database of the EBI and the Kyoto Encyclopedia of Genes and Genomes. Find out what proteins a molecule has been associated with and in what context. Provides cross-links to CAS registry numbers, Beilstein registry numbers, and Gmelin registry numbers.
- NIST Chemistry WebBook- Physical data (thermochemical, thermophysical, and ion energetics) for mostly organic compounds. Search by formula, structure, CAS Number, and IUPAC name.
- BioCyc- A collection of about 3,500 compounds involved as enzyme substrates, products, inhibitors, and activators. On accepting a license agreement, the entire database can be freely downloaded in Chemical Markup Language format.
- ChemExper- Find a supplier for your specialty chemical needs. Search by structure, name, molecular formula, and CAS Number. After finding your compound, get an offer from one or more suppliers. I can't vouch for how this works in practice, but it sounds like a good idea.
- Compendium of Pesticide Common Names- More than 1,100 commonly-used pesticides. Compounds are located by browsing indexed lists (IUPAC name, CAS Number, and trade name) rather than searching. Each entry lists, among other pieces of information, a chemical structure and sub-classifications (repellents, antifeedants, synergists, etc.).
- NMRShiftDB- Organic structures and their nuclear magnetic resonance (NMR) chemical shifts. NMRShiftDB contains chemical shift data for over 22,000 organic compounds and 19,000 spectra. Records can be searched by structure, chemical shift, and nucleus. NMRShiftDB is truly open: it can be accessed programmatically, and the source code for the software that runs the online database can be freely downloaded. Individual users can submit their own spectral shifts for peer review and subsequent inclusion in the database.
- Chemical Structure Lookup Service (CSLS)- An address book for chemical structures. If you've ever used Metacrawler, then you'll recognize the idea behind CSLS, which is to aggregate several free chemistry databases. Search over 27 million molecules by IUPAC name, InChI, structure, SMILES, and a variety of molecular identifiers. Your result set will contain links into the specific databases that host the molecules you find. The user interface isn't just unfriendly - it's downright antisocial. But if you can get past this, CSLS may well be one of the most useful services in this list.
- DrugBank- Combines detailed drug data with comprehensive drug target information. Search over 4,300 drugs by trade name, SMILES, and InChI. Each record contains information on target of action, therapeutic indication, medications the drug is an ingredient of, and trade names. Searches can be limited to only approved drugs or only experimental drugs. Both the concept and the interface of this service are well thought out.
- Wikipedia- Wikipedia? Yes, Wikipedia. Wikipedia offers several kinds of chemical information produced by a knowledgeable, all-volunteer army. Looking for information on organic compounds? Consider this datasheet on morphine as an example. For those interested in synthesis, Wikipedia is increasingly being used to collaboratively author short reviews on the topic. Search capabilities are currently limited to text and don't appear to work with IUPAC names or CAS Numbers. Where this quintessential disruptive technology and its offspring end up taking chemical publishing is unclear, but the ride will be spectacular.