Sixty-Four Free Chemistry Databases

October 12, 2011

The open Web offers a rich collection of diverse chemical data sources - if you know where to look. It's been over four years since I wrote the previous post in this series describing some emerging chemical databases, and a lot has happened in this space. The time seems right for an update.

Although some of the original databases are no longer active, it's encouraging to see that a number of them continue to run and even prosper. More services will no doubt be created and retired in the coming years. If you know of a free chemistry database that's missing from this list, please leave a comment or contact me directly.

Below I give you, in no particular order, a collection of free chemistry databases.

  1. PubChem "... organized as three linked databases ... PubChem Substance, PubChem Compound, and PubChem BioAssay."
  2. ZINC "... a free database of commercially-available compounds for virtual screening. ZINC contains over 13 million purchasable compounds in ready-to-dock, 3D formats."
  3. eMolecules "eMolecules discovers sources of chemical data by searching the Internet, and receives submissions from data providers such as chemical suppliers and academic research institutions."
  4. ChEBI "Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds."
  5. NIST Chemistry WebBook "...provides thermochemical, thermophysical, and ion energetics data compiled by NIST under the Standard Reference Data Program."
  6. ChemExper "This database contains chemicals with their physical characteristics. Everybody can submit chemical information and retrieve information with a Web browser."
  7. Compendium of Common Pesticide Names "This Compendium is believed to be the only place where all of the ISO-approved standard names of chemical pesticides are listed. It also includes more than 300 approved names from national and international bodies for pesticides that do not have ISO names."
  8. NMRShiftDB "... a NMR database (web database) for organic structures and their nuclear magnetic resonance (nmr) spectra. It allows for spectrum prediction (13C, 1H and other nuclei) as well as for searching spectra, structures and other properties. Last not least, it features peer-reviewed submission of datasets by its users."
  9. DrugBank "... a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 6707 drug entries including 1436 FDA-approved small molecule drugs, 134 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5086 experimental drugs."
  10. Wikipedia
  11. ChemBank "... includes freely available data derived from small molecules and small-molecule screens, and resources for studying the data so that biological and medical insights can be gained. ChemBank is intended to guide chemists synthesizing novel compounds or libraries, to assist biologists searching for small molecules that perturb specific biological pathways, and to catalyze the process by which drug hunters discover new and effective medicines."
  12. National Institute of Allergy and Infectious Diseases Database "There are three main search portals for the database: the chemical portal allows searches by compound structure or chemical characteristics, the biological portal performs searches based on compound activity against a specific pathogen or enzyme, and the literature portal supports searches based on publication information, such as author name or journal."
  13. NIST Chemical Kinetics Database "A compilation of kinetics data on gas-phase reactions"
  14. Computational Chemistry Comparison and Benchmark DataBase "Experimental and computational thermochemical data for a selected set of 1420 gas-phase atoms and molecules. Tools for comparing experimental and computational ideal-gas thermochemical properties."
  15. IUPAC-NIST Solubility Database "Data compiled and evaluated by IUPAC (International Union of Pure and Applied Chemistry)"
  16. KEGG "KEGG is an integrated database resource consisting of the following databases, broadly categorized into systems information, genomic information, and chemical information ..."
  17. BRENDA "BRENDA is the main collection of enzyme functional data available to the scientific community. ... The data collection is being developed into a metabolic network information system with links to Enzyme expression and regulation information."
  18. ChemMine "ChemMine is a compound mining database that facilitates drug and agrochemical discovery and chemical genomics screens."
  19. Organic Syntheses "... Each procedure is written in considerably more detail as compared to typical experimental procedures in other journals, and each reaction and all characterization data has been carefully "checked" for reproducibility in the laboratory of a member of the Board of Editors."
  20. WebReactions "When a synthetic chemist thinks of a reaction, he envisions first the making and breaking of bonds at the reaction center as the defining nature of the reaction. Subsequently he considers the effects of surrounding groups, i.e., on rate, hindrance, or resistance to change under the reaction conditions. The WebReactions program mirrors this approach for indexing reaction entries in any database."
  21. Spectral Database for Organic Compounds "SDBS is an integrated spectral database system for organic compounds, which includes 6 different types of spectra under a directory of the compounds"
  22. BindingDB "BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of protein considered to be drug-targets with small, drug-like molecules. BindingDB contains 781,982 binding data, for 6,448 protein targets and 342,414 small molecules."
  23. PDBBind "The PDBbind database is designed to provide a collection of experimentally measured binding affinity data (Kd, Ki, and IC50) exclusively for the protein-ligand complexes available in the Protein Data Bank (PDB). All of the binding affinity data compiled in this database are cited from original references."
  24. AffinDB "... affinity data for protein-ligand complexes of the PDB. Its purpose is to provide direct and free access to the experimental affinity of a given complex structure. As of Thursday, October 13th, 2011, AffinDB contains 748 affinity values covering 474 different PDB complexes."
  25. Heterocycles Web Edition "This journal will list the new natural products with a heterocyclic ring system, collected from current chemical literature, whose structure has been established."
  26. Electronic Encyclopedia of Reagents for Organic Synthesis "This online edition is more than a Major Reference Work, as it combines the complete text of the Encyclopedia with the sophistication of a Database including all the chemical reactions and structures!"
  27. CrystalEye "The aim of the CrystalEye project is to aggregate crystallography from web resources, and to provide methods to easily browse, search, and to keep up to date with the latest published information."
  28. Common Chemistry "...a web resource that contains CAS Registry Numbers for approximately 7,900 chemicals of widespread general public interest. Common Chemistry is helpful to non-chemists who know either a name or CAS Registry Number® of a common chemical and want to pair both pieces of information."
  29. mylims.org "mylims.org allows anybody to store and process NMR spectra on-line. By default all your spectra are only accessible to yourself but you may decide to share your spectra with a group of people or to anybody."
  30. PheroBase "Currently, there are over 30000 entries, around 8000 molecules, and over 100000 static php pages that make it the world's largest database of behaviour modifying chemicals. In addition, mass spectral, NMR, synthesis data for more than 2500 compounds are included."
  31. Side Effect Resource (SIDER) "SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information include side effect frequency, drug and side effect classifications as well as links to further information, for example drug–target relations."
  32. ChemSynthesis "ChemSynthesis is a freely accessible database of chemicals. This website contains substances with their synthesis references and physical properties such as melting point, boiling point and density. There are currently more than 40,000 compounds and more than 45,000 synthesis references in the database."
  33. Symmetry@Otterbein "The resources contained within this web site are designed to help students learn concepts of molecular symmetry and to help faculty teach concepts of molecular symmetry."
  34. ChemSpider "ChemSpider is a free chemical structure database providing fast text and structure search access to over 26 million structures from hundreds of data sources."
  35. Human Metabolome Database "The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body. ... The database (version 2.5) contains over 7900 metabolite entries including both water-soluble and lipid soluble metabolites as well as metabolites that would be regarded as either abundant (> 1 uM) or relatively rare (< 1 nM)."
  36. DockBlaster "... a public access service for structure-based ligand discovery. DOCK Blaster aims to answer the question: What small molecules should I purchase and test for activity against my biological target for which I have a structure?"
  37. ChemWiki "... a collaborative approach toward chemistry education where an Open Access textbook environment is constantly being written and re-written partly by students and partly by faculty members resulting in a free Chemistry textbook to supplement conventional paper-based books."
  38. STITCH 2 "STITCH is a resource to explore known and predicted interactions of chemicals and proteins. Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature."
  39. Spectra Online "The Spectra Online database is a collection of public domain and other data generously contributed from various sources."
  40. Chemical Identifier Resolver "This service works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier. It can help you identify and find the chemical structure if you have an identifier such as an InChIKey."
  41. FTIRSearch.com "... dedicated to providing analytical chemists with on-line access to high quality FTIR and Raman spectral libraries."
  42. NRG-CING
  43. CoCoCo "... a freely available tool for those who wish to set up their own in silico project of drug design, in particular pharmacophore screenings. While CoCoCo is especially conceived for medicinal chemists, this set of tools may be of valuable help also for not-expert that wish to initiate their own project in the field of computational drug design."
  44. LookChem "There are 6,000,000 chemicals with CAS number which is an identity for one chemical. And about 10 million CAS numbers are on the top of first page on Google.com."
  45. LipidMaps "... a free, comprehensive website for researchers interested in lipid biology."
  46. LipidBank "The official database of Japanese Conference on the Biochemistry of Lipids"
  47. Crystallography Open Database "Open-access collection of crystal structures of organic, inorganic, metal-organic compounds and minerals, excluding biopolymers"
  48. P450 Drug Interaction Table
  49. Kinase Knowledgebase "... Eidogen-Sertanty's database of kinase structure-activity and chemical synthesis data. The KKB represents the first gene-family wide implementation of our proprietary web-based technology for the capture, curation, and display of biological activity and chemical synthesis data from scientific literature and patents."
  50. MatWeb "MatWeb's searchable database of material properties includes data sheets of thermoplastic and thermoset polymers such as ABS, nylon, polycarbonate, polyester, polyethylene and polypropylene; metals such as aluminum, cobalt, copper, lead, magnesium, nickel, steel, superalloys, titanium and zinc alloys; ceramics; plus semiconductors, fibers, and other engineering materials."
  51. MetaCyc "MetaCyc is a database of nonredundant, experimentally elucidated metabolic pathways. MetaCyc contains more than 1747 pathways from more than 2170 different organisms, and is curated from the scientific experimental literature."
  52. METLIN "METLIN is a metabolite database for metabolomics containing over 42,000 structures, it also represents a data management system designed to assist in a broad array of metabolite research and metabolite identification by providing public access to its repository of current and comprehensive MS/MS metabolite data. An annotated list of known metabolites and their mass, chemical formula, and structure are available on the METLIN website."
  53. National Drug Code Registry "The Drug Listing Act of 1972 requires registered drug establishments to provide the Food and Drug Administration (FDA) with a current list of all drugs manufactured, prepared, propagated, compounded, or processed by it for commercial distribution. ... Drug products are identified and reported using a unique, three-segment number, called the National Drug Code (NDC), which serves as a universal product identifier for human drugs. FDA publishes the listed NDC numbers and the information submitted as part of the listing information in the NDC Directory ..."
  54. Protein Data Bank "The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids"
  55. FDA Unique Ingredient Identifier "The overall purpose of the joint FDA/USP Substance Registration System (SRS) is to support health information technology initiatives by generating unique ingredient identifiers (UNIIs) for substances in drugs, biologics, foods, and devices. The UNII is a non-proprietary, free, unique, unambiguous, non semantic, alphanumeric identifier based on a substance’s molecular structure and/or descriptive information."
  56. Psychoactive Drug Screening Program Database "This service provides screening of novel psychoactive compounds for pharmacological and functional activity at cloned human or rodent CNS receptors, channels, and transporters."
  57. Drugable.com "... an NLM stimulus-funded resource that maintains a comprehensive index of druggable target, cheminformatics, druglike chemistry, experimental activity, crystallographic structure, and in silico docking data."
  58. MassBank "... the first public repository of mass spectral data for sharing them among scientific research community. MassBank data are useful for the chemical identification and structure elucidation of chemical compounds detected by mass spectrometry."
  59. RRuff "... creating a complete set of high quality spectral data from well characterized minerals and is developing the technology to share this information with the world. Our collected data provides a standard for mineralogists, geoscientists, gemologists and the general public for the identification of minerals both on earth and for planetary exploration. ..."
  60. MolPort "... 9 million different chemical compounds ready to order on one internet resource ..."
  61. Golm Metabolome Database "... facilitates the search for and dissemination of mass spectra from biologically active metabolites quantified using Gas chromatography (GC) coupled to mass spectrometry (MS)"
  62. Madison Metabolomics Consortium Database "Each metabolite entry in the MMCD is supported by information in an average of 50 separate data fields, which provide the chemical formula, names and synonyms, structure, physical and chemical properties, NMR and MS data on pure compounds under defined conditions where available, NMR chemical shifts determined by empirical and/or theoretical approaches, calculated isotopomer masses, information on the presence of the metabolite in different biological species, and extensive links to images, references, and other public databases ..."
  63. ChemSpider Synthetic Pages "ChemSpider SyntheticPages is a freely available interactive database of synthetic chemistry. We publish practical and reliable organic, organometallic and inorganic chemical synthesis, reactions and procedures deposited by synthetic chemists. Synthetic methods on the site are updated continuously by chemists working in academic and industrial research laboratories."
  64. ChemSpider InChI Resolver "The InChI Resolver provides online access to a series of tools supporting the generation and look-up of InChIStrings and InChIKeys."
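
Several entries above, such as the Chemical Identifier Resolver (item 40), convert one structure identifier into another. As a rough illustration of how such a service might be called from a script, the sketch below only builds a lookup URL and makes no network request. The base URL and the `<identifier>/<representation>` path layout are assumptions not stated in this post, and the helper name `build_resolver_url` is made up for the example. Check the service's own documentation before relying on either.

```python
from urllib.parse import quote

# Assumed base URL for the resolver; it is not given in the post itself.
RESOLVER_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def build_resolver_url(identifier: str, representation: str) -> str:
    """Return a URL asking the resolver to convert `identifier`
    into `representation` (for example "smiles" or "stdinchikey").

    Hypothetical helper: the path layout is an assumption.
    """
    # Percent-encode both parts so characters such as "=" or "/"
    # cannot be mistaken for URL syntax.
    return (
        f"{RESOLVER_BASE}/"
        f"{quote(identifier, safe='')}/"
        f"{quote(representation, safe='')}"
    )


if __name__ == "__main__":
    # An InChIKey (here, the one for aspirin) as the input identifier.
    print(build_resolver_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "smiles"))
```

Keeping URL construction separate from the HTTP request makes the helper easy to test offline and to point at a different service by changing `RESOLVER_BASE`.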