# Mirroring PubChem the Easy Way with PubChem Fu

If you want to work with the PubChem dataset, two of your biggest problems will likely be creating a local working copy and keeping it synchronized. A few articles here on Depth-First have covered just this, but the information is scattered, and getting a robust mirroring system working as expected takes time and effort. Enter PubChem Fu, a simple tool designed to help you maintain a complete, up-to-date copy of the PubChem dataset.

## What it Does

PubChem Fu lets you create and automatically update a local working copy of all PubChem Compound and Substance data.

PubChem Fu uses Ruby and Whenever, a clever little Ruby utility that configures cron for you. Give it a time of day at which to pull daily updates, and you will always have the complete, latest PubChem dataset on hand.

## Usage

Create a full copy of PubChem (run this from the PubChem Fu directory):

    rake full
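A full mirror means many large compressed files, so it helps to confirm that each one arrived intact. The sketch below is not part of PubChem Fu. It assumes every downloaded file `X` has a sidecar `X.md5` holding its hex MD5 digest (check that your mirror's directories actually provide these), and `verified?` is a hypothetical helper name.

```ruby
require "digest/md5"

# Hypothetical helper, not part of PubChem Fu: checks a downloaded file
# against the MD5 digest stored in a sidecar file named "<file>.md5".
def verified?(path)
  sidecar = "#{path}.md5"
  return false unless File.exist?(path) && File.exist?(sidecar)

  # Accept either "<hex>  <name>" or "MD5(<name>)= <hex>" layouts by
  # pulling out the first 32-character hex token in the sidecar.
  expected = File.read(sidecar)[/\b[0-9a-f]{32}\b/i]
  return false if expected.nil?

  # Digest::MD5.file reads the file in chunks rather than all at once.
  Digest::MD5.file(path).hexdigest == expected.downcase
end

# Usage: list mirrored files that fail verification (path is illustrative).
# Dir.glob("Compound/CURRENT-Full/SDF/*.sdf.gz").reject { |f| verified?(f) }
```

Files that fail the check can simply be deleted and fetched again on the next run.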

Pull any available daily updates (also from the PubChem Fu directory):

    rake daily
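Conceptually, pulling daily updates is a set difference: list the daily update directories available upstream, drop the ones already mirrored, and fetch the rest oldest first. This is a minimal sketch of that idea, not PubChem Fu's actual code. It assumes the daily directories are named by ISO date (`YYYY-MM-DD`), uses the hypothetical name `pending_updates`, and needs Ruby 2.7 or later for `filter_map`.

```ruby
require "date"

# Hypothetical helper: given the daily update directory names on the remote
# server and those already mirrored locally, return the dates still to be
# downloaded, oldest first, so updates are applied in order.
def pending_updates(remote_dirs, local_dirs)
  parse = lambda do |names|
    names.filter_map do |name|
      Date.iso8601(name)
    rescue ArgumentError
      nil # skip entries that are not ISO dates, such as README files
    end
  end

  have = parse.call(local_dirs)
  parse.call(remote_dirs).reject { |date| have.include?(date) }.sort
end

# Usage:
# pending_updates(["2012-03-14", "2012-03-15", "README"], ["2012-03-14"])
```

Applying updates in date order matters because a later daily update may revise records touched by an earlier one.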

Automate pulling daily updates:

    whenever --update-crontab pubchem
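By default, `whenever` reads its job definitions from `config/schedule.rb` and writes them to your crontab under the identifier you pass (`pubchem` here). A schedule that would produce the daily 2:15 PM job in the crontab listing below might look like the following. This is a sketch using Whenever's `every` and `rake` DSL methods; the schedule file PubChem Fu actually ships may differ.

```ruby
# config/schedule.rb (sketch): run `rake daily` once a day at 2:15 PM.
# Whenever turns this into a cron entry whose time fields are "15 14 * * *".
every 1.day, at: "2:15 pm" do
  rake "daily"
end
```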

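Make sure your cron task is set:

    crontab -l

The listing should contain a block like this:

    ...

    # Begin Whenever generated tasks for: pubchem
    PATH= ...

    15 14 * * * cd ~/local/pubchem && RAILS_ENV=production /usr/bin/env rake daily

    # End Whenever generated tasks for: pubchem

The first two fields, `15 14`, mean the job runs every day at 14:15 (2:15 PM).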
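If you would rather check the crontab from a script than by eye, you can extract the block Whenever wrote between its `Begin`/`End` marker comments and read each job's minute and hour fields. The helper names below are hypothetical, and the marker text is copied from the crontab listing above; other Whenever versions may word it differently. Only plain numeric minute and hour fields are recognized.

```ruby
# Hypothetical helper: return the lines between Whenever's marker comments
# for the given identifier, or nil if the block is not installed.
def whenever_block(crontab_text, identifier)
  start  = "# Begin Whenever generated tasks for: #{identifier}"
  finish = "# End Whenever generated tasks for: #{identifier}"
  lines  = crontab_text.lines.map(&:chomp)
  first  = lines.index(start)
  last   = lines.index(finish)
  return nil if first.nil? || last.nil? || last < first

  lines[(first + 1)...last]
end

# Hypothetical helper: return [hour, minute] for each cron job line in the
# block. Environment lines such as PATH=... and blank lines do not match the
# five-fields-then-command pattern and are skipped, as are ranges and steps.
def daily_times(block)
  block.filter_map do |line|
    m = line.match(/\A(\d{1,2})\s+(\d{1,2})\s+\S+\s+\S+\s+\S+\s+\S/)
    m && [m[2].to_i, m[1].to_i]
  end
end

# Usage:
# block = whenever_block(`crontab -l`, "pubchem")
# puts block ? daily_times(block).inspect : "no pubchem jobs installed"
```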

The `RAILS_ENV=production` part of the cron line is a by-product of Whenever being used mainly with Ruby on Rails applications. I'm sure there's a way to suppress it if needed; if you know how, please drop me a line.
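One possible approach, untested with PubChem Fu: Whenever lets a schedule redefine its built-in job types with `job_type`, so you could override the `rake` job type near the top of `config/schedule.rb`, before any `every` block, with a template that leaves out the environment variable. The template string below is an assumption modeled on the cron line above; `:path`, `:task`, and `:output` are placeholders that Whenever fills in when it writes the crontab.

```ruby
# config/schedule.rb (sketch): redefine the rake job type so generated cron
# lines no longer set RAILS_ENV. Must appear before the every blocks that
# call `rake`.
job_type :rake, "cd :path && /usr/bin/env rake :task :output"
```

After editing the schedule, rerun `whenever --update-crontab pubchem` and check `crontab -l` again.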