As a follow-up to this:

http://archives.postgresql.org/pgsql-admin/2009-03/msg00233.php

The strategy described in the above post has worked out very well for us. If you do backups across a relatively slow link, and significant portions of your database remain relatively stable, you might want to consider this approach.

Attached is a graph which shows the space used on the volume containing the database and its backups for a county with fast growth in database size due to aggressive document scanning. Notice the weekly spikes -- these represent a new cpio|gzip PITR base backup copied to the local volume. A crontab script would check for completion of its copy to another local server and to the remote server; when both were successful, the *prior* base backup and the WAL files needed for it would be deleted. (A rough sketch of that cleanup logic is below.) You can see how the spikes get wider as the portion of the week required to get the backup across the WAN grew along with the database size.

After we implemented the new techniques at the end of May, there is a slightly higher base, because of the full, non-compressed copy of the live database, but both the backup times and the amount of data which needs to be moved are drastically reduced. (The shorter, narrower spikes give a pretty good idea of the improvement we've seen for this county.)

One minor point -- we found that to minimize traffic, it was important to freeze tuples pretty aggressively; otherwise the large insert-only tables sent their data across twice, nearly doubling the bandwidth required for backups. (The second sketch below shows one way to handle that.)
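For what it's worth, the cleanup step amounts to something like the following -- a minimal Python sketch, not our production script. The directory layout, the *.ok flag files the copy jobs are assumed to drop on success, and the *.start_wal file recording each backup's first needed WAL segment are all hypothetical, and it ignores timeline switches:

#!/usr/bin/env python
"""Sketch of the cron-driven cleanup described above."""
import os
import glob

BASE_DIR = "/backup/base"   # hypothetical: cpio|gzip base backups
WAL_DIR = "/backup/wal"     # hypothetical: archived WAL segments

def copies_confirmed(backup):
    """Both the local-server and remote-server copies must be done."""
    return (os.path.exists(backup + ".local.ok")
            and os.path.exists(backup + ".remote.ok"))

def main():
    backups = sorted(glob.glob(os.path.join(BASE_DIR, "*.cpio.gz")))
    if len(backups) < 2:
        return  # nothing older than the current backup to reclaim
    newest = backups[-1]
    if not copies_confirmed(newest):
        return  # keep the prior backup until both copies succeed

    # Delete the prior base backup(s) plus their marker files.
    for old in backups[:-1]:
        for path in glob.glob(old + "*"):
            os.remove(path)

    # WAL segment names are zero-padded hex, so they sort in order
    # (ignoring timeline switches, which this sketch does not handle).
    with open(newest + ".start_wal") as f:
        start_wal = f.read().strip()
    for wal in sorted(glob.glob(os.path.join(WAL_DIR, "0*"))):
        if os.path.basename(wal) < start_wal:
            os.remove(wal)

if __name__ == "__main__":
    main()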
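And a minimal sketch of the freezing step, assuming psycopg2 and some hypothetical table names and connection details. Running VACUUM FREEZE on the insert-only tables right after their bulk loads rewrites the tuple headers while the pages are still fresh, instead of letting a later vacuum dirty the same pages again and make rsync ship them a second time; lowering vacuum_freeze_min_age is another way to get there:

#!/usr/bin/env python
"""Freeze the big insert-only tables right after loading."""
import psycopg2

INSERT_ONLY_TABLES = ["scanned_document", "document_page"]  # hypothetical

conn = psycopg2.connect("dbname=county host=dbserver")  # hypothetical DSN
conn.autocommit = True  # VACUUM cannot run inside a transaction block

cur = conn.cursor()
for table in INSERT_ONLY_TABLES:
    # Freeze now, in the same pass that loaded the rows, so the
    # pages reach final form before the next backup runs.
    cur.execute("VACUUM FREEZE %s" % table)
cur.close()
conn.close()

-Kevin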
Attachment: backup-changes.png