Re: incremental backups

Rick Gigger <rick@xxxxxxxxxxxxxxxxxxxx> · Fri, 27 Jan 2006 10:18:57 -0700

I guess my email wasn't all that clear.  I will try to rephrase.  I  
am moving from using the old style pg_dump for backups to using  
incrementals and want to make sure I understand the process before I  
go about writing a bunch of scritps.

To me setting up incremental backup consists of the following  
components:

1) Setting up the WAL archiving.  This one is trivial.
2) Doing physical dumps of the $PGDATA directory.  This one is once  
again trivial.
3) Knowing which physical dumps are Good and Not Good.  For a given  
physical dump D there is are WAL archive files Dstart and Dend for  
which you much have Dstart and Dend and all files in between.  If you  
have all those files then the physical dump is Good.  If you don't  
have them then the dump is worthless to you.
4) Knowing which dumps and which archive files can be deleted.  This  
depends on a number of factors.
	a) How far back do you want to be able to do PITR
	b) How much space do you have / want to use for PITR
	c) Which physical dumps are Good and which are Not Good. (see #3)

Now I think I have a pretty good plan here except for #3 (and so #4  
then also suffers).

Just as an example lets say I'm not concerned so much with PITR as I  
am recovering from a db crash. I've got all the backups files saved  
to my backup db server so I can failover to it if my primary db  
server dies.  I just want to make sure I've got one physical dump  
that is good.  (This is not my actual situation but it illustrated my  
point better.)

Now when I do a physical dump it is not a Good dump.  That is I don't  
have the end archive file necessary to recover from that physical  
dump.  That is to say that  when I call pg_backup_start() then copy  
$PGDATA then call pg_backup_stop() postgres might be on say WAL  
archive file #5.  Once the physical dump is completed WAL archive  
file #5 hasn't been archived yet.  I only have up to #4.  So if I  
delete my old physical dumps and all I've got is this most recent one  
and my database crashes before #5 gets archived then I am hosed.  I  
have no good physical backups to start from.

My main question is about the best way to figure out when a physical  
dump is Good.

One strategy is to always keep around lots of physical dumps.  If you  
keep around 100 dumps you can be pretty sure that in the space of  
time that those physical dumps take place that at least one WAL file  
was archived.  In fact if you keep 2 physical dumps you can be fairly  
certain of this.  If not then you really need to space our your dumps  
more.

Is this making sense at this point?

The problem is that the WAL archiving is triggered by postgres and  
the rate at which the db is updated.  The physical dumps are  
triggered by cron and on a purely time based schedule.  So in theory  
if you had the physical dumps happening once a day but for some odd  
reason no one updated the database for 4 days then all of a sudden  
you'd have 2 physical backups and neither of them are good.  If  
you're db crashes during that time you are hosed.

Maybe I am arguing a point that is just stupid because this will  
never happen in real life.  But in that it is my backups system that  
I will be using to recover from complete and total disaster I just  
want to have all my bases covered.

So my ideas on how to determine if a physical dump is Good are as  
follows.

1) When you do the physical backup (after dumping the $PGDATA dir but  
before calling pg_stop_backup() ) determine the current WAL archive  
file.  Mark somewhere in the backed up physical dump the last file  
needed for the dump to be considered good.  Then your deletion  
scripts can look at the WAL archive files you have and the last one  
required for the dump to be Good and determine if the dump is Good or  
not.

2) After doing the physical dump but before calling pg_stop_backup()  
just copy the current WAL file to the physical dump.  If that file  
later gets archived then the restore commands overwrites your  
partially completed one so it doesn't hurt but you know that when you  
call pg_stop_backup() that that physical dump is good.  (Is it ok to  
copy the current WAL file while it is still in use?)

Is anyone taking one of these or any other precautions to make sure  
they've got a good physical dump or does everyone just keep a whole  
bunch of dumps around, and then actually restore the dump to see if  
it is good and if not go back to a previous dump?

I hope that makes more sense.

Thanks,

Rick

On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:

Rick Gigger wrote:
Um, no you didn't read my email at all.  I am aware of all of that  
and it is clearly outlined in the docs.  My email was about a  
specific detail in the process.  Please read it if you want to  
know what my actual question was.

I'm not sure your email is quite right as regards the process. You  
need:
  1. the filesystem backup
  2. the WAL file indicated in the history-file
  3. all the WAL files later than that
to get up to "now".

If you don't want to replay up to "now" then you will not need some  
of the more recent WAL files. You can't afford to throw them away  
though since you've got a rolling backup system running and the  
whole point is so you can recover to any point you like.

You can however throw away any WAL files older than that indicated  
in the history file for your current filesystem-backup. You can  
then only restore from that point in time forward.

There is no "last one" in the WAL set unless you know the time you  
want to restore to. Indeed, the "last one" might not be "full" yet  
and therefore archived if you want to restore to 10 seconds ago.

Or am I mis-understanding your email too?

--
  Richard Huxton
  Archonet Ltd