Re: PITR Base Backup on an idle 8.1 server

Marco Colombo <pgsql@xxxxxxxxxx> · Mon, 04 Jun 2007 12:55:04 +0200

Greg Smith wrote:
The way you're grabbing 
files directly from the xlog directory only works because your commit 
workload is so trivial that you can get away with it, and because you 
haven't then tried to apply future archive logs.

Well, it's only because I don't need future logs, just like I don't need 
"future" files. Backup is at 2:00 AM, any change after that is 
potentially lost. That includes e-mails, web contents, and database 
contents. The database contents are in no way different to us.

It's the "your commit workload is so trivial that you can get away with 
it" I don't really get, but more on this later.

In the general case, 
circumventing the archiving when the backup is going on won't guarantee 
everything is ordered just right for PITR to work correctly.

Generic PITR? You mean if backup is at 2:00 AM and the server crashes 
(all disks lost) at 2:00 PM, you want to be able to recover to some 
time like 11:00 AM, and be precise about it? That's PITR to me - and the 
"precise" part is key here... either the time or the transaction ID 
would do, the point is being able to draw a line and say "anything 
before this is correct".

Well, if that's what you mean by PITR, I never claimed my method would 
give you that ability. I'm well aware it won't, in the general 
case. If you need that, you need to archive all the logs created after 
the backup; that's pretty obvious.

But even under heavy write load, my method works, if the only point in 
time you want to be able to recover to is 2:00 AM.

It works for you too: it gives you a nice working backup. If you also need 
real PITR, your archive_command is going to be something like:

archive_command = 'test ! -f /var/lib/pgsql/backup_lock && cp %p /my_archive_dir/%f'
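
To spell out what that command does: while the lock file exists the test 
fails, the command exits non-zero, and the server keeps the segment and 
retries the archive command later. Here is a minimal sketch of the same 
logic as a shell function. The name archive_segment and the BACKUP_LOCK 
and ARCHIVE_DIR variables are made up for illustration; the defaults are 
the paths from the command above.

```shell
# Hypothetical wrapper with the same logic as the archive_command above.
# $1 = path of the segment to archive (%p), $2 = its file name (%f)
archive_segment() {
    src=$1
    name=$2
    lock=${BACKUP_LOCK:-/var/lib/pgsql/backup_lock}
    dest=${ARCHIVE_DIR:-/my_archive_dir}
    # Non-zero exit while the backup lock exists: nothing is copied,
    # and the caller sees the failure and can retry later.
    test ! -f "$lock" && cp "$src" "$dest/$name"
}
```

The backup script would create the lock file before it starts copying 
and remove it when it is done.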

I consider 
what you're doing a bad idea that you happen to be comfortable with the 
ramifications of, and given the circumstances I understand how you have 
ended up with that solution.

I would highly recommend you consider switching at some point to the 
solution Simon threw out:

create table xlog_switch as
select '0123456789ABCDE' from generate_series(1,1000000);
drop table xlog_switch;

Ok, now the segment gets rotated, and a copy of the file appears 
somewhere. What's the difference in having the archive_command store it 
or your backup procedure store it?

Let's say my archive_command is a cp to another directory, and let's 
say step 5) is a cp too. What exactly does forcing a segment switch 
with dummy data buy me over doing a cp myself on the real segment data?

I mean, both ways would do.

you should reconsider doing your PITR backup
properly--where you never touch anything in the xlog directory and 
instead only work with what the archive_command is told.

Well, I'm copying files. That's exactly what a typical archive_command 
does. It's not special in any way, just a cp (or tar or rsync or 
whatever). Unless you mean I'm not supposed to copy a partially filled 
segment. There can be only one; the others would be full ones, and full 
ones are no problem. I think PG correctly handles the partial one if I 
drop it in pg_xlog at recovery time.

That segment needs special treatment at recovery time, if you use my 
procedure (in my case, it doesn't). If you have a later copy of it (most 
likely an archived one), you have to make it available to PG instead of 
the old one, if you want to make use of the rest of the archived 
segments. If you don't want to care about this, then I agree your method 
of forcing a segment switch is simpler. There's no partial segment at 
all. Anyway, it's running a "psql -c" at backup time vs. a "test -nt && 
rm" at restore time, not a big deal in either case.
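
For the record, the restore-time check could look roughly like this. It 
is only a sketch: the function name and its two arguments are made up, 
and it assumes the archived copy has the same file name as the partial 
segment that came out of the backup. The -nt operator is not in POSIX 
test, but the usual shells (bash, dash, ksh) support it.

```shell
# Hypothetical helper: remove the partial segment restored from the
# backup when the archive holds a newer copy of the same segment.
# $1 = segment file restored into pg_xlog from the backup
# $2 = file with the same segment name in the archive directory
drop_stale_segment() {
    backup_seg=$1
    archived_seg=$2
    # -nt is true when the first file was modified later than the second
    if [ -f "$backup_seg" ] && [ "$archived_seg" -nt "$backup_seg" ]; then
        rm -- "$backup_seg"
    fi
}
```

If the archived copy is not newer, or does not exist, the file from the 
backup is left alone.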

.TM.