Greg Smith wrote:
The way you're grabbing
files directly from the xlog directory only works because your commit
workload is so trivial that you can get away with it, and because you
haven't then tried to apply future archive logs.
Well, it's only because I don't need future logs, just like I don't need
"future" files. Backup is at 2:00 AM, any change after that is
potentially lost. That includes e-mails, web contents, and database
contents. The database contents are in no way different to us.
It's the "your commit workload is so trivial that you can get away with
it" I don't really get, but more on this later.
In the general case,
circumventing the archiving when the backup is going on won't guarantee
everything is ordered just right for PITR to work correctly.
Generic PITR? You mean if backup is at 2:00 AM and the server crashes
(all disks lost) at 2:00 PM, you want to be able to recover to some
time like 11:00 AM, and be precise about it? That's PITR to me - and the
"precise" part is key here... either the time or the transaction ID
would do, the point is being able to draw a line and say "anything
before this is correct".
Well if that's what you mean by PITR, I never claimed my method would
give you that ability. I'm well aware it won't, in the general
case. If you need that, you need to archive all the logs created after
the backup, that's pretty obvious.
But even under heavy write load, my method works, if the only point in
time you want to be able to recover to is 2:00 AM.
It works for you too: it gives you a nice working backup. If you also
need real PITR, your archive_command is going to be something like:
archive_command = 'test ! -f /var/lib/pgsql/backup_lock && cp %p /my_archive_dir/%f'
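To spell out what that one-liner does, here is a rough sketch of the
same logic as a shell function. The function name and its four
arguments are made up for illustration; they are not part of any
PostgreSQL interface. The point is the exit status: while the lock file
exists the command fails, and since PostgreSQL retries a failed
archive_command later, the segment is only delayed, not skipped.

```shell
#!/bin/sh
# Sketch only: archive_segment and its arguments are hypothetical names.
#   $1 = path of the segment to archive (what %p expands to)
#   $2 = file name of the segment       (what %f expands to)
#   $3 = lock file created by the backup script while it runs
#   $4 = archive directory
archive_segment() {
    src=$1
    name=$2
    lock=$3
    dest=$4
    # Non-zero exit while the lock exists: nothing is copied, and the
    # caller is expected to try this segment again later.
    test ! -f "$lock" && cp "$src" "$dest/$name"
}
```

So the backup script would create the lock file before it starts
copying and remove it when it is done; segments completed in between
are archived once the lock is gone.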
I consider
what you're doing a bad idea that you happen to be comfortable with the
ramifications of, and given the circumstances I understand how you have
ended up with that solution.
I would highly recommend you consider switching at some point to the
solution Simon threw out:
create table xlog_switch as
select '0123456789ABCDE' from generate_series(1,1000000);
drop table xlog_switch;
Ok, now the segment gets rotated, and a copy of the file appears
somewhere. What's the difference in having the archive_command store it
or your backup procedure store it?
Let's say my archive_command is a cp to another directory, and let's
say step 5) is a cp too. What exactly does forcing a segment switch
with dummy data buy me, compared to doing a cp myself on the real
segment data?
I mean, both ways would do.
you should reconsider doing your PITR backup
properly--where you never touch anything in the xlog directory and
instead only work with what the archive_command is told.
Well, I'm copying files. That's exactly what a typical archive_command
does. It's not special in any way, just a cp (or tar or rsync or
whatever). Unless you mean I'm not supposed to copy a partially filled
segment. There can be only one; the others would be full ones, and full
ones are no problem. I think PG correctly handles the partial one if I
drop it in pg_xlog at recovery time.
That segment you need to treat specially at recovery time, if you use
my procedure (in my case, I don't). If you have a later copy of it (most
likely an archived one), you have to make it available to PG instead of
the old one, if you want to make use of the rest of the archived
segments. If you don't want to care about this, then I agree your method
of forcing a segment switch is simpler. There's no partial segment at
all. Anyway, it's running a "psql -c" at backup time vs. a "test -nt &&
rm" at restore time, not a big deal in either case.
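For the record, the "test -nt && rm" step could look roughly like the
sketch below. The function name and the paths in the usage comment are
hypothetical; only the test/rm pair comes from the text above. It
assumes a shell whose test supports -nt (bash, dash and ksh do).

```shell
#!/bin/sh
# Sketch only: prefer_archived is a made-up helper name.
#   $1 = archived copy of the segment (possibly more complete)
#   $2 = partial copy of the same segment taken by the backup
prefer_archived() {
    archived=$1
    partial=$2
    # -nt is true when $archived exists and is newer than $partial.
    # In that case drop the stale partial copy, so that recovery picks
    # up the archived one instead.
    if test "$archived" -nt "$partial"; then
        rm -f "$partial"
    fi
}

# Hypothetical usage at restore time, SEG being the segment file name:
#   prefer_archived "/my_archive_dir/$SEG" "/var/lib/pgsql/data/pg_xlog/$SEG"
```

If no archived copy exists, or it is older, the partial segment from the
backup is left in place.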
.TM.