Re: Bad recovery: no pg_xlog/RECOVERYXLOG

Stephen Frost <sfrost@xxxxxxxxxxx> · Thu, 2 Nov 2017 07:40:29 -0400

Mark,

* Mark Kirkwood (mark.kirkwood@xxxxxxxxxxxxxxx) wrote:
> On 02/11/17 11:18, Stephen Frost wrote:
> 
> >Not having
> >a way to reliably sync the WAL files copied by archive command to disk,
> >in particular, really is an issue, it's not some feature, it's a
> >requirement of a functional PG backup system.  The other requirement for
> >a functional PG backup system is a check to verify that all of the WAL
> >for a given backup has been archived safely to disk, otherwise the
> >backup is incomplete and can't be used.
> 
> Funnily enough, the original poster's scripts were attempting to
> address (at least some) of this: he was sending stuff to swift, so
> if he got a ok return code then it is *there* - that being the whole
> point of a distributed, fault tolerant object store (I do swift
> support BTW).

There are different levels of storage reliability even in swift, and
that doesn't do anything to address the issue that you don't know
whether all of the WAL for a given backup has actually made it to swift.

Perhaps it might be useful to also point out here that pg_basebackup is
going to exit just as soon as it's done copying the files; it's not
going to wait for the WAL to finish getting to swift before returning
'success', because you didn't ask pg_basebackup to pull the WAL in these
scripts.

What that means is that you could have everything be successful, per
your definitions, and still not have a valid backup, and then you decide
to rotate off your older backup and then there's a crash.  Guess what?
You don't have a valid backup anymore because you haven't got all of the
necessary WAL for the pg_basebackup that you did do, so you can't use
that, and you nuked your prior backup, so that's gone too.  Hopefully
you have more backups than that, but if not, because you trusted in
these scripts and the guarantees of swift, then you've just lost
everything.

> I wonder if you are seeing this discussion in the light of folk
> doing backups to unreliable storage locations (e.g: the same server,
> NFS etc etc), then sure I completely agree with what you are saying
> (these issue impact backup designs no matter what tool is used to
> write them).

That you're arguing so hard about this one specific shell script which
happens to be based on swift really doesn't convince me that
recommending shell-script based backup solutions on PG is a good idea.
Doing backups locally may not be ideal for various reasons, but at
least if you're making sure to properly fsync the data out to the RAID'd
disks, and verifying that your backups are fully fsync'd and that you've
checked to make sure you have all of the WAL for a given backup (and
that it's all fsync'd) then I'd argue that it's at least conceptually
correct.  The same goes for NFS, or sending the data to another server,
assuming they're set up properly to respect fsync.
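
To make the fsync point concrete, here is a minimal sketch of what an
archive_command helper would have to do when copying to a local (or
fsync-respecting) filesystem.  The function name and the temp-file prefix
are made up for illustration, and it assumes a POSIX system where a
directory can be opened and fsync'd; it is not taken from any existing
tool.

```python
import os
import shutil
import tempfile


def archive_wal_durably(src_path, archive_dir):
    """Copy one WAL segment into archive_dir, fsync'ing before reporting success."""
    final_path = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(final_path):
        # Never overwrite a segment that has already been archived.
        raise FileExistsError(final_path)

    # Write under a temporary name in the same directory so the final
    # rename is atomic and a partial file never has the real name.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as dst, open(src_path, "rb") as src:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # file contents reach stable storage
        os.rename(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # fsync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return final_path
```

Any exception here has to turn into a non-zero exit status from the
archive_command, so that PG keeps the segment and retries.  Note that the
existence check is not race-free; a real tool would need to handle
concurrent writers as well.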

Simply skipping the requirements to verify that you've got all of the
WAL for the backup and that you've made sure that it's all stored on
reliable storage isn't correct.

Doing proper backups of PG is *hard*.  There are a lot of things you have
to do correctly to get them to actually be consistently reliable in the
face of even single-point failures.  Having swift provide reliability
guarantees for the archived WAL, provided the shell script is perfectly
written to catch all errors and report them back to PG correctly, is
great, but it still doesn't address the other requirement of ensuring
that all WAL has actually been archived before considering a given
backup as complete, and you have to decide what level of guarantees you
want from swift and configure it appropriately.
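
As a rough sketch of that second requirement: given the first and last
WAL segment names a backup needs (for example, as recorded in the backup
history file), you can enumerate every segment in between and check each
one against what is actually in the archive.  The function names are
hypothetical, and the code assumes the default 16MB segment size, a
single timeline, and a release where the segment counter runs all the way
to FF.  How you list the archive (swift, local disk, etc.) is up to you.

```python
SEGMENTS_PER_LOG = 256  # assumption: default 16MB WAL segment size


def parse_segment(name):
    """Split a 24-hex-digit WAL segment name into (timeline, segment number)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % (name,))
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def required_segments(start_name, stop_name):
    """Return every segment name from start_name to stop_name, inclusive."""
    timeline, first = parse_segment(start_name)
    stop_timeline, last = parse_segment(stop_name)
    if timeline != stop_timeline or last < first:
        raise ValueError("start/stop segments are not a valid range")
    return [
        "%08X%08X%08X" % (timeline, n // SEGMENTS_PER_LOG, n % SEGMENTS_PER_LOG)
        for n in range(first, last + 1)
    ]


def missing_segments(start_name, stop_name, archived_names):
    """Return the required segments that are absent from the archive listing."""
    archived = set(archived_names)
    return [s for s in required_segments(start_name, stop_name)
            if s not in archived]
```

A backup should only be considered complete, and older backups only
rotated off, once missing_segments() comes back empty.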

If you want simple script-based backups, then use pg_basebackup and make
it do the WAL handling as well and then make sure that you've got your
script set up to check error codes from pg_basebackup and that you're
actually monitoring your backups.  Even then there are risks, which boil
down to cases where even we didn't fsync things out properly, so that
WAL or files could be lost due to a crash after pg_basebackup finished.
Hopefully those have all been addressed now, but it's a testament to the
difficulty of doing these things correctly.
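
For what it's worth, a minimal sketch of that kind of wrapper might look
like the following.  The function names are invented for the example; the
only pg_basebackup options used are -D (target directory) and -X stream
(have pg_basebackup fetch the WAL itself).  Connection settings are
assumed to come from the usual environment variables.

```python
import subprocess


def build_backup_command(target_dir):
    # -D: where to write the backup
    # -X stream: stream the WAL needed by the backup alongside the data files
    return ["pg_basebackup", "-D", target_dir, "-X", "stream"]


def run_backup(target_dir, runner=subprocess.run):
    """Run pg_basebackup and raise if it reports anything other than success."""
    result = runner(build_backup_command(target_dir))
    if result.returncode != 0:
        raise RuntimeError(
            "pg_basebackup failed with exit code %d" % result.returncode
        )
    return True
```

The runner parameter is only there so the wrapper can be exercised
without a live server.  Whatever calls run_backup() still has to alert
someone when it raises, and also when it doesn't run at all.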

Thanks!

Stephen