Mark,

* Mark Kirkwood (mark.kirkwood@xxxxxxxxxxxxxxx) wrote:
> On 03/11/17 00:11, Stephen Frost wrote:
> >Sure, that'll work much of the time, but that's about like saying that
> >PG could run without fsync being enabled much of the time and everything
> >will be ok. Both are accurate, but hopefully you'll agree that PG
> >really should always be run with fsync enabled.
>
> It is completely different - this is a 'straw man' argument, and
> just serves to confuse this discussion.

I don't see it as any different at all. The point I was trying to make
there is that there's a minimum requirement for backups, just as there
is for ACID compliance, and any solution needs to meet that minimum to
be considered.

> The crux of your argument seems to be concerning the synchronization
> between pg_basebackup finishing and being sure you have the required
> archive logs. Now just so we are all clear, when pg_basebackup ends
> it essentially calls do_pg_stop_backup (from xlog.c) which ensures
> that all required WAL files are archived, or to be precise here
> makes sure archive_command has been run successfully for each
> required WAL file.

pg_basebackup talks the replication protocol, to be clear, and sends a
BASE_BACKUP message, of which one of the options is 'NOWAIT' to indicate
if the server should wait until all of the WAL has been archived.
Typically, pg_basebackup does send a 'NOWAIT' to tell the server to not
hold up the final message until all of the WAL has been archived,
because it's handling the verification of the WAL having been archived.
In the unusual case that WAL isn't included with the pg_basebackup, it
looks like it would wait for the archive_command to complete, which is
better than I had thought (and hadn't noticed on my first glance through
the code), though that does depend on a functional and perfect
archive_command, and there's no shortage of reasons why that might
not be the case at the time the backup is happening.
That's an awful lot of action-at-a-distance hope for me to be
comfortable with, however. A backup solution really does need to verify
that the WAL has been completely and reliably stored, as discussed in
the documentation, before claiming a backup is valid, and there's
basically no reason not to unless the tool you've chosen to use makes
that particularly difficult (even if not *technically* impossible, given
enough effort).

If your solution is built on the assumption that WAL archiving is always
working and there's no check happening during backup to verify that
you've got all the WAL, then I have serious doubts about it being
reliable. If you're independently monitoring that all WAL has been
archived, that's certainly helpful, but I don't consider that to be a
complete substitute for making sure that you've got all of the WAL for a
given backup.

> Your entire argument seems about whether said WAL is fsync'ed to
> disk, and how this is impossible to ensure in a shell script.

[...]

> So it is clearly *possible*.

Yes, it's possible, but it's not something I'd recommend doing, and none
of your arguments have made me any more likely to recommend trying to
ensure a proper backup has completed using shell scripts.

What I fail to understand is your insistence on it being a good idea.
I've seen lots and lots of attempts at it, even made some myself, and
have come to the generally agreed-upon conclusion that it's both a bad
idea to hack together your own backup solution for PG and that, even if
you do want to try, using shell scripts to attempt to accomplish it is a
bad idea. There are much better solutions out there, which are really
what folks should be using. I'm not against using pg_basebackup either,
but if you're using it, let it handle the archiving because it does
verify that all of the WAL has been archived properly.
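
As a rough illustration of what "making sure that you've got all of the
WAL for a given backup" involves, here is a minimal Python sketch. It is
not how pg_basebackup or any existing backup tool implements the check.
It assumes the default 16MB WAL segment size, that the first and last
segment names needed by the backup are already known, and that the
archive is a plain directory holding uncompressed segment files. The
function names are hypothetical. It only checks that the files are
present; it says nothing about whether they were fsync'd or are intact.

```python
import os

# Assumption: default 16MB WAL segments, which gives 256 segments per "log" id.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)


def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, absolute segment number)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def wal_name(timeline, segno):
    """Build the WAL file name for a timeline and absolute segment number."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_LOG,
        segno % SEGMENTS_PER_LOG,
    )


def expected_segments(start_name, stop_name):
    """List every segment name from start_name through stop_name, inclusive."""
    start_tli, first = parse_wal_name(start_name)
    stop_tli, last = parse_wal_name(stop_name)
    if start_tli != stop_tli:
        raise ValueError("timeline switches are not handled in this sketch")
    if last < first:
        raise ValueError("stop segment precedes start segment")
    return [wal_name(start_tli, n) for n in range(first, last + 1)]


def missing_segments(archive_dir, start_name, stop_name):
    """Return the required segment names that are absent from archive_dir."""
    present = set(os.listdir(archive_dir))
    return [n for n in expected_segments(start_name, stop_name) if n not in present]
```

A backup would only be reported as valid when missing_segments() returns
an empty list. A real tool has more to handle, such as compressed
segments, timeline switches, and confirming that the archived files have
actually reached reliable storage.
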
> Actually I was helping him get a *reliable* backup system, I think
> you misunderstood how swift changes the picture compared to a single
> server/single disk design.

I do understand the goals of things like swift and s3 and the intent
behind them to provide a better store than local disks, and I'm not
against using them, to be clear, but they only address one of the
requirements that I outlined for a reliable backup solution. I mention
both requirements consistently to, hopefully, ensure that those coming
along later to read these threads remember that it's more than just
verifying that all the WAL has been archived during a backup: the WAL
must also have actually been fsync'd or written out to reliable storage.

Thanks!

Stephen