Re: Bad recovery: no pg_xlog/RECOVERYXLOG

Stephen,

On 03/11/17 00:11, Stephen Frost wrote:


> Sure, that'll work much of the time, but that's about like saying that
> PG could run without fsync being enabled much of the time and everything
> will be ok.  Both are accurate, but hopefully you'll agree that PG
> really should always be run with fsync enabled.

It is completely different - this is a 'straw man' argument, and just serves to confuse the discussion.


>> Also, if what you are suggesting were actually the case, almost
>> everyone's streaming replication (and/or log shipping) would be
>> broken all the time.
> No, again, this isn't an argument about if it'll work most of the time
> or not, it's about if it's correct.  PG without fsync will work most of
> the time too, but that doesn't mean it's actually correct.

No, it is pointing out that if your argument were correct, then the above side effects should be observable - they are not, which is significant.

The crux of your argument seems to concern the synchronization between pg_basebackup finishing and being sure you have the required archived WAL. Just so we are all clear: when pg_basebackup ends it essentially calls do_pg_stop_backup (from xlog.c), which ensures that all required WAL files are archived - or, to be precise, that archive_command has been run successfully for each required WAL file.
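For reference, the contract here is documented: PostgreSQL only treats a segment as archived once archive_command exits with status 0, and keeps retrying it otherwise. A minimal sketch of the settings involved (hostname and path illustrative):

# postgresql.conf
archive_mode    = on
# %p = path of the WAL segment, %f = its file name; pg_stop_backup()
# waits until this command has returned 0 for every required segment
archive_command = 'rsync -a %p targetserver:/disk/%f'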

Your entire argument seems to be about whether said WAL is fsync'ed to disk, and how this is supposedly impossible to ensure from a shell script. Actually it is quite simple to achieve: e.g. suppose your archive command is:

rsync ... targetserver:/disk

There are several ways to get that to sync:

rsync ... targetserver:/disk && ssh targetserver sync

Alternatively, amend vm.dirty_bytes on targetserver to be < 16M, or mount /disk with the sync option!
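For instance (values illustrative; WAL segments default to 16MB, so a writeback threshold below that stops a whole segment sitting dirty in the page cache):

# on targetserver:
sysctl -w vm.dirty_bytes=8388608    # writers start writeback at 8MB of dirty pages
# ...or mount the archive filesystem synchronously, e.g. in /etc/fstab:
# /dev/sdb1   /disk   ext4   sync   0 2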

So it is clearly *possible*.
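Putting those pieces together, a sketch of the copy-then-sync form as a single archive_command (hostname and path illustrative, as above):

archive_command = 'rsync -a %p targetserver:/disk/%f && ssh targetserver sync'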

However, I think you are obsessing over the minutiae of fsync to a single server/disk when there are much more important (read: more likely to happen) problems to consider. For me, the critical consideration is not 'are the WAL files there *right now*'... but 'will they be there tomorrow when I need them for a restore?'. Next is 'will they be the same/undamaged when I read them tomorrow?'
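The 'undamaged tomorrow' question can be answered cheaply even on the plain-server design: record a checksum at archive time and verify it before a restore. A sketch, extending the rsync-based archive_command above (%f is the segment file name):

# appended to the archive_command at archive time:
... && ssh targetserver "cd /disk && sha256sum %f > %f.sha256"
# before a restore, verify every archived segment:
ssh targetserver 'cd /disk && sha256sum -c *.sha256'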

This is why I'm *not* obsessing about fsyncing... make where you store these WAL files *reliable*: either via proxying/IP splitting, so you send stuff to more than one server (if we are still thinking server + disk = backup solution), or via a distributed object store (Swift, S3 etc.) that handles that for you and, in addition, checksums and heals any individual node's data corruption as well.
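As a concrete example, with the python-swiftclient CLI the whole archive step can be a one-liner (container name illustrative; credentials assumed to be in the server's environment), and the store's own checksumming then covers the integrity side:

archive_command = 'swift upload wal-archive %p --object-name %f'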
>> With respect to 'If I would like to develop etc etc..' - err, all I
>> was doing in this thread was helping the original poster make his
>> stuff a bit better - I'll continue to do that.
> Ignoring the basic requirements which I outlined isn't helping him get
> to a reliable backup system.

Actually I was helping him get a *reliable* backup system - I think you misunderstood how Swift changes the picture compared to a single-server/single-disk design.

regards

Mark

