Re: Reliable WAL file shipping over unreliable network

Just use "-ac"; you want the -c option to ensure there is no data corruption during the transfer. Do not delete the file; let Postgres manage that.

Here is a snippet from a script I use for archiving. You also want to make sure your script returns failure or success correctly.

# SSH Command and options
SSH_CMD="ssh -o ServerAliveInterval=20 $ARCH_SERVER"
STS=3

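# -a preserves file attributes; -c compares by checksum instead of size and mtime
OUTPUT=$(rsync -ac --rsync-path="mkdir -p $ARCH_DIR && rsync" "$XLOGFILE" "$ARCH_SERVER:$ARCH_DIR/$WALFILE")
# Report success only if rsync itself exited cleanly
if [ $? -eq 0 ]; then
   STS=0
fi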

exit $STS

Thanks for the script. So I need to use this on the master side in archive_command. It ensures that postgres will retry transferring partially transferred files until it succeeds. Unfortunately, I cannot use this in my Docker container, which is isolated and contains neither SSH nor rsync. But I get the idea, and I can come up with a simple script that runs on the host machine and transfers these files reliably to the slave side.
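To make the idea concrete, here is a rough sketch of what such a host-side script could look like. It is only an illustration under my own assumptions: the function name ship_wal and its two arguments are made up, the destination is assumed to be reachable as an ordinary directory path, and the temporary file is assumed to live on the same filesystem as the final name so that the rename is atomic. It does not fsync anything.

```shell
# Sketch only: ship_wal SRC DEST_DIR (hypothetical name and arguments).
# Publishes SRC into DEST_DIR so that readers never see a partial file.
ship_wal() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    tmp="$dest_dir/.$name.tmp.$$"

    # If the file is already there, succeed only when it is identical.
    if [ -e "$dest_dir/$name" ]; then
        cmp -s "$src" "$dest_dir/$name"
        return $?
    fi

    # Copy under a temporary name; an interrupted copy leaves only the temp file.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # Compare byte for byte before publishing.
    cmp -s "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # A rename within one filesystem is atomic, so the final name
    # appears only once the content is complete.
    mv "$tmp" "$dest_dir/$name"
}
```

The point of the temporary name is that restore_command only ever asks for the real WAL file name, so it should either find a complete file or find nothing at all. Any non-zero return is meant to be passed back as the exit status of the archiving script.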

But I still don't understand what happens on the slave side when the slave tries to use a partially transferred WAL file. I have this recovery.conf on the slave side:

standby_mode='on'
primary_conninfo='host=postgres-master port=5432 user=not_telling password=not_telling'
trigger_file='/backup/trigger'
restore_command = 'cp /transfer/%f %p'
archive_cleanup_command = 'pg_archivecleanup /transfer %r'

So what happens when the slave postgres executes restore_command on a WAL file that was only partially transferred? It cannot test the file for completeness before it has been copied to pg_wal. The restore command itself also cannot tell whether the file is complete. What does PostgreSQL do when it sees an incomplete file in pg_wal? Does it detect that the file is incomplete and execute the restore_command again? If so, when, and how often?

It seems inefficient to execute the restore_command multiple times just to find out that the file is not yet complete, but if this is how it works, then it is fine with me. But does it really work that way? I don't see it documented (or maybe I'm looking in the wrong place).
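For what it is worth, one thing a restore script could check on its own is the file size, since WAL segments have a fixed size (16 MB by default). Below is a rough sketch of that idea, not a tested solution. The function name restore_wal and the WAL_SEGMENT_BYTES variable are my own inventions, and the sketch assumes the default segment size unless that variable says otherwise. As far as I can tell from the documentation, a non-zero exit status tells PostgreSQL that the file is not available, and a standby will ask again later, but I would like that confirmed.

```shell
# Sketch only: restore_wal SRC_DIR FILE_NAME TARGET_PATH (hypothetical).
# Refuses to hand over a WAL segment whose size is not the expected one.
restore_wal() {
    src="$1/$2"
    target=$3
    # Assumption: default segment size of 16 MB unless overridden.
    expected=${WAL_SEGMENT_BYTES:-16777216}

    # The requested file may simply not have arrived yet.
    [ -f "$src" ] || return 1

    case $2 in
        *.history|*.backup)
            # Timeline history and backup label files are not fixed-size.
            ;;
        *)
            # Arithmetic expansion strips any padding that wc may print.
            size=$(( $(wc -c < "$src") ))
            [ "$size" -eq "$expected" ] || return 1
            ;;
    esac

    cp "$src" "$target"
}
```

If this lived in a script that ends with restore_wal "$@", the setting would become something like restore_command = '/path/to/restore_wal.sh /transfer %f %p', where the script path is a placeholder. A size check cannot catch a file that has the right length but wrong content, so it complements a reliable transfer on the master side and does not replace it.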

Sorry for the many questions. I need to understand every detail of this before we do it in production.

