Re: Standby Mechanics: WAL vs Streaming

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Aug 26, 2017 at 9:42 PM, MichaelDBA <MichaelDBA@xxxxxxxxxxx> wrote:
pgbackrest archives WALs faster than a custom archive_command command.  I use pgbackrest to automate archiving of WALs from primary to pgbackrest server.  From there, my restore_command on the slave pulls WALs if needed to catch up to SR (streaming replication).

Interesting, I'll look into pgbackrest. Thanks!
 
The standby will recover from the streaming xlog data if local WAL files are not available. (So the standby prefers local WAL files to streaming?)
The other way around: The standby prefers streaming replication.  It only pulls WALs (restore_command) if it is too far behind.

OK. That's what I had though initially but for some reason convinced myself that it would think it's easier to get it locally when available.

 
I saw that the WAL archives were being removed from the archive directory. If the standby was already ahead of the WAL files because of streaming replication, would it still use the restore_command to restore WAL files from archive to px_log directory? Or would it just delete the archive files?
The archived files in your case (stored on the standby) would only be removed if you had  pg_archivecleanup command also in your recovery.conf file for WAL cleanup (archive_cleanup_command).

Yes we do have that, sorry I didn't specify. I just wanted to confirm that it would be used to clean up archives that arrive well after their usefulness with streaming replication.

 
As far as the lag goes, what can we do to increase the throughput for streaming replication? We tried to give it almost all of our 100Mb pipe between data centers and there was still significant lag until it recovered past the point of my manual vacuums.
What do you expect?  Manual vacuum full on big tables will generate a lot of WAL thereby causing SR lag.

As I've said in other recent emails to this list, I'm relatively new to these waters, having spent the last 15+ years as an Oracle DBA. I figured that manual vacuum would generate a lot of changes for replication. That's why I'm asking where I can improve the throughput (if possible), given the additional activity spike.
 
I'm going to assume that having both streaming replication and WAL rsyncing are going to compete for bandwidth. Would it be advisable to only do streaming most of the day and then maybe rsync WAL files in a nightly batch? Now that I think of it, we do hourly snapmirrors of that WAL storage already (also using the same bandwidth), perhaps we just use that for the WAL replication.
How about doing your manual vacuums at night time during low load times?

You're assuming that night time is our off-peak time (although I did say "nightly batch", it doesn't mean it's a better time for maintenance work in our case). There's a reason I was running the vacuum when I was this morning, and it isn't something we regularly do. We had a table with 3 corrupt indexes that had prevented autovacuum from running on that table for many months. Today I recreated those indexes and ran a manual vac freeze (it was well past the min freeze age), during an off-peak time negotiated with the business side of the shop.

Thanks again for the pg_backrest information. I'll check it out. I'm thinking at the very least we could just present that NFS snapmirror as a read-only volume to the standby and point restore_command at it. That snapmirror is taking place regardless, so the additional rsync we're doing in our archive_command script on the primary is just piling on and eating up more bandwidth.

Don.
--
Don Seiler
www.seiler.us

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux