Primary is 9.2.18, standby is 9.2.10. I know, I know. We're upgrading both to 9.2.22 in the next maintenance window. Standby is configured for streaming replication with the primary_conninfo parameter set. The primary has a bash script for archive_command that copies the WAL file to NFS and then rsync's the file to the replicas.
After some index rebuilds and manual vaccum today, the standby started lagging by a couple of hours. After some time I noticed that the WAL files being rsync'ed were still way behind where the standby was recovered to. For example, pg_last_xact_replay_timestamp would be at 2PM but the WAL files being rsynced hadn't even made it past noon yet.
So that's when I did remember that this is streaming replication, so the standby is getting the xlog records directly from the primary. The WAL files themselves aren't used at all by the standby, but the standby was cleaning them up, presumably because it saw that the data in them was applied hours ago.
So I'm writing just to get a confirmation or correction of my understanding of the process here, as well as some suggestions. Here's my first swing:
- Transaction completes on primary, xlog record is created. When the standby needs it, it pulls it from the primary (using the primary_conninfo credentials).
- When a WAL file is archived, our own script runs an rsync command to push it to the standby.
- The standby will recover from the streaming xlog data if local WAL files are not available. (So the standby prefers local WAL files to streaming?)
- If WAL files are available in an archive dir, restore_command is used to restore those files to the pg_xlog directory and the archive file removed.
- When a WAL file is no longer needed for recovery, the standby will delete them from the pg_xlog directory.
Assuming I'm right so far, I have these questions:
- I saw that the WAL archives were being removed from the archive directory. If the standby was already ahead of the WAL files because of streaming replication, would it still use the restore_command to restore WAL files from archive to px_log directory? Or would it just delete the archive files?
- As far as the lag goes, what can we do to increase the throughput for streaming replication? We tried to give it almost all of our 100Mb pipe between data centers and there was still significant lag until it recovered past the point of my manual vacuums.
- The archive_command on the primary seemed to run serial and wait, since it took 4-5 seconds to rsync each 16Mb WAL file from primary to standby. Is there any way to parallelize this? I guess we could just have archive_command do a local copy and then have a separate script in cron to handle some multiple rsync calls (with a max concurrency, of course)?
- I'm going to assume that having both streaming replication and WAL rsyncing are going to compete for bandwidth. Would it be advisable to only do streaming most of the day and then maybe rsync WAL files in a nightly batch? Now that I think of it, we do hourly snapmirrors of that WAL storage already (also using the same bandwidth), perhaps we just use that for the WAL replication.
I'd love to know how others might be handling this. If there's any important information that I've left out, please let me know.
Don.
--
Don Seiler
www.seiler.us
www.seiler.us