Re: Hot standby read slaves exceed max delay on WAL segment. Replication lag.

Venkata Balaji Nagothi <vbnpgc@xxxxxxxxx> · Thu, 20 Mar 2014 08:08:36 +1100

On Thu, Mar 20, 2014 at 4:12 AM, Shaun Duncan <shaun.duncan@xxxxxxxxx> wrote:

This is on a production 9.0.15 install with one master and five hot standby read-only slaves.

We've been looking into a situation where hot standby read slaves are receiving WAL segments but are exceeding max_standby_archive_delay (60s) and max_standby_streaming_delay (60s) without applying the changes. A slave will receive the first segment and hang, falling further and further behind the master (we've seen this last up to 30 minutes before we removed the slave from our read pool to let it catch up). Furthermore, once a slave has caught up, putting it back into the read pool means we almost immediately see this happen again. It acts as if we had max_standby_* set to -1.

I'm just looking for some ideas, hints, or suggestions as to what might be going on here or what we might be doing wrong.

Have you noticed whether the read requests on the standby database are taking long and loading up the server? If so, any idea why the queries on the standby are taking so long to complete?

WAL replay is paused while conflicting queries are running on the standby, until those read requests complete (or are cancelled once max_standby_*_delay is exceeded).
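To answer the first question, something like the following minimal Python sketch could flag standby queries that have been running longer than the 60s max_standby_* setting. It assumes the rows have already been fetched from pg_stat_activity on the standby as (procpid, query_start, current_query) tuples, which are the 9.0 column names; the function name long_running and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta

# Matches the 60s max_standby_archive_delay / max_standby_streaming_delay above.
MAX_STANDBY_DELAY = timedelta(seconds=60)


def long_running(rows, now, limit=MAX_STANDBY_DELAY):
    """Return (procpid, runtime) for active queries running longer than limit.

    rows: hypothetical (procpid, query_start, current_query) tuples, shaped
    like the 9.0 pg_stat_activity columns; fetching them is not shown here.
    """
    slow = []
    for procpid, query_start, current_query in rows:
        # In 9.0, idle backends report '<IDLE>' or '<IDLE> in transaction'.
        if current_query.startswith("<IDLE>"):
            continue
        runtime = now - query_start
        if runtime > limit:
            slow.append((procpid, runtime))
    return slow


# Hypothetical sample rows.
now = datetime(2014, 3, 20, 8, 0, 0)
rows = [
    (101, now - timedelta(seconds=5), "SELECT 1"),
    (102, now - timedelta(minutes=30), "SELECT * FROM big_report"),
    (103, now - timedelta(hours=2), "<IDLE>"),
]
print(long_running(rows, now))
```

With max_standby_* at 60s, a query that shows up here long after the limit, while replay is still stuck, would be worth a closer look.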

Also, do you see any delay in the master sending the WAL to the standby?
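One way to separate "the master is slow to send" from "the standby is slow to replay" on 9.0 is to compare pg_current_xlog_location() on the master with pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on the standby. The Python sketch below converts those location strings to byte offsets. It assumes the default 16 MB WAL segment size and the pre-9.3 "xlogid/xrecoff" layout, in which each logical xlogid spans 0xFF000000 bytes. The helper names and sample locations are hypothetical, and the master and standby readings are taken at slightly different moments, so the result is approximate.

```python
# Pre-9.3 WAL locations look like "2A/1F0000B8" (hex xlogid / hex xrecoff).
# Assumption: default 16 MB segments, so each logical xlogid holds 255
# segments (the last one is never used), i.e. 0xFF000000 bytes.
BYTES_PER_XLOGID = 0xFF000000


def xlog_to_bytes(location):
    """Convert a pre-9.3 WAL location string to an absolute byte offset."""
    xlogid, xrecoff = (int(part, 16) for part in location.split("/"))
    return xlogid * BYTES_PER_XLOGID + xrecoff


def lag_bytes(ahead, behind):
    """Bytes of WAL between two locations (ahead minus behind)."""
    return xlog_to_bytes(ahead) - xlog_to_bytes(behind)


# Hypothetical readings.
master_current = "2A/1F0000B8"   # pg_current_xlog_location() on the master
standby_receive = "2A/1E000000"  # pg_last_xlog_receive_location() on the standby
standby_replay = "29/FE000000"   # pg_last_xlog_replay_location() on the standby

print("shipping lag:", lag_bytes(master_current, standby_receive))
print("replay lag:  ", lag_bytes(standby_receive, standby_replay))
```

With these made-up numbers the standby has received almost everything (about one segment behind) but is 31 segments behind in replay. That pattern points at replay being held up on the standby rather than at the master's sending.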

Venkata Balaji N

Sr. Database Administrator
Fujitsu Australia