On Mar 19, 2014 5:08 PM, "Venkata Balaji Nagothi" <vbnpgc@xxxxxxxxx> wrote:
>
>
> On Thu, Mar 20, 2014 at 4:12 AM, Shaun Duncan <shaun.duncan@xxxxxxxxx> wrote:
>
>> This is on a production 9.0.15 install with 1 master, 5 hot standby read only slaves.
>>
>> We've been trying to look into a situation where we're seeing that hot standby read slaves are receiving WAL segments, but are exceeding max_standby_archive_delay (60s) and max_standby_streaming_delay (60s) and not applying changes. The slave will get the first segment and hang (we've seen up to 30m before removing the slave from our read pool to catch up) and get further and further behind the master. Furthermore, we have seen that after a slave has caught up, putting it back into the read pool will mean will almost immediately start to see this happen again. It acts as if we had max_standby_* set to -1.
>>
>> I'm just looking for some ideas, hints, or suggestions as to what might be going on here or what we might be doing wrong.
>>
>
> Have you noticed if the read requests on standby database are taking long and loading up the server ? If yes, any idea why the queries on standby are taking long to complete ?
>
> WAL replay would be paused ( until the read requests are served ) at the time of conflicting queries on the standby site.
>
> Also, Do you see any delay in master sending the WALs to standby ?
>
>
> Venkata Balaji N
>
> Sr. Database Administrator
> Fujitsu Australia
>
>
>
>Yes we have seen queries against the read slave degrade severely while this happens. We have at least optimized some inefficient queries. But I would think (correct me if I am wrong) that queries would be canceled after WAL replay has exceeded the max delay. This is something we haven't seen. It behaves as if it is waiting for reads to complete.
- When conflicting queries (like DROP TABLE, DROP DATABASE etc..) are executed on master whilst standby is serving the read requests. This behavior can be controlled to some extent by setting max_standby_*_delay parameters (60s in your case). If there are any read requests in process at standby database,the conflicting queries would wait for 60s - if the read requests (Generally SELECTs) do not complete, then they are forcibly cancelled
- When master is waiting to apply VACUUM related changes (removing dead rows) at the standby site. This behavior can be controlled to some extent by "vacuum_defer_cleanup_age". This implies number of transactions executed on the master database.