On Thu, Dec 1, 2022 at 1:21 PM Don Seiler <don@xxxxxxxxx> wrote:
Good afternoon.PG 12.12, PGDG ubuntu 18.04 LTS build.We have a primary with a couple of physical/streaming replicas. One intended for quick lookups and another intended for long-running reports (aka "the reporting replica"). There's also a DR replica in another region but we don't touch that for obvious reasons. All streaming replicas currently make use of replication slots on the primary as well. We have WAL archiving to cloud storage and restore/recovery handled via pgBackrest.I'm running into the age-old dilemma on the reporting replica where queries are being killed with the "canceling statement due to conflict with recovery" error. We have hot_standby_feedback=on and both max_standby_streaming_delay and max_standby_archive_delay are set to 5 minutes (300s). I'm fine increasing those delay parameters to 15 minutes or more. Like I said this replica is dedicated to long-running reports like month-end reports that you'd see today on the first of the month.
I searched around and I see some suggestions like disabling hot_standby_feedback on replicas like this to minimize impact on the primary,
Hot_standby_feedback=on is supposed to prevent this type of conflict, not cause it. I don't know what corner case you might be hitting where it is failing to prevent the conflict. But in any case it is hard to see how turning it off is going to make the conflicts better. On the other hand, you could argue that since it is failing to fix the problem, then you might as well turn it off--it can cause bloat on the master, and while that is not the problem you are complaining about, why risk bloat if you aren't getting a benefit?
in addition to increasing the delays, or even disabling the delays by setting them to -1. With replication slots in use, I want to make sure that WAL retention doesn't fill up the WAL volume, would it make sense to not use a slot for this replica (and/or not use streaming replication)?
I don't see why either toggling hot_standby_feedback or lengthening the delay would cause the WAL to fill up. I guess if the datafiles get bloated, they could squeeze the space used for pg_wal, but if they are on different volumes that shouldn't happen. And I guess lengthening the max delay could cause the WAL volume to fill up on the replica, which could then back up through the slot to fill up the WAL volume on the master. But unless you are already very close to the edge, I don't think lengthening the delay by 10 minutes would cause a problem.
Since you already have archiving set up, you could configure your standbys to go fetch WAL from the archive should they need to, in which case you should be able to dispense with the slots without problem.