On Sat, Dec 3, 2022 at 9:34 PM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
Hot_standby_feedback=on is supposed to prevent this type of conflict, not cause it. I don't know what corner case you might be hitting where it is failing to prevent the conflict. But in any case it is hard to see how turning it off is going to make the conflicts better. On the other hand, you could argue that since it is failing to fix the problem, then you might as well turn it off--it can cause bloat on the master, and while that is not the problem you are complaining about, why risk bloat if you aren't getting a benefit?
Basically my thinking as well.
... in addition to increasing the delays, or even disabling the delays by setting them to -1. With replication slots in use, I want to make sure that WAL retention doesn't fill up the WAL volume; would it make sense to not use a slot for this replica (and/or not use streaming replication)?

I don't see why either toggling hot_standby_feedback or lengthening the delay would cause the WAL to fill up. I guess if the datafiles get bloated, they could squeeze the space used for pg_wal, but if they are on different volumes that shouldn't happen. And I guess lengthening the max delay could cause the WAL volume to fill up on the replica, which could then back up through the slot to fill up the WAL volume on the master. But unless you are already very close to the edge, I don't think lengthening the delay by 10 minutes would cause a problem.
In the paranoid scenario I've envisioned, the longer delay would cause more WAL to be retained on the primary with replication slots in use, filling up the volume, etc. (exactly as you suggested). And yes, we're only talking 10 minutes.
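For reference, the settings under discussion would look something like this in postgresql.conf on the delayed replica (the values here are illustrative, not a recommendation):

```
# postgresql.conf on the standby (illustrative values)
hot_standby_feedback = on             # report the standby's oldest xmin to the primary;
                                      # prevents vacuum-related query conflicts, may bloat the primary
max_standby_streaming_delay = 10min   # default is 30s; -1 means wait indefinitely before
                                      # canceling conflicting standby queries
max_standby_archive_delay = 10min     # same knob for WAL replayed from the archive
```

Setting either delay to -1 trades query cancellations for unbounded replication lag, which is the WAL-retention concern raised above when slots are in use.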
Since you already have archiving set up, you could configure your standbys to go fetch WAL from the archive should they need to, in which case you should be able to dispense with the slots without problem.
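Since pgBackRest is mentioned below, the archive-fetch fallback would be configured on the standby roughly like this (the stanza name `main` is an assumption for illustration):

```
# On the standby: fetch WAL from the pgBackRest repository when streaming
# is unavailable. Replace 'main' with your actual stanza name.
restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'
```

With a working restore_command, a standby that falls behind can replay from the archive instead of relying on the primary retaining WAL via a slot.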
They are already configured to fall back to WAL restore/recovery from the pgBackRest repo if streaming breaks, so I'm very comfortable there. In fact, all of our replicas are configured to do this, so maybe disabling slots across the board might not be out of order. Although we do also use them as a form of monitoring (i.e., if there is an inactive replication slot, then we know a replica is not streaming as expected/desired).
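That monitoring use of slots can be sketched as a query on the primary; something along these lines (a sketch, not a drop-in check) flags slots that are not streaming and shows how much WAL each one is pinning:

```sql
-- Run on the primary: inactive slots indicate a replica that is not
-- streaming; retained_wal shows how much WAL the slot is holding back.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
           AS retained_wal
FROM pg_replication_slots
WHERE NOT active;
```

If slots are dropped in favor of archive-based recovery, an equivalent alert would need to come from elsewhere, e.g. monitoring pg_stat_replication for missing standbys or watching replay lag directly.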