Hi,
I am in the process of reviewing our configs for a number of 9.3 databases and found a replica with hot_standby_feedback=on. I remember that when we set it, long ago, we were fighting cancelled queries. I also remember that it never really worked for us. In the end we set up 2 replicas: one for short queries, where we prefer low replication lag, and another where we allow long-running queries but sacrifice timeliness (max_standby_*_delay=-1).
I have a hunch why hot_standby_feedback=on didn't work. But I never verified it. So, here it is. The key is this sentence:
"Feedback messages will not be sent more frequently than once per wal_receiver_status_interval."
That interval is 10 sec. So assume a transaction on the replica starts using a row right after a feedback message has been sent. Then there is a window of up to 10 sec in which the master cannot know that the row is needed on the replica and can vacuum it. If the transaction on the replica then runs longer than max_standby_*_delay, the only option is to cancel it.
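The hypothesized window can be written down as a toy model. This is only a sketch of the reasoning above, not of how PostgreSQL actually behaves. The function names and the example numbers are made up for illustration, feedback is assumed to be sent exactly every wal_receiver_status_interval seconds starting at t=0, and replication lag is ignored.

```python
import math


def unprotected_window(query_start, status_interval=10.0):
    """Seconds between the query start and the next feedback message.

    Assumption: feedback goes out at t = 0, interval, 2*interval, ...
    """
    next_feedback = math.ceil(query_start / status_interval) * status_interval
    return next_feedback - query_start


def would_be_cancelled(query_start, query_duration, vacuum_time,
                       status_interval=10.0, max_standby_delay=30.0):
    """Toy version of the hypothesis: is the replica query cancelled?

    A conflict only arises if the master vacuums the row after the query
    started but before the next feedback message told the master about it.
    max_standby_delay = -1 means "wait forever", as in max_standby_*_delay.
    """
    next_feedback = query_start + unprotected_window(query_start, status_interval)
    conflict = query_start <= vacuum_time < next_feedback
    if not conflict:
        return False
    if max_standby_delay < 0:
        return False
    # Simplification: the delay is counted from the moment of the vacuum.
    query_end = query_start + query_duration
    return query_end > vacuum_time + max_standby_delay


# A query starting 0.5 sec after a feedback message is unknown to the
# master for 9.5 sec in this model.
print(unprotected_window(0.5))
# A 120 sec query whose row is vacuumed inside that window gets cancelled.
print(would_be_cancelled(0.5, 120.0, vacuum_time=5.0))
```

In this model a vacuum that happens after the next feedback message never causes a conflict, which is exactly the part of the hypothesis that needs confirming.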
Is that explanation correct?
What is the correct way to use hot_standby_feedback to prevent cancellations reliably (accepting the bloat)?
Thanks,
Torsten