I ran into an issue the other day on one of my database clusters
where it appears a replica had a transaction open for almost 24
hours. This was causing stale data on the replica (as observed by
queries against the db), as well as tuple bloat on the master (due
to `hot_standby_feedback=on` and vacuum not cleaning them up). The part that is confusing me is that I have `max_standby_streaming_delay=30s`, and from all the documentation I can find, this should have resulted in the transaction being killed. The replica has since been restarted, so I cannot gather any information about current state. However I have monitoring and logging on many of the postgres stats, so I might be able to retrieve some information if requested. I have also tried reproducing the issue in a lab, but have so far been unable to. Are there any scenarios where a replica might still delay replaying the transaction log past the value configured in `max_standby_streaming_delay`? For version information, I am running PostgreSQL 9.6.3 on CentOS/7 -Patrick |