Maximum xid increasing

Craig Milhiser <craig@xxxxxxxxxxxx> · Wed, 8 Nov 2023 09:09:01 -0500

Our max xid are over 500m and climbing. Auto vacuuming and manual vacuuming are returning
autovacuuming to prevent wraparound.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

The autovacuum previously kicked in at 200m and reduced the maximum xid.   Now there is no reduction.

We have a writer (master) node, 3 readers (slaves) and a hot-standby. We have an ETL vendor hooked up to replicate. All of the replications are working. There is current data in each. We have checked our replication slots. Everything is in use and working.

From transactions, we have terminated every writer node PID since before this climb has started. Almost every client has been restarted. 

We are handling 100+ insert/updates per second. Most transactions complete in ~2 seconds. Though some may take 1 minute.  There are no long running idle in transaction activities. 

We have been scouring the internet reading many helpful pages and trying their advice but we cannot find a problem.

We have restored a daily snapshot in another cluster. After vacuuming the max xid dropped to 50m. No client was connected. Implying that a restart should fix the issue.

I am searching for any hints. We have not found a smoking gun to focus on.  This is a 24x7 operation and would rather not have to restart the server.

We are on v13.8, AWS RDS.

Is there a way to find which row or table or lock or some open transaction is preventing the vacuuming from lowering xid? Any hints on where to look? 

Thanks