Hi,

I have a very heavily updated table in a Postgres 9.6.10 database with lots of disk bloat. Every row is updated about once a minute, with little to no inserts, and there are approx. 18k rows in total. The table has bloated from ~1700 KB to about 6 GB over a few weeks' time. I'm trying to understand why vacuum hasn't made the dead rows available for re-use.

appdb=# \d+ app.heartbeat;
                                   Table "app.heartbeat"
        Column        |            Type             |   Modifiers   | Storage  | Stats target | Description
---------------------+-----------------------------+---------------+----------+--------------+-------------
 endpoint            | uuid                        | not null      | plain    |              |
 record_timestamp    | timestamp without time zone | not null      | plain    |              |
 heartbeat_timestamp | timestamp without time zone | not null      | plain    |              |
 outer_ip            | character varying(46)       | not null      | extended |              |
 inner_ip            | character varying(46)       | not null      | extended |              |
 last_notified       | timestamp without time zone | default now() | plain    |              |
Indexes:
    "heartbeat_pkey" PRIMARY KEY, btree (endpoint)
    "heartbeat_endpoint_idx" btree (endpoint)
Options: fillfactor=45

I reset the stats on the table:

SELECT pg_stat_reset_single_table_counters(40237);

and a few moments later ran:

SELECT c.oid, c.relname, c.relfrozenxid, c.relminmxid, c.relreplident, relpages, reltuples,
       pg_stat_get_live_tuples(c.oid) AS n_live_tup,
       pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
       n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_mod_since_analyze,
       last_analyze, last_autoanalyze, autovacuum_count, autoanalyze_count
FROM pg_class c, pg_stat_all_tables av
WHERE c.oid = av.relid
  AND c.relname = av.relname
  AND c.relname = 'heartbeat'
ORDER BY av.relname;

  oid  |  relname  | relfrozenxid | relminmxid | relreplident | relpages | reltuples | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_mod_since_analyze | last_analyze | last_autoanalyze | autovacuum_count | autoanalyze_count
-------+-----------+--------------+------------+--------------+----------+-----------+------------+------------+-----------+-----------+-----------+---------------+---------------------+--------------+------------------+------------------+-------------------
 40237 | heartbeat |   3451305052 |   55105127 | d            |   674616 |     15494 |          0 |       6234 |         0 |      6234 |         0 |          3758 |                6234 |              |                  |                0 |                 0
(1 row)

I also found the n_tup_upd vs n_tup_hot_upd numbers interesting: only 3758 of the 6234 updates were HOT. I was expecting all updates to be HOT, since the PK isn't changing, only the record values (timestamps, IPs), and my fillfactor is low. Maybe not relevant, but I thought it worth mentioning just in case, because both the indexes and the table are bloated.

So when I try to vacuum it, the vacuum reports:

INFO:  "heartbeat": found 0 removable, 55347745 nonremovable row versions in 675230 out of 675230 pages
DETAIL:  55330122 dead row versions cannot be removed yet.

How can I determine what is preventing the dead rows from being removed? Is there a way to identify what transaction, if any, is preventing their re-use? Or am I misinterpreting the results?

Full vacuum log:

appdb=# vacuum verbose app.heartbeat;
INFO:  vacuuming "app.heartbeat"
INFO:  index "heartbeat_pkey" now contains 23435126 row versions in 112094 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.40u sec elapsed 1.07 sec.
INFO:  index "heartbeat_endpoint_idx" now contains 23435358 row versions in 112185 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.23s/0.35u sec elapsed 0.70 sec.
INFO:  "heartbeat": found 0 removable, 55347745 nonremovable row versions in 675230 out of 675230 pages
DETAIL:  55330122 dead row versions cannot be removed yet.
There were 199391 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 3.40s/11.46u sec elapsed 18.39 sec.
VACUUM

And in case it is relevant:

appdb=# SELECT pid, datname, usename, state, backend_xmin
appdb-# FROM pg_stat_activity
appdb-# WHERE backend_xmin IS NOT NULL
appdb-# ORDER BY age(backend_xmin) DESC;
  pid  | datname | usename  |        state        | backend_xmin
-------+---------+----------+---------------------+--------------
 10921 | appdb   | app      | idle in transaction |   3501305052
 10919 | appdb   | app      | idle in transaction |   3501305052
 10916 | appdb   | app      | idle in transaction |   3501305052
 27935 | appdb   | app      | idle in transaction |   3501305052
 24500 | appdb   | postgres | active              |   3501305052
 10914 | appdb   | app      | active              |   3501305052
 20671 | appdb   | postgres | active              |   3501305052
 11817 | appdb   | app      | active              |   3501305052
  1988 | appdb   | app      | active              |   3501305052
 15041 | appdb   | postgres | active              |   3501305052
  9916 | appdb   | postgres | active              |   3501305052
 10912 | appdb   | app      | idle in transaction |   3501305052
 10909 | appdb   | app      | idle in transaction |   3501305052
(13 rows)

appdb=# SELECT slot_name, slot_type, database, xmin
appdb-# FROM pg_replication_slots
appdb-# ORDER BY age(xmin) DESC;
 slot_name | slot_type | database | xmin
-----------+-----------+----------+------
(0 rows)

appdb=# SELECT gid, prepared, owner, database, transaction AS xmin
appdb-# FROM pg_prepared_xacts
appdb-# ORDER BY age(transaction) DESC;
 gid | prepared | owner | database | xmin
-----+----------+-------+----------+------
(0 rows)
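To dig further into those "idle in transaction" sessions, I was planning to run something along these lines against pg_stat_activity (just a rough sketch; I'm assuming xact_start and state_change will tell me how long each backend has been holding its snapshot open):

SELECT pid, usename, state,
       backend_xmin,
       age(backend_xmin)    AS xmin_age,       -- how far behind the xmin horizon this snapshot is
       now() - xact_start   AS xact_duration,  -- how long the transaction has been open
       now() - state_change AS time_in_state,  -- time since the state last changed, i.e. how long it has sat idle in transaction
       query                                   -- last statement the backend executed
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC, xact_start;

Is that a sensible way to pin down which session is holding back the xmin horizon, or is there a better approach?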
INFO: "hearbeat": found 0 removable, 55347745 nonremovable row versions in 675230 out of 675230 pages DETAIL: 55330122 dead row versions cannot be removed yet. There were 199391 unused item pointers. Skipped 0 pages due to buffer pins. 0 pages are entirely empty. CPU 3.40s/11.46u sec elapsed 18.39 sec. VACUUM And in case it is relevant: appdb=# SELECT pid, datname, usename, state, backend_xmin appdb-# FROM pg_stat_activity appdb-# WHERE backend_xmin IS NOT NULL appdb-# ORDER BY age(backend_xmin) DESC; pid | datname | usename | state | backend_xmin -------+---------------+----------+---------------------+-------------- 10921 | appdb | app | idle in transaction | 3501305052 10919 | appdb | app | idle in transaction | 3501305052 10916 | appdb | app | idle in transaction | 3501305052 27935 | appdb | app | idle in transaction | 3501305052 24500 | appdb | postgres | active | 3501305052 10914 | appdb | app | active | 3501305052 20671 | appdb | postgres | active | 3501305052 11817 | appdb | app | active | 3501305052 1988 | appdb | app | active | 3501305052 15041 | appdb | postgres | active | 3501305052 9916 | appdb | postgres | active | 3501305052 10912 | appdb | app | idle in transaction | 3501305052 10909 | appdb | app | idle in transaction | 3501305052 (13 rows) appdb=# appdb=# SELECT slot_name, slot_type, database, xmin appdb-# FROM pg_replication_slots appdb-# ORDER BY age(xmin) DESC; slot_name | slot_type | database | xmin -----------+-----------+----------+------ (0 rows) appdb=# appdb=# SELECT gid, prepared, owner, database, transaction AS xmin appdb-# FROM pg_prepared_xacts appdb-# ORDER BY age(transaction) DESC; gid | prepared | owner | database | xmin -----+----------+-------+----------+------ (0 rows) Looking through my autovacuum logs, we are only seeing autovacuum trigger once every 20-40 minutes, which in a table that is essentially overwritten completely every 60s will explain part of the problem. So I am also looking to tweak my vacuum settings so it is more aggressive with this table. However the vacuum above not freeing up any dead rows has me worried it won't be sufficient. One other thing to add, we had a similar problem earlier in the year and a vacuum full apparently didn't clear up the disk space, which seems very odd to me but unfortunately I wasn't there to check in more detail. We ended up copying the data into a temporary table, truncating the original table, and inserting the data back into the original. Any advice is much appreciated! Kind regards, Simon