On 3/17/20 3:22 PM, Justin King wrote:
Apologies, I accidentally sent this to the pgsql-admin list initially but intended it go here: We have a database that isn't overly large (~20G), but gets incredibly frequent updates. Here's an example table: feedi=# select * from pg_stat_all_tables where schemaname = 'production' and relname = 'tita'; relid = 16786 schemaname = production relname = tita seq_scan = 23365 seq_tup_read = 403862091 idx_scan = 26612759248 idx_tup_fetch = 19415752701 n_tup_ins = 24608806 n_tup_upd = 4207076934 n_tup_del = 24566916 n_tup_hot_upd = 4073821079 n_live_tup = 79942 n_dead_tup = 71969 n_mod_since_analyze = 12020 last_vacuum = 2020-03-17 15:35:19.588859+00 last_autovacuum = 2020-03-17 21:31:08.248598+00 last_analyze = 2020-03-17 15:35:20.372875+00 last_autoanalyze = 2020-03-17 22:04:41.76743+00 vacuum_count = 9 autovacuum_count = 135693 analyze_count = 9 autoanalyze_count = 495877 As you can see in this table, there are only ~80K rows, but billions of updates. What we have observed is that the frozenxid reaches the 200M mark fairly quickly because of the amount of activity. What is interesting is that this happens with the 'postgres' and 'template1' databases as well and there is absolutely no activity in those databases. When the 'postgres' and/or 'template1' databases hit the freeze_max_age, there are cases where it kicks off an aggressive autovac of those tables which seems to prevent autovacs from running elsewhere. Oddly, this is not consistent, but that condition seems to be required. We have observed this across multiple PG12 servers (dev, test, staging, production) all with similar workloads.
Is there anything in postgres and template1 besides what was created at init?
What are your settings for autovacuum?: https://www.postgresql.org/docs/12/runtime-config-autovacuum.html
$ grep -i vacuum /var/log/postgresql/postgres.log | cut -b 1-9 | uniq -c 17 Mar 17 06 34 Mar 17 07 31 Mar 17 08 31 Mar 17 09 30 Mar 17 10 34 Mar 17 11 33 Mar 17 12 19 Mar 17 13 40 Mar 17 15 31 Mar 17 16 36 Mar 17 17 34 Mar 17 18 35 Mar 17 19 35 Mar 17 20 33 Mar 17 21 As you can see above, we end up having around ~33 autovac/hr, and about 13:30 today, they stopped until we ran a "vacuum freeze verbose analyze;" against the 'postgres' database (around 15:30) which then caused the autovacs to resume running against the "feedi" database. I'm completely perplexed as to what is happening and why it suddenly started when we moved from PG10 > PG12. The configs and workload are essentially the same between versions. We realize we could simply increase the autovacuum_freeze_max_age, but that doesn't seem to actually resolve anything -- it just pushes the problem out. Has anyone seen anything similar to this? Thanks very much for the consideration. Justin King http://flightaware.com/
-- Adrian Klaver adrian.klaver@xxxxxxxxxxx