Re: Fwd: PG12 autovac issues

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Tue, 17 Mar 2020 16:16:31 -0700

On 3/17/20 3:48 PM, Justin King wrote:
On Tue, Mar 17, 2020 at 5:39 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:

On 3/17/20 3:22 PM, Justin King wrote:
Apologies, I accidentally sent this to the pgsql-admin list initially
but intended it go here:

We have a database that isn't overly large (~20G), but gets incredibly
frequent updates.  Here's an example table:

feedi=# select * from pg_stat_all_tables where schemaname =
'production' and relname = 'tita';
relid = 16786
schemaname = production
relname = tita
seq_scan = 23365
seq_tup_read = 403862091
idx_scan = 26612759248
idx_tup_fetch = 19415752701
n_tup_ins = 24608806
n_tup_upd = 4207076934
n_tup_del = 24566916
n_tup_hot_upd = 4073821079
n_live_tup = 79942
n_dead_tup = 71969
n_mod_since_analyze = 12020
last_vacuum = 2020-03-17 15:35:19.588859+00
last_autovacuum = 2020-03-17 21:31:08.248598+00
last_analyze = 2020-03-17 15:35:20.372875+00
last_autoanalyze = 2020-03-17 22:04:41.76743+00
vacuum_count = 9
autovacuum_count = 135693
analyze_count = 9
autoanalyze_count = 495877

As you can see in this table, there are only ~80K rows, but billions
of updates.  What we have observed is that the frozenxid reaches the
200M mark fairly quickly because of the amount of activity.  What is
interesting is that this happens with the 'postgres' and 'template1'
databases as well and there is absolutely no activity in those
databases.

When the 'postgres' and/or 'template1' databases hit the
freeze_max_age, there are cases where it kicks off an aggressive
autovac of those tables which seems to prevent autovacs from running
elsewhere.  Oddly, this is not consistent, but that condition seems to
be required.  We have observed this across multiple PG12 servers (dev,
test, staging, production) all with similar workloads.

Is there anything in postgres and template1 besides what was created at
init?

There is nothing in there at all besides system tables created at init.

What are your settings for autovacuum?:

https://www.postgresql.org/docs/12/runtime-config-autovacuum.html

Here are the settings, these are the only ones that are not set to
default with the exception of a few tables that have been overridden
with a different value due to lots of updates and few rows:

And those values are?

More below.

autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_naptime = 15s
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 2500
vacuum_cost_limit = 1000

Are either of the below set > 0?:

vacuum_cost_delay

autovacuum_vacuum_cost_delay

We want fairly aggressive autovacs to keep table bloat limited -- the
application latency suffers if it has to wade through dead tuples and
staying near realtime is important in our environment.

** Also, it should be noted that the autovacuum_analyze_threshold is
probably an incorrect value, we likely intended that to be 250 and
just have now realized it after poking more at the configuration.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx