Re: Fwd: PG12 autovac issues

Justin King <kingpin867@xxxxxxxxx> · Tue, 17 Mar 2020 17:48:24 -0500

On Tue, Mar 17, 2020 at 5:39 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
>
> On 3/17/20 3:22 PM, Justin King wrote:
> > Apologies, I accidentally sent this to the pgsql-admin list initially
> > but intended it go here:
> >
> > We have a database that isn't overly large (~20G), but gets incredibly
> > frequent updates.  Here's an example table:
> >
> > feedi=# select * from pg_stat_all_tables where schemaname =
> > 'production' and relname = 'tita';
> > relid = 16786
> > schemaname = production
> > relname = tita
> > seq_scan = 23365
> > seq_tup_read = 403862091
> > idx_scan = 26612759248
> > idx_tup_fetch = 19415752701
> > n_tup_ins = 24608806
> > n_tup_upd = 4207076934
> > n_tup_del = 24566916
> > n_tup_hot_upd = 4073821079
> > n_live_tup = 79942
> > n_dead_tup = 71969
> > n_mod_since_analyze = 12020
> > last_vacuum = 2020-03-17 15:35:19.588859+00
> > last_autovacuum = 2020-03-17 21:31:08.248598+00
> > last_analyze = 2020-03-17 15:35:20.372875+00
> > last_autoanalyze = 2020-03-17 22:04:41.76743+00
> > vacuum_count = 9
> > autovacuum_count = 135693
> > analyze_count = 9
> > autoanalyze_count = 495877
> >
> > As you can see in this table, there are only ~80K rows, but billions
> > of updates.  What we have observed is that the frozenxid reaches the
> > 200M mark fairly quickly because of the amount of activity.  What is
> > interesting is that this happens with the 'postgres' and 'template1'
> > databases as well and there is absolutely no activity in those
> > databases.
> >
> > When the 'postgres' and/or 'template1' databases hit the
> > freeze_max_age, there are cases where it kicks off an aggressive
> > autovac of those tables which seems to prevent autovacs from running
> > elsewhere.  Oddly, this is not consistent, but that condition seems to
> > be required.  We have observed this across multiple PG12 servers (dev,
> > test, staging, production) all with similar workloads.
>
> Is there anything in postgres and template1 besides what was created at
> init?

There is nothing in there at all besides system tables created at init.

>
> What are your settings for autovacuum?:
>
> https://www.postgresql.org/docs/12/runtime-config-autovacuum.html

Here are the settings, these are the only ones that are not set to
default with the exception of a few tables that have been overridden
with a different value due to lots of updates and few rows:

autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_naptime = 15s
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 2500
vacuum_cost_limit = 1000

We want fairly aggressive autovacs to keep table bloat limited -- the
application latency suffers if it has to wade through dead tuples and
staying near realtime is important in our environment.

** Also, it should be noted that the autovacuum_analyze_threshold is
probably an incorrect value, we likely intended that to be 250 and
just have now realized it after poking more at the configuration.

>
> >
> > $ grep -i vacuum /var/log/postgresql/postgres.log | cut -b 1-9 | uniq -c
> >       17 Mar 17 06
> >       34 Mar 17 07
> >       31 Mar 17 08
> >       31 Mar 17 09
> >       30 Mar 17 10
> >       34 Mar 17 11
> >       33 Mar 17 12
> >       19 Mar 17 13
> >       40 Mar 17 15
> >       31 Mar 17 16
> >       36 Mar 17 17
> >       34 Mar 17 18
> >       35 Mar 17 19
> >       35 Mar 17 20
> >       33 Mar 17 21
> >
> > As you can see above, we end up having around ~33 autovac/hr, and
> > about 13:30 today, they stopped until we ran a "vacuum freeze verbose
> > analyze;" against the 'postgres' database (around 15:30) which then
> > caused the autovacs to resume running against the "feedi" database.
> >
> > I'm completely perplexed as to what is happening and why it suddenly
> > started when we moved from PG10 > PG12.  The configs and workload are
> > essentially the same between versions.  We realize we could simply
> > increase the autovacuum_freeze_max_age, but that doesn't seem to
> > actually resolve anything -- it just pushes the problem out.  Has
> > anyone seen anything similar to this?
> >
> > Thanks very much for the consideration.
> >
> > Justin King
> > http://flightaware.com/
> >
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx