Re: PG12 autovac issues

Justin King <kingpin867@xxxxxxxxx> · Thu, 19 Mar 2020 18:07:14 -0500

On Thu, Mar 19, 2020 at 5:35 PM Andres Freund <andres@xxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2020-03-19 10:23:48 -0500, Justin King wrote:
> > > From a single stats snapshot we can't actually understand the actual xid
> > > consumption - is it actually the xid usage that triggers the vacuums?
> >
> > We have looked at this and the xid consumption averages around 1250
> > xid/sec -- this is when we see the "aggressive" autovac kick off in
> > the logs.  What I don't understand is why these xid's are being
> > consumed at this rate on the databases with no activity (postgres,
> > template1).
>
> The xid counter is global across all databases.

Then what does the "age" value represent for each database in this
case?  Perhaps I'm misunderstanding what I'm looking at?

postgres=#  SELECT datname, age(datfrozenxid),
current_setting('autovacuum_freeze_max_age') FROM pg_database;
  datname  |    age    | current_setting
-----------+-----------+-----------------
 postgres  | 100937449 | 200000000
 template1 |  50244438 | 200000000
 template0 | 160207297 | 200000000
 feedi     | 150147602 | 200000000

>
> > > What makes you think it is a problem that you have all these vacuums? If
> > > you actually update that much, and you have indexes, you're going want a
> > > lot of vacuums?
>
> > I actually don't think there's a problem with the vacuums (I was
> > mostly pointing out that they are very regular and not problematic).
> > The main problem I am having is that something is causing the
> > autovacuums to completely stop and require manual intervention to
> > resume -- and it seems to be when the "postgres" or "template1"
> > database hits the autovacuum_freeze_max_age.
>
> Did you look at pg_stat_activity for those autovacuums to see whether
> they're blocked on something?

This is not something we've done yet but will next time it occurs.

> > > > What is interesting is that this happens with the 'postgres' and
> > > > 'template1' databases as well and there is absolutely no activity in
> > > > those databases.
> > >
> > > That's normal. They should be pretty darn quick in v12?
> >
> > Yes, a manual VACUUM FREEZE of either database takes less than 1
> > second -- which is why it's perplexing that the autovac starts and
> > never seems to complete and prevents other autovacs from running.
>
> One big difference between a manual VACUUM and autovacuum is that with
> the default settings VACUUM is not throttled, but autovacuum is.
>
> What are your vacuum_cost_delay, autovacuum_vacuum_cost_delay,
> vacuum_cost_limit, autovacuum_vacuum_cost_limit, vacuum_cost_page_hit,
> vacuum_cost_page_miss set to?

Here are all the vacuum related values for the server:

postgres=# select name,setting from pg_settings where name like '%vacuum%';
autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 2500
autovacuum_freeze_max_age = 200000000
autovacuum_max_workers = 8
autovacuum_multixact_freeze_max_age = 400000000
autovacuum_naptime = 15
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
autovacuum_work_mem = -1
log_autovacuum_min_duration = 0
vacuum_cleanup_index_scale_factor = 0.1
vacuum_cost_delay = 0
vacuum_cost_limit = 1000
vacuum_cost_page_dirty = 20
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_defer_cleanup_age = 0
vacuum_freeze_min_age = 50000000
vacuum_freeze_table_age = 150000000
vacuum_multixact_freeze_min_age = 5000000
vacuum_multixact_freeze_table_age = 150000000

I know the database is busy, so the throttling makes sense, but it
seems like it would complete eventually. We see blocked autovacs for
many hours.

> Greetings,
>
> Andres Freund