RE: PG12 autovac issues

Justin King <kingpin867@xxxxxxxxx> · Thu, 19 Mar 2020 10:23:48 -0500

Hi Andres-

Thanks for the reply, answers below.

On Tue, Mar 17, 2020 at 8:19 PM Andres Freund <andres@xxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2020-03-17 17:18:57 -0500, Justin King wrote:
> > As you can see in this table, there are only ~80K rows, but billions
> > of updates.  What we have observed is that the frozenxid reaches the
> > 200M mark fairly quickly because of the amount of activity.
>
> And each of those updates is in a separate transaction? Is that
> required? I.e. any chance to perform multiple of those updates in one
> transaction?
>
> Have you considered just increasing the vacuum limit? It's mostly there
> because it can increase space usage a bit, but given today's systems its
> not a usually a problem unless you have hundreds of postgres instances
> on a single system.  It's pretty common to just increase that to 1.8
> billion or so.

We have considered increasing the limit, but as I mentioned, the
problem that we're seeing is that (effectively) a autovac starts on a
system database (postgres, template1) and never completes, or
deadlocks, or something.  This completely stops autovacs from running
until we manually intervene and run a VACUUM FREEZE -- at which point,
the autovacs resume.  If we increase the vacuum limit and this
situation happens, we're going to be in real trouble.

> From a single stats snapshot we can't actually understand the actual xid
> consumption - is it actually the xid usage that triggers the vacuums?

We have looked at this and the xid consumption averages around 1250
xid/sec -- this is when we see the "aggressive" autovac kick off in
the logs.  What I don't understand is why these xid's are being
consumed at this rate on the databases with no activity (postgres,
template1).

>
>
> What makes you think it is a problem that you have all these vacuums? If
> you actually update that much, and you have indexes, you're going want a
> lot of vacuums?
>

I actually don't think there's a problem with the vacuums (I was
mostly pointing out that they are very regular and not problematic).
The main problem I am having is that something is causing the
autovacuums to completely stop and require manual intervention to
resume -- and it seems to be when the "postgres" or "template1"
database hits the autovacuum_freeze_max_age.

>
> > What is interesting is that this happens with the 'postgres' and
> > 'template1' databases as well and there is absolutely no activity in
> > those databases.
>
> That's normal. They should be pretty darn quick in v12?

Yes, a manual VACUUM FREEZE of either database takes less than 1
second -- which is why it's perplexing that the autovac starts and
never seems to complete and prevents other autovacs from running.

>
> Greetings,
>
> Andres Freund