Hi Andres- Thanks for the reply, answers below. On Tue, Mar 17, 2020 at 8:19 PM Andres Freund <andres@xxxxxxxxxxx> wrote: > > Hi, > > On 2020-03-17 17:18:57 -0500, Justin King wrote: > > As you can see in this table, there are only ~80K rows, but billions > > of updates. What we have observed is that the frozenxid reaches the > > 200M mark fairly quickly because of the amount of activity. > > And each of those updates is in a separate transaction? Is that > required? I.e. any chance to perform multiple of those updates in one > transaction? > > Have you considered just increasing the vacuum limit? It's mostly there > because it can increase space usage a bit, but given today's systems its > not a usually a problem unless you have hundreds of postgres instances > on a single system. It's pretty common to just increase that to 1.8 > billion or so. We have considered increasing the limit, but as I mentioned, the problem that we're seeing is that (effectively) a autovac starts on a system database (postgres, template1) and never completes, or deadlocks, or something. This completely stops autovacs from running until we manually intervene and run a VACUUM FREEZE -- at which point, the autovacs resume. If we increase the vacuum limit and this situation happens, we're going to be in real trouble. > From a single stats snapshot we can't actually understand the actual xid > consumption - is it actually the xid usage that triggers the vacuums? We have looked at this and the xid consumption averages around 1250 xid/sec -- this is when we see the "aggressive" autovac kick off in the logs. What I don't understand is why these xid's are being consumed at this rate on the databases with no activity (postgres, template1). > > > What makes you think it is a problem that you have all these vacuums? If > you actually update that much, and you have indexes, you're going want a > lot of vacuums? > I actually don't think there's a problem with the vacuums (I was mostly pointing out that they are very regular and not problematic). The main problem I am having is that something is causing the autovacuums to completely stop and require manual intervention to resume -- and it seems to be when the "postgres" or "template1" database hits the autovacuum_freeze_max_age. > > > What is interesting is that this happens with the 'postgres' and > > 'template1' databases as well and there is absolutely no activity in > > those databases. > > That's normal. They should be pretty darn quick in v12? Yes, a manual VACUUM FREEZE of either database takes less than 1 second -- which is why it's perplexing that the autovac starts and never seems to complete and prevents other autovacs from running. > > Greetings, > > Andres Freund