On Wed, Mar 25, 2020 at 07:59:56PM -0700, Andres Freund wrote:
> FWIW, this kind of thing is why I think the added skipping logic is a
> bad idea. Silently skipping things like this (same with the "bogus"
> logic in datfrozenxid computation) is dangerous. I think we should
> seriously consider backing this change out.

That's actually what I would like to do at this stage as a first
step.  It looks pretty clear that it does not help.

> And if not, then we should at least include enough detail in the message
> to be able to debug this.

Sure.  In any attempt I have done until now I was easily able to skip
some jobs, but it should get easier with a higher number of concurrent
workers and a higher number of relations heavily updated.  Thinking
about it, only catalog jobs were getting skipped in my own runs...

>> postgres=# SELECT datname, age(datfrozenxid), datfrozenxid FROM
>> pg_database ORDER BY age(datfrozenxid) DESC LIMIT 1;
>>  datname  |    age    | datfrozenxid
>> ----------+-----------+--------------
>>  postgres | 202773709 |   4284570172
>
> And why should this lead to anti-wraparound vacuums not happening? This
> is older than the cutoff age?
>
> xid 4284570172 having the age of 202 million xids suggests that
> ReadNewTransactionId() is approx 192376585. Which comports with the log
> saying: oldest xmin: 189591147.

Oops, sorry.  My previous email was incorrect.  It looked strange to
not see datfrozenxid being refreshed.

> Or are you saying that you conclude that the relcache entry is somehow
> out of date? It sure is interesting that all of the tables that hit the
> "skipping redundant vacuum" condition are shared tables.

Yeah, that's actually what I was thinking yesterday.  In
heap_vacuum_rel(), xidFullScanLimit may be calculated correctly, but an
incorrect value of rd_rel->relminmxid or rd_rel->relfrozenxid could
cause a job to become non-aggressive.  It should actually be easy
enough to check that.
--
Michael
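
As a quick sanity check of the numbers quoted above, here is a minimal Python sketch of the modulo-2^32 xid arithmetic. It is not PostgreSQL code: the helper names are made up for illustration, special (permanent) xids and multixacts are ignored, and the 150 million freeze table age and the "stale" relfrozenxid are hypothetical values chosen only to show how a wrong relfrozenxid could flip the aggressive decision.

```python
XID_SPACE = 2**32  # transaction IDs are 32-bit and wrap around


def xid_age(next_xid, xid):
    """Distance from xid forward to next_xid, modulo 2^32."""
    return (next_xid - xid) % XID_SPACE


def next_xid_from_age(xid, age):
    """Invert xid_age(): the next xid implied by an xid and its age."""
    return (xid + age) % XID_SPACE


def xid_precedes_or_equals(a, b):
    """Circular comparison of two normal xids (signed 32-bit difference)."""
    diff = (a - b) % XID_SPACE
    signed = diff - XID_SPACE if diff >= 2**31 else diff
    return signed <= 0


def is_aggressive(relfrozenxid, xid_full_scan_limit):
    """Simplified, xid-only version of the check (multixacts omitted)."""
    return xid_precedes_or_equals(relfrozenxid, xid_full_scan_limit)


# Values taken from the psql output and log line quoted in the thread.
datfrozenxid = 4284570172
age = 202773709
oldest_xmin = 189591147

next_xid = next_xid_from_age(datfrozenxid, age)
print(next_xid)                                       # implied next xid
print(xid_precedes_or_equals(oldest_xmin, next_xid))  # xmin is behind it

# Hypothetical values from here on (assumptions, not from the thread).
freeze_table_age = 150_000_000
limit = (next_xid - freeze_table_age) % XID_SPACE
stale_relfrozenxid = 100_000_000

print(is_aggressive(datfrozenxid, limit))        # old enough: aggressive
print(is_aggressive(stale_relfrozenxid, limit))  # looks too new: not aggressive
```

With the quoted datfrozenxid and age, the implied next xid comes out as 192376585, which matches the estimate in the quoted reply. The last two lines only illustrate the suspicion discussed above: with the same scan limit, a relfrozenxid that looks newer than it really is would make the job non-aggressive.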