On 2021-Sep-01, Matthew Tice wrote: [ problem table is pg_database ] > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my > standby databases have been patched to 10.17. Hmm, I think there was a bug in the early 10.x versions where advancing the xid age of shared tables would not work correctly for some reason ... Ah yes, this was fixed in 10.5, a mere three years ago: Author: Andres Freund <andres@xxxxxxxxxxx> Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700 Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700 Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700 Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700 Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700 Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700 Fix bugs in vacuum of shared rels, by keeping their relcache entries current. When vacuum processes a relation it uses the corresponding relcache entry's relfrozenxid / relminmxid as a cutoff for when to remove tuples etc. Unfortunately for nailed relations (i.e. critical system catalogs) bugs could frequently lead to the corresponding relcache entry being stale. This set of bugs could cause actual data corruption as vacuum would potentially not remove the correct row versions, potentially reviving them at a later point. After 699bf7d05c some corruptions in this vein were prevented, but the additional error checks could also trigger spuriously. Examples of such errors are: ERROR: found xmin ... from before relfrozenxid ... and ERROR: found multixact ... from before relminmxid ... To be caused by this bug the errors have to occur on system catalog tables. The two bugs are: 1) Invalidations for nailed relations were ignored, based on the theory that the relcache entry for such tables doesn't change. Which is largely true, except for fields like relfrozenxid etc. This means that changes to relations vacuumed in other sessions weren't picked up by already existing sessions. Luckily autovacuum doesn't have particularly longrunning sessions. 2) For shared *and* nailed relations, the shared relcache init file was never invalidated while running. That means that for such tables (e.g. pg_authid, pg_database) it's not just already existing sessions that are affected, but even new connections are as well. That explains why the reports usually were about pg_authid et. al. To fix 1), revalidate the rd_rel portion of a relcache entry when invalid. This implies a bit of extra complexity to deal with bootstrapping, but it's not too bad. The fix for 2) is simpler, simply always remove both the shared and local init files. Author: Andres Freund Reviewed-By: Alvaro Herrera Discussion: https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@xxxxxxxxxxxxxxxxx https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@xxxxxxxxxxxxxx https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@xxxxxxxxxxxxxx https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@xxxxxxxxxxxxxx Backpatch: 9.3- -- Álvaro Herrera 39°49'30"S 73°17'W "El número de instalaciones de UNIX se ha elevado a 10, y se espera que este número aumente" (UPM, 1972)