Re: datfrozenxid not dropping after vacuum

On 2021-Sep-01, Matthew Tice wrote:

[ problem table is pg_database ]

> My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> standby databases have been patched to 10.17.

Hmm, I think there was a bug in the early 10.x versions where advancing
the xid age of shared tables would not work correctly for some reason ...
Ah yes, this was fixed in 10.5, a mere three years ago:

Author: Andres Freund <andres@xxxxxxxxxxx>
Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700

    Fix bugs in vacuum of shared rels, by keeping their relcache entries current.
    
    When vacuum processes a relation it uses the corresponding relcache
    entry's relfrozenxid / relminmxid as a cutoff for when to remove
    tuples etc. Unfortunately for nailed relations (i.e. critical system
    catalogs) bugs could frequently lead to the corresponding relcache
    entry being stale.
    
    This set of bugs could cause actual data corruption as vacuum would
    potentially not remove the correct row versions, potentially reviving
    them at a later point.  After 699bf7d05c some corruptions in this vein
    were prevented, but the additional error checks could also trigger
    spuriously. Examples of such errors are:
      ERROR: found xmin ... from before relfrozenxid ...
    and
      ERROR: found multixact ... from before relminmxid ...
    To be caused by this bug the errors have to occur on system catalog
    tables.
    
    The two bugs are:
    
    1) Invalidations for nailed relations were ignored, based on the
       theory that the relcache entry for such tables doesn't
       change. Which is largely true, except for fields like relfrozenxid
       etc.  This means that changes to relations vacuumed in other
       sessions weren't picked up by already existing sessions.  Luckily
       autovacuum doesn't have particularly long-running sessions.
    
    2) For shared *and* nailed relations, the shared relcache init file
       was never invalidated while running.  That means that for such
       tables (e.g. pg_authid, pg_database) it's not just already existing
       sessions that are affected, but even new connections are as well.
       That explains why the reports usually were about pg_authid et al.
    
    To fix 1), revalidate the rd_rel portion of a relcache entry when
    invalid. This implies a bit of extra complexity to deal with
    bootstrapping, but it's not too bad.  The fix for 2) is simpler,
    simply always remove both the shared and local init files.
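
To make bug 1) and its fix concrete, here is a toy model in Python. It is only a sketch of the idea, not PostgreSQL code: the class and function names (Catalog, Session, vacuum_elsewhere) and the honor_nailed_invalidation flag are invented for illustration, and the real relcache is considerably more involved.

```python
# Toy model only: none of these names exist in PostgreSQL.

class Catalog:
    """Stands in for the stored catalog row of one relation."""

    def __init__(self, relfrozenxid):
        self.relfrozenxid = relfrozenxid


class Session:
    """A backend holding a cached copy of the relation's relfrozenxid."""

    def __init__(self, catalog, nailed, honor_nailed_invalidation):
        self.catalog = catalog
        self.nailed = nailed
        self.honor_nailed_invalidation = honor_nailed_invalidation
        self.cached_relfrozenxid = catalog.relfrozenxid
        self.valid = True

    def invalidate(self):
        # Pre-fix behaviour: invalidations for nailed relations are ignored.
        if self.nailed and not self.honor_nailed_invalidation:
            return
        self.valid = False

    def cutoff(self):
        # Post-fix behaviour: re-read the catalog fields when marked invalid.
        if not self.valid:
            self.cached_relfrozenxid = self.catalog.relfrozenxid
            self.valid = True
        return self.cached_relfrozenxid


def vacuum_elsewhere(catalog, sessions, new_relfrozenxid):
    """Another session's vacuum advances relfrozenxid and sends invalidations."""
    catalog.relfrozenxid = new_relfrozenxid
    for session in sessions:
        session.invalidate()
```

With one session of each kind attached to the same Catalog, a call to vacuum_elsewhere leaves the pre-fix session using its old cutoff while the post-fix session picks up the new value on its next cutoff() call.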
    
    Author: Andres Freund
    Reviewed-By: Alvaro Herrera
    Discussion:
        https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@xxxxxxxxxxxxxxxxx
        https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@xxxxxxxxxxxxxx
        https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@xxxxxxxxxxxxxx
        https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@xxxxxxxxxxxxxx
    Backpatch: 9.3-
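
If it helps when auditing a fleet, the branch list in the commit header can be turned into a small version check. This is a rough sketch: the function name has_shared_rel_vacuum_fix is made up here, it assumes plain numeric version strings such as "10.4" or "9.6.10" (no beta or rc suffixes), and it assumes every release from 11 onward carries the fix because the commit is listed for master with release REL_11_BR.

```python
# Minimum minor release per back branch that contains the fix,
# copied from the "Branch: ... Release: ..." lines of the commit.
FIXED_IN = {
    "10": 5,
    "9.6": 10,
    "9.5": 14,
    "9.4": 19,
    "9.3": 24,
}


def has_shared_rel_vacuum_fix(version):
    """Return True/False for branches the commit covers, None otherwise.

    Hypothetical helper; expects strings like "10.4" or "9.6.10".
    """
    parts = [int(p) for p in version.split(".")]
    major = parts[0]
    if major >= 11:
        # Assumption: the fix reached master before REL_11 was released.
        return True
    if major == 10:
        branch = "10"
        minor = parts[1] if len(parts) > 1 else 0
    else:
        # Before version 10 the branch is the first two components.
        if len(parts) < 2:
            raise ValueError("need at least major.minor for pre-10 versions")
        branch = f"{parts[0]}.{parts[1]}"
        minor = parts[2] if len(parts) > 2 else 0
    if branch not in FIXED_IN:
        # Branch older than 9.3: the commit was not backpatched there.
        return None
    return minor >= FIXED_IN[branch]
```

For the setup described at the top of this message, has_shared_rel_vacuum_fix("10.4") is False for the primary and has_shared_rel_vacuum_fix("10.17") is True for the standbys.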


-- 
Álvaro Herrera           39°49'30"S 73°17'W
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)




