Re: postgresql 9.6 - cannot freeze committed xmax

Andres Freund <andres@xxxxxxxxxxx> · Mon, 26 Feb 2018 15:21:13 -0800

On 2018-02-26 20:09:31 -0300, Alvaro Herrera wrote:
> Alexandre Garcia wrote:
> 
> > Recently our team performed an upgrade on one of our old postgres 9.2
> > databases. "for science" we tried a direct upgrade from 9.2 to 9.6  on our
> > staging environment . The initial plan was to do 9.2 -> 9.4 and 9.4 -> 9.6.
> > 
> > The upgrade turned out successful on staging and we decided to go with it
> > on prod as well.
> > Prod starting throwing the following errors during autovacuum -> 'cannot
> > freeze committed xmax <xid>' on 2 different tables. Running vacuum manually
> > revealed more tables affected by the same error.
> > 
> > The staging database did not present any error but the process to sync prod
> > to stage includes a sanitize script that removes sensitive information and
> > it somehow seems to fix the issue on stage (we've done a sync from prod to
> > stage after the upgrade)
> >
> > I've been doing a lot of search about this and even tried to go through the
> > code that throws that specific error.

Could you show pg_controldata from before/after pg_upgrade, and the
output of
SELECT oid::regclass, relfrozenxid, age(relfrozenxid), txid_current()
FROM pg_class
WHERE oid = 'problematic_table'::regclass;
from both before/after?

> This particular error condition is a sanity check that was only
> introduced in 9.6.7, so you would not find too many reports of that
> (this exact error message wording doesn't exist prior to that).  It is
> possible that we missed some corner case when writing that check.
> Upgrades from 9.2 are particularly unusual since the xmax header was
> reused in the 9.3 era to mean something completely different under some
> circumstances.  I'm not in a position to do deeper debugging for you at
> this time, though.
> 
> The commit in question is
> https://git.postgresql.org/pg/commitdiff/986a9153b9708071adf6ce2c9131266f3431f4ec
> 
> Wild guess: maybe we should be checking HEAP_LOCKED_UPGRADED before
> bailing out.

Doesn't !(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) already guard against
that?

Greetings,

Andres Freund