Thanks for your answers, took me a little while to restore a backup from before the upgrade on an empty 9.2 server but here we are => https://gist.github.com/kuuji/cc73c3c193a0d58aa0e651fbd7bfec58
On Mon, Feb 26, 2018 at 6:21 PM Andres Freund <andres@xxxxxxxxxxx> wrote:
On 2018-02-26 20:09:31 -0300, Alvaro Herrera wrote:
> Alexandre Garcia wrote:
>
> > Recently our team performed an upgrade on one of our old postgres 9.2
> > databases. "for science" we tried a direct upgrade from 9.2 to 9.6 on our
> > staging environment . The initial plan was to do 9.2 -> 9.4 and 9.4 -> 9.6.
> >
> > The upgrade turned out successful on staging and we decided to go with it
> > on prod as well.
> > Prod starting throwing the following errors during autovacuum -> 'cannot
> > freeze committed xmax <xid>' on 2 different tables. Running vacuum manually
> > revealed more tables affected by the same error.
> >
> > The staging database did not present any error but the process to sync prod
> > to stage includes a sanitize script that removes sensitive information and
> > it somehow seems to fix the issue on stage (we've done a sync from prod to
> > stage after the upgrade)
> >
> > I've been doing a lot of search about this and even tried to go through the
> > code that throws that specific error.
Could you show pg_controldata from before/after pg_upgrade, and the
output of
SELECT oid::regclass, relfrozenxid, age(relfrozenxid), txid_current()
FROM pg_class
WHERE oid = 'problematic_table'::regclass;
from both before/after?
> This particular error condition is a sanity check that was only
> introduced in 9.6.7, so you would not find too many reports of that
> (this exact error message wording doesn't exist prior to that). It is
> possible that we missed some corner case when writing that check.
> Upgrades from 9.2 are particularly unusual since the xmax header was
> reused in the 9.3 era to mean something completely different under some
> circumstances. I'm not in a position to do deeper debugging for you at
> this time, though.
>
> The commit in question is
> https://git.postgresql.org/pg/commitdiff/986a9153b9708071adf6ce2c9131266f3431f4ec
>
> Wild guess: maybe we should be checking HEAP_LOCKED_UPGRADED before
> bailing out.
Doesn't !(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) already guard against
that?
Greetings,
Andres Freund