We upgraded to 9.4.5 on 19 October, and there was a successful automatic vacuum over pg_toast_376621 just 3 days later, on 22 October:

    Oct 22 08:16:49 db-master postgres[10589]: [3-1] []: LOG: automatic vacuum of table "db.pg_toast.pg_toast_376621": index scans: 1
    Oct 22 08:16:49 db-master postgres[10589]: [3-2] pages: 0 removed, 784361 remain
    Oct 22 08:16:49 db-master postgres[10589]: [3-3] tuples: 110 removed, 3768496 remain, 0 are dead but not yet removable
    Oct 22 08:16:49 db-master postgres[10589]: [3-4] buffer usage: 37193 hits, 44891 misses, 32311 dirtied
    Oct 22 08:16:49 db-master postgres[10589]: [3-5] avg read rate: 0.954 MB/s, avg write rate: 0.686 MB/s
    Oct 22 08:16:49 db-master postgres[10589]: [3-6] system usage: CPU 1.10s/1.67u sec elapsed 367.73 sec

The next automatic vacuum came 8 days later, on 30 October. It failed, and it has been failing ever since:

    Oct 30 14:22:01 db-master postgres[16160]: [3-1] []: ERROR: MultiXactId 2915905228 does no longer exist -- apparent wraparound
    Oct 30 14:22:01 db-master postgres[16160]: [3-2] []: CONTEXT: automatic vacuum of table "db.pg_toast.pg_toast_376621"

So I guess something happened between 22 and 30 October, and there is no relation to the pg_upgrade we did on 19 October.

> It would be useful to debug this that you attached gdb to a backend, set [...]

I will try to obtain the page number and will then send you the results, thank you.

Could we do this on one of our replicas instead (after detaching it)? That is, is the corrupted record propagated through the replication channel, so that we could debug it there and, in the meantime, fix the table on the master?

Thanks!

--
Kouber Saparev
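
P.S. For completeness, this is the kind of check I intend to run on both the master and the detached replica to compare the multixact horizons of the affected relations against the failing MultiXactId. It is just a sketch; pg_toast_376621 and the database come from the log above, and 376621 is the owning table's OID implied by the toast table's name:

    -- Freeze/multixact horizons of the toast table and its owning table.
    SELECT oid::regclass AS relation, relfrozenxid, relminmxid
    FROM pg_class
    WHERE relname = 'pg_toast_376621'
       OR oid = 376621;

    -- Per-database horizons, for comparison with MultiXactId 2915905228.
    SELECT datname, datfrozenxid, datminmxid
    FROM pg_database;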