Re: ERROR: found multixact from before relminmxid

Alexandre Arruda <adaldeia@xxxxxxxxx> · Mon, 9 Apr 2018 08:49:48 -0300

2018-04-06 13:11 GMT-03:00 Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx>:

On 04/06/2018 04:29 PM, Alexandre Arruda wrote:

> 2018-04-06 9:39 GMT-03:00 Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx>:

>

>

>

>     On 04/06/2018 02:09 AM, Alexandre Arruda wrote:

>     > Hi,

>     >

>     > Some time ago, I had these errors frequently showing up in the logs after

>     > autovacuum ran on some tables (pg 9.6). VACUUM FULL or CLUSTER on these tables

>     > showed the same error and did not complete (as shown by a table bloat

>     > query).

>     > Then, I did a full dump/restore into a new version (10.2) and everything

>     > was ok for a couple of months. Now, I have errors like this again:

>     >

>     > db1=# cluster pc44t;

>     >

>     > ERROR:  found multixact 134100944 from before relminmxid 192042633

>     >

>     >  

>     >

>     > Like before, the only way to make the errors disappear is to

>     > dump/reload the whole table.

>     >

>     >

>     > Thanks for any help.

>     >

>

>     That's going to be hard, unless you still have the cluster around.

>

>     This surely seems like some sort of data corruption issue, but without

>     being able to inspect the data pages it's nearly impossible to determine

>     what went wrong.

>

>     We'd also need more information about what happened to the hardware and

>     cluster before the issues started to appear: crashes, hardware issues,

>     and so on.

>

>     regards

>

>     --

>     Tomas Vondra                  http://www.2ndQuadrant.com

>     PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>

>

> Hi Tomas,

> The old cluster is gone, unfortunately.

>

> This server has 128GB of ECC RAM and dual redundant Fibre Channel HBAs

> connected to a storage array with RAID 6, and (apparently) no errors

> have been reported.

> Yesterday I did a test with one table: some sum aggregates, count(*),

> etc., then a dump/reload, and then I repeated the tests. The query results

> are the same. As for the vacuum problem,

> it disappeared.

>

I'm not sure I understand correctly. So you can reproduce the issue? If

yes, can you share the scripts/data you use (and how large are they)?

If we could reproduce it locally, it would make the investigation much

easier.

BTW you mentioned you're using PostgreSQL 9.6 - which minor version,

specifically?

regards

Hi Tomas,

No, I can't reproduce it. What I did was a simple way to "validate" the current table data, to see if a dump/reload
preserves it. The old PostgreSQL was 9.6.5. The problem has now returned in the new 10.3 installation.

Is there a way to correct these tables without a dump/reload?

I'm thinking of reinitializing the cluster with initdb --data-checksums, but I'm afraid of a severe performance impact.

Best regards,

Alexandre
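
For anyone following the numbers in the error message: it is reported when a multixact ID stored in a tuple compares as older than the table's relminmxid. Below is a minimal Python sketch of that comparison, assuming the usual signed-difference rule for 32-bit IDs that wrap around modulo 2^32. The function names are hypothetical and this is not PostgreSQL code, only an illustration of the arithmetic.

```python
# Sketch of a wraparound-aware "precedes" test for 32-bit multixact IDs.
# Assumption: IDs are compared by the sign of their 32-bit difference;
# the function names are illustrative, not a real PostgreSQL API.

def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def multixact_precedes(multi1: int, multi2: int) -> bool:
    """Return True if multi1 is logically older than multi2 (modulo 2^32)."""
    return to_int32(multi1 - multi2) < 0


# Values taken from the error message in this thread.
found_multi = 134100944
relminmxid = 192042633

print(multixact_precedes(found_multi, relminmxid))
```

With the values from the error, the multixact found in the table is roughly 58 million IDs behind relminmxid, so under this rule it counts as "from before" and the check fails.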