Re: ERROR: found multixact from before relminmxid

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Mon, 9 Apr 2018 16:01:44 +0200

On 04/09/2018 01:49 PM, Alexandre Arruda wrote:
> 
> 
> 2018-04-06 13:11 GMT-03:00 Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx
> <mailto:tomas.vondra@xxxxxxxxxxxxxxx>>:
> 
> 
> 
>     On 04/06/2018 04:29 PM, Alexandre Arruda wrote:
>     > 2018-04-06 9:39 GMT-03:00 Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx <mailto:tomas.vondra@xxxxxxxxxxxxxxx>
>     > <mailto:tomas.vondra@xxxxxxxxxxxxxxx
>     <mailto:tomas.vondra@xxxxxxxxxxxxxxx>>>:
>     >
>     >
>     >
>     >     On 04/06/2018 02:09 AM, Alexandre Arruda wrote:
>     >     > Hi,
>     >     >
>     >     > Some time ago, I had this errors frequently showed in logs
>     after some
>     >     > autovacuum in some tables(pg 9.6). VACUUM FULL or CLUSTER in
>     this tables
>     >     > show the same and not complete the tasks (showed by some
>     table bloat
>     >     > select).
>     >     > Then, I did a full dump/restore into a new version (10.2)
>     and everything
>     >     > is ok for a couple of months. Now, I have errors like this
>     again:
>     >     >
>     >     > db1=# cluster pc44t;
>     >     >
>     >     > ERROR:  found multixact 134100944 from before relminmxid
>     192042633
>     >     >
>     >     >  
>     >     >
>     >     > Like before, the only way to make the errors to desapear is by
>     >     > dump/reload the whole table.
>     >     >
>     >     >
>     >     > Thanks for any help.
>     >     >
>     >
>     >     That's going to be hard, unless you still have the cluster around.
>     >
>     >     This surely seems like some sort of data corruption issue, but
>     without
>     >     being able to inspect the data pages it's nearly impossible to
>     determine
>     >     what went wrong.
>     >
>     >     We'd also need more information about what happened to the
>     hardware and
>     >     cluster before the issues started to appear - crashes,
>     hardware issues.
>     >     And so on.
>     >
>     >     regards
>     >
>     >     --
>     >     Tomas Vondra                  http://www.2ndQuadrant.com
>     >     PostgreSQL Development, 24x7 Support, Remote DBA, Training &
>     Services
>     >
>     >
>     > Hi Tomas,
>     > The old cluster are gone, unfortunatly.
>     >
>     > This server is a 128GB ECC RAM with a dual redundant hba fiber channel
>     > connect to a sotorage with Raid 6 and I don't have (apparently) any
>     > errors reported.
>     > Yesterday I did a test with one table:  some sum aggragates, count(*),
>     > etc, then dump/reload and repeat the tests the results (of querys) are
>     > the same, regarding the vacuum problem
>     > thats disapeared.
>     >
> 
>     I'm not sure I understand correctly. So you can reproduce the issue? If
>     yes, how can you share the scripts/data you use (and how large is it)?
>     If we could reproduce it locally, it would make the investigation much
>     easier.
> 
>     BTW you mentioned you're using PostgreSQL 9.6 - which minor version,
>     specifically?
> 
> 
>     regards
> 
> 
> Hi Tomas,
> 
> No, I can't reproduce. What I did is a simple way to "validate" the
> current table data to see if a dump/reload
> preserve them. Old postgresql was 9.6.5. The problem returns now in new
> 10.3 installation.
> 

I'm confused. First you say you can't reproduce the issue, then you say
it got back on a new installation. So which is it?

> There is a way to correct this tables without a dump/reload ?
> 
> I'm thinking to reinstall cluster doing a initdb --data-checksums,
> but I'm affraid about a severe performance impact.
>  

The performance impact should be non-measurable for most workloads. It
might be a couple of percent on certain workloads/hardware, but that's
probably not your case.

The bigger question is whether this can actually detect the issue. If
it's due to an storage issue, then perhaps yes. But if you only see
multixact issues consistently and nothing else, it might easily be a
PostgreSQL bug (in which case the checksum will be correct).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services