
Re: 12.3 replicas falling over during WAL redo

Alvaro Herrera wrote on 8/1/20 9:35 AM:
On 2020-Aug-01, Ben Chobot wrote:

We have a few hundred postgres servers in AWS EC2, all of which do streaming
replication to at least two replicas. As we've transitioned our fleet
from 9.5 to 12.3, we've noticed an alarming increase in the frequency of a
streaming replica dying during replay. Postgres will log something like:

2020-07-31T16:55:22.602488+00:00 hostA postgres[31875]: [19137-1] db=,user= LOG: restartpoint starting: time
2020-07-31T16:55:24.637150+00:00 hostA postgres[24076]: [15754-1] db=,user= FATAL: incorrect index offsets supplied
2020-07-31T16:55:24.637261+00:00 hostA postgres[24076]: [15754-2] db=,user= CONTEXT: WAL redo at BCC/CB7AF8B0 for Btree/VACUUM: lastBlockVacuumed 1720
2020-07-31T16:55:24.642877+00:00 hostA postgres[24074]: [8-1] db=,user= LOG: startup process (PID 24076) exited with exit code 1

I've never seen this one.

Can you find out which index is being modified at those LSNs -- is it
always the same index?  Can you have a look at nearby WAL records that
touch the same page of the same index in each case?

One possibility is that the storage forgot a previous write.

I'd be happy to, if you tell me how. :)

We're using XFS for our postgres filesystem, on Ubuntu Bionic. Of course it's always possible there's something wrong in the filesystem or the EBS layer, but that is one thing we have not changed in the migration from 9.5 to 12.3.
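
For anyone following along, here is a rough sketch of how that kind of lookup usually starts. It is not something from this thread, just an illustration under stated assumptions. The first helper maps an LSN such as the BCC/CB7AF8B0 in the log above to the name of the WAL segment file that contains it, assuming the default 16 MB segment size and timeline 1 (both are assumptions; check your own cluster). The second pulls the tablespace/database/relfilenode triple and block number out of a pg_waldump output line. The SAMPLE line is made up for illustration; its OIDs and block number are hypothetical, not real data from these servers.

```python
import re

# Assumption: the cluster uses the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def lsn_to_segment_name(lsn, timeline=1, seg_size=WAL_SEGMENT_SIZE):
    """Return the name of the WAL segment file containing the given LSN.

    The timeline defaults to 1, which is an assumption; a promoted
    cluster will be on a higher timeline.
    """
    hi_hex, lo_hex = lsn.split("/")
    position = (int(hi_hex, 16) << 32) | int(lo_hex, 16)
    segno = position // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


# Matches the "blkref #N: rel spc/db/filenode [fork name] blk N" part of
# a pg_waldump record line.
BLKREF_RE = re.compile(
    r"blkref #\d+: rel (\d+)/(\d+)/(\d+)(?: fork \w+)? blk (\d+)"
)


def block_refs(waldump_line):
    """Return (tablespace, database, relfilenode, block) tuples for a line."""
    return [
        tuple(int(group) for group in match.groups())
        for match in BLKREF_RE.finditer(waldump_line)
    ]


# Hypothetical pg_waldump line: the OIDs and block number are invented.
SAMPLE = (
    "rmgr: Btree       len (rec/tot):     66/    66, tx:          0, "
    "lsn: BCC/CB7AF8B0, prev BCC/CB7AF870, "
    "desc: VACUUM lastBlockVacuumed 1720, "
    "blkref #0: rel 1663/16385/16402 blk 1721"
)

if __name__ == "__main__":
    print(lsn_to_segment_name("BCC/CB7AF8B0"))
    print(block_refs(SAMPLE))
```

With the segment file in hand, pg_waldump can be pointed at the WAL directory with --start set to the failing LSN (and --rmgr=Btree to narrow the output) to list the record and its neighbours. The relfilenode from the "rel" triple can then be turned into an index name with SELECT pg_filenode_relation(<tablespace>, <relfilenode>), run while connected to the database whose OID is the middle number of the triple.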




