On 2020-Aug-01, Ben Chobot wrote:

> We have a few hundred postgres servers in AWS EC2, all of which do streaming
> replication to at least two replicas. As we've transitioned our fleet
> from 9.5 to 12.3, we've noticed an alarming increase in the frequency of a
> streaming replica dying during replay. Postgres will log something like:
>
> |2020-07-31T16:55:22.602488+00:00 hostA postgres[31875]: [19137-1] db=,user= LOG: restartpoint starting: time
> 2020-07-31T16:55:24.637150+00:00 hostA postgres[24076]: [15754-1] db=,user= FATAL: incorrect index offsets supplied
> 2020-07-31T16:55:24.637261+00:00 hostA postgres[24076]: [15754-2] db=,user= CONTEXT: WAL redo at BCC/CB7AF8B0 for Btree/VACUUM: lastBlockVacuumed 1720
> 2020-07-31T16:55:24.642877+00:00 hostA postgres[24074]: [8-1] db=,user= LOG: startup process (PID 24076) exited with exit code 1|

I've never seen this one.

Can you find out which index is being modified at those LSNs -- is it
always the same index?  Can you have a look at nearby WAL records that
touch the same page of the same index in each case?

One possibility is that the storage forgot a previous write.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services