pg_rewind: invalid record length

Ben Wheatley <benwheatley@xxxxxxxxxxxxxx> · Tue, 17 Sep 2019 18:13:57 +0100

Hi,

We're encountering an issue when running pg_rewind, and are looking for advice
on how to proceed.

We have a set of 3 Postgres instances which are being restored from the same
physical disk snapshot (which was taken from a standby on a production system),
in order to test a disaster recovery setup.

The cluster management software that we're using picks one of these 3 instances
as a primary, promotes it, and then 'resyncs' the other two instances in order
to boot them as standbys. The resync uses pg_rewind because the software does
not know the provenance of the existing data on disk; pg_rewind is there to
bring each instance back into line with the new primary in case the timelines
have diverged.

In this configuration our target timeline is a direct ancestor of the source
timeline, rather than a fork.

When pg_rewind is run under this configuration, we encounter the following
error (debug output included):

---
connected to server
fetched file "global/pg_control", length 8192
fetched file "pg_wal/0000000E.history", length 561
Source timeline history:
Target timeline history:
1: 0/0 - 0/7144130
2: 0/7144130 - 0/10806F98
3: 0/10806F98 - 0/11000098
4: 0/11000098 - 0/12000098
5: 0/12000098 - 0/13000098
6: 0/13000098 - 0/14000098
7: 0/14000098 - 0/14000288
8: 0/14000288 - 0/15000098
9: 0/15000098 - 0/16000C88
10: 0/16000C88 - 0/18000098
11: 0/18000098 - 0/180001A8
12: 0/180001A8 - 0/190736B0
13: 0/190736B0 - 0/0
servers diverged at WAL location 0/2223F588 on timeline 13

could not find previous WAL record at 0/2223F588: invalid record
length at 0/2223F588: wanted 24, got 0
Failure, exiting
---
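
For anyone reading the output above, here is a minimal Python sketch of how we
interpret it. It is not part of pg_rewind; the helper names (parse_lsn,
parse_history, timeline_of) are made up for illustration. It assumes an LSN
printed as "X/Y" is two hexadecimal halves of a 64-bit position, and that an
end of 0/0 in the history means the timeline is still open.

```python
def parse_lsn(text):
    """Convert an 'X/Y' LSN string (two hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(lines):
    """Parse 'tli: begin - end' lines as printed in the debug output."""
    entries = []
    for line in lines:
        tli, span = line.split(":", 1)
        begin, end = (parse_lsn(part.strip()) for part in span.split(" - "))
        entries.append((int(tli), begin, end))
    return entries


def timeline_of(entries, lsn):
    """Return the timeline whose range contains lsn, or None.

    Assumption: an end of 0/0 marks the current, open-ended timeline.
    """
    for tli, begin, end in entries:
        if begin <= lsn and (end == 0 or lsn < end):
            return tli
    return None


# Last two entries of the target history shown above.
history = parse_history([
    "12: 0/180001A8 - 0/190736B0",
    "13: 0/190736B0 - 0/0",
])
divergence = parse_lsn("0/2223F588")
print(timeline_of(history, divergence))
```

Under those assumptions the reported divergence point falls inside timeline
13, well after that timeline began at 0/190736B0, which matches the "servers
diverged" line.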

We believe this means that pg_rewind, when trying to locate the last
checkpoint, starts its search at the end of the WAL that exists on disk, at the
position where the next WAL record would have been written. No record exists
there yet, so it cannot read the xl_prev field of an XLogRecord to continue the
search backwards for the checkpoint, and it returns an error.
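
To make that reasoning concrete, here is a toy Python model of a backward
search that follows xl_prev links. It is only a sketch of our understanding,
not pg_rewind's actual code: the dictionary standing in for the WAL, the
tot_len and is_checkpoint keys, and the function names are all hypothetical.

```python
class InvalidRecord(Exception):
    """Raised when no valid record can be read at the requested position."""


def read_record(wal, lsn):
    """Return the record stored at lsn, or raise if nothing valid is there."""
    rec = wal.get(lsn)
    if rec is None or rec["tot_len"] == 0:
        raise InvalidRecord(f"invalid record length at {lsn:#x}")
    return rec


def find_last_checkpoint(wal, start_lsn):
    """Walk backwards from start_lsn via xl_prev until a checkpoint is found."""
    lsn = start_lsn
    while lsn != 0:
        rec = read_record(wal, lsn)
        if rec["is_checkpoint"]:
            return lsn
        lsn = rec["xl_prev"]
    raise InvalidRecord("no checkpoint found")


# Toy WAL: three records; 0x400 is where the next record would be written.
wal = {
    0x100: {"tot_len": 24, "xl_prev": 0x000, "is_checkpoint": True},
    0x200: {"tot_len": 24, "xl_prev": 0x100, "is_checkpoint": False},
    0x300: {"tot_len": 24, "xl_prev": 0x200, "is_checkpoint": False},
}

print(hex(find_last_checkpoint(wal, 0x300)))  # start at the last real record
try:
    find_last_checkpoint(wal, 0x400)          # start at the end of the WAL
except InvalidRecord as exc:
    print(exc)
```

In this model the search succeeds when it starts at the last record actually
written, and fails immediately when it starts at the insert position, which is
the behaviour we think we are seeing.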

The questions that we're faced with are:
1. Is this a valid use of pg_rewind (i.e. rewinding when there has been no
fork)?
2. What is causing pg_rewind to begin its search at the end of the WAL? Is
this because we haven't actually written any data in this cluster since the
promotion?
3. Would it be valid and/or correct to make pg_rewind skip backwards if it
finds a zero-length record? This does seem to mitigate the error (see
attached patch).

Thanks,
Ben
Attachment:
0001-Fix-pg_rewind-when-divergence-is-at-end-of-WAL.patch
