pg_rewind: invalid record length


Hi,

We're encountering an issue when running pg_rewind, and are looking for advice
on how to proceed.

We have a set of 3 Postgres instances which are being restored from the same
physical disk snapshot (which was taken from a standby on a production system),
in order to test a disaster recovery setup.

The cluster management software that we're using picks one of these 3 instances
as the primary, promotes it, and then 'resyncs' the other two instances in
order to boot them as standbys. A resync uses pg_rewind because the software
does not know the provenance of the existing data on disk, so it attempts to
bring the two instances into line with the new primary in case the timelines
have diverged.

In this configuration our target timeline is a direct ancestor of the source
timeline, rather than a fork.
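
To make "direct ancestor rather than a fork" concrete, here is a minimal
sketch. It assumes each timeline history is represented as a list of
(timeline ID, switch LSN) pairs for completed timelines only. That
representation, the function name is_direct_ancestor and the LSN values in the
example are illustrative assumptions, not anything pg_rewind exposes.

```python
from typing import List, Tuple

# Hypothetical representation: one (timeline_id, switch_lsn) pair per
# completed timeline, where switch_lsn is where that timeline ended.
History = List[Tuple[int, int]]


def is_direct_ancestor(target: History, source: History) -> bool:
    """True if the target's history is a prefix of the source's history,
    i.e. the source only ever continued from where the target is."""
    return len(target) <= len(source) and source[: len(target)] == target


# Illustrative values only (not taken from the cluster in this report).
shared = [(1, 0x1000), (2, 0x2000)]
promoted = shared + [(3, 0x3000)]   # source kept going after a promotion
forked = [(1, 0x1000), (2, 0x2800)]  # left timeline 2 at a different LSN

print(is_direct_ancestor(shared, promoted))  # ancestor case
print(is_direct_ancestor(forked, promoted))  # fork case
```

In our setup the target's history would satisfy this prefix check against the
source's history, which is what we mean by "no fork".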


When pg_rewind is run under this configuration, we encounter the following
error (debug output included):

---
connected to server
fetched file "global/pg_control", length 8192
fetched file "pg_wal/0000000E.history", length 561
Source timeline history:
Target timeline history:
1: 0/0 - 0/7144130
2: 0/7144130 - 0/10806F98
3: 0/10806F98 - 0/11000098
4: 0/11000098 - 0/12000098
5: 0/12000098 - 0/13000098
6: 0/13000098 - 0/14000098
7: 0/14000098 - 0/14000288
8: 0/14000288 - 0/15000098
9: 0/15000098 - 0/16000C88
10: 0/16000C88 - 0/18000098
11: 0/18000098 - 0/180001A8
12: 0/180001A8 - 0/190736B0
13: 0/190736B0 - 0/0
servers diverged at WAL location 0/2223F588 on timeline 13

could not find previous WAL record at 0/2223F588: invalid record length at 0/2223F588: wanted 24, got 0
Failure, exiting
---
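
For anyone wanting to inspect the WAL around the reported position, the
following sketch converts the LSNs from the output above into integers and
derives the name of the WAL segment file that would contain the divergence
point. It assumes the default 16 MB WAL segment size, which we have not
confirmed for this cluster; parse_lsn and wal_file_name are helper names made
up for this example.

```python
# Assumption: default segment size; a cluster initialised with a different
# --wal-segsize would need that value instead.
DEFAULT_WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'high/low' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: int,
                  seg_size: int = DEFAULT_WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file containing `lsn` on `timeline`."""
    segs_per_xlog_id = 0x100000000 // seg_size
    segno = lsn // seg_size
    return f"{timeline:08X}{segno // segs_per_xlog_id:08X}{segno % segs_per_xlog_id:08X}"


tl13_start = parse_lsn("0/190736B0")   # start of timeline 13 in the history
divergence = parse_lsn("0/2223F588")   # reported divergence point

print(divergence > tl13_start)                        # divergence lies within timeline 13
print(wal_file_name(13, divergence))                  # segment file to inspect
print(hex(divergence % DEFAULT_WAL_SEGMENT_SIZE))     # byte offset inside that segment
```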

We believe this means that pg_rewind, when searching for the last checkpoint,
starts at the end of the WAL that exists on disk, at the position where the
next WAL record would have been written. It therefore fails, because there is
no record at that position whose xl_prev field (in the XLogRecord header) it
can read in order to continue searching backwards for the checkpoint.
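
The behaviour we think we are seeing can be modelled with a toy example. This
is only a sketch of our understanding, not pg_rewind's actual code: the WAL is
reduced to a dictionary of records keyed by start position, and the names
Record, read_record, find_last_checkpoint and the positions used are all
invented for illustration. The only value taken from the real output is the
24-byte minimum record length from the error message.

```python
from dataclasses import dataclass
from typing import Dict

SIZE_OF_XLOG_RECORD = 24  # the "wanted 24" in the error message


@dataclass
class Record:
    xl_tot_len: int            # total length of the record
    xl_prev: int               # start position of the previous record
    is_checkpoint: bool = False


class InvalidRecordLength(Exception):
    pass


def read_record(wal: Dict[int, Record], pos: int) -> Record:
    """Read the record starting at `pos`; unwritten WAL reads as zeros."""
    rec = wal.get(pos)
    tot_len = rec.xl_tot_len if rec is not None else 0
    if rec is None or tot_len < SIZE_OF_XLOG_RECORD:
        raise InvalidRecordLength(
            f"invalid record length at {pos:#x}: "
            f"wanted {SIZE_OF_XLOG_RECORD}, got {tot_len}"
        )
    return rec


def find_last_checkpoint(wal: Dict[int, Record], start: int) -> int:
    """Follow xl_prev links backwards from `start` until a checkpoint."""
    pos = start
    while True:
        rec = read_record(wal, pos)
        if rec.is_checkpoint:
            return pos
        pos = rec.xl_prev


# Toy WAL: a checkpoint followed by two ordinary records.
wal = {
    0x100: Record(xl_tot_len=104, xl_prev=0x0, is_checkpoint=True),
    0x168: Record(xl_tot_len=56, xl_prev=0x100),
    0x1A0: Record(xl_tot_len=40, xl_prev=0x168),
}
END_OF_WAL = 0x1C8  # where the next record would be written; nothing is there

print(hex(find_last_checkpoint(wal, 0x1A0)))  # starting at the last real record works
try:
    find_last_checkpoint(wal, END_OF_WAL)     # starting at end of WAL fails
except InvalidRecordLength as exc:
    print(exc)
```

In this model the search succeeds from the start of the last real record but
fails from the end-of-WAL position, which matches the shape of the error we
see.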

The questions that we're faced with are:
1. Is this a valid use of pg_rewind? (i.e. rewinding when there has been no
fork)
2. What is causing pg_rewind to begin its search at the end of the WAL? (Is
this because we haven't actually written any data in this cluster since the
promotion?)
3. Would it be valid and/or correct to make pg_rewind skip backwards if it
finds a zero-length record? (This does seem to mitigate the error; see the
attached patch.)

Thanks,
Ben

Attachment: 0001-Fix-pg_rewind-when-divergence-is-at-end-of-WAL.patch
Description: Binary data

