Re: Pg_rewind cannot load history wal

Michael Paquier <michael@xxxxxxxxxxx> · Sat, 4 Aug 2018 22:13:12 +0900



On Sat, Aug 04, 2018 at 04:59:45AM -0700, Andres Freund wrote:
> On 2018-08-04 10:54:22 +0100, Simon Riggs wrote:
>> pg_rewind doesn't work correctly. Documenting a workaround doesn't change that.
> 
> Especially because most people will only understand this after they've
> been hit, as test scenarios will often just be quick enough.

Well, the tool has behaved this way since its creation.  I am not
sure either that we can have pg_rewind create a checkpoint on the source
node each time a rewind is done, as it may not be necessary, and it
would force WAL segment recycling more often than necessary.  So if we
were to back-patch something like that, I am pretty much convinced that
we would get complaints from people already using the tool, whose
existing failover flows would break.

Making this not need a checkpoint is actually possible.  When the
source is offline, the
control file can be relied on as the shutdown checkpoint would update
the on-disk control file.  When the source is online, pg_rewind only
needs to know the new timeline number from the source, which we could
provide via a SQL function, but that would work only on HEAD (look at
ControlFile_source, you would see that only the new TLI matters, and
that getTimelineHistory does not really need to know the contents of the
control file).
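
To illustrate that last point, here is a rough sketch of why the new TLI
alone is enough to find and read the timeline history.  It is written in
Python for brevity and is not the pg_rewind code; the helper names are
made up for the example.  It assumes only the usual conventions: the
history file for a timeline is named with the TLI as eight hex digits
plus ".history", and each non-comment line holds a parent TLI, a switch
LSN in hi/lo hex form, and a free-text reason.

```python
from typing import List, NamedTuple


class TimelineEntry(NamedTuple):
    tli: int         # parent timeline ID (decimal in the file)
    switch_lsn: int  # LSN at which the next timeline branched off
    reason: str      # free-text reason recorded at promotion


def history_file_name(tli: int) -> str:
    """Build the history file name for a timeline (hypothetical helper)."""
    return f"{tli:08X}.history"


def parse_lsn(text: str) -> int:
    """Convert the textual 'hi/lo' hex form of an LSN into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(content: str) -> List[TimelineEntry]:
    """Parse history file contents, skipping blank and comment lines."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)
        if len(fields) < 2:
            raise ValueError(f"malformed history line: {line!r}")
        reason = fields[2] if len(fields) > 2 else ""
        entries.append(TimelineEntry(int(fields[0]), parse_lsn(fields[1]), reason))
    return entries
```

Given the TLI reported by the source (however it is obtained),
history_file_name() says which file to fetch and parse_history() yields
the switch points; nothing else from the control file is involved.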
--
Michael