On Sat, Aug 04, 2018 at 04:59:45AM -0700, Andres Freund wrote:
> On 2018-08-04 10:54:22 +0100, Simon Riggs wrote:
>> pg_rewind doesn't work correctly. Documenting a workaround doesn't change that.
>
> Especially because most people will only understand this after they've
> been hit, as test scenarios will often just be quick enough.

Well, the tool has behaved this way since its creation. I am also not
sure that we can have pg_rewind create a checkpoint on the source node
each time a rewind is done, as it may not be necessary, and it would
force WAL segment recycling more than necessary. So if we were to
back-patch something like that, I am pretty much convinced that we
would get complaints from people already using the tool, whose existing
failover flows would break.

Making this not need a checkpoint is actually possible. When the source
is offline, the control file can be relied on, as the shutdown
checkpoint would update the on-disk control file. When the source is
online, pg_rewind only needs to know the new timeline number from the
source, which we could provide via a SQL function, but that would work
only on HEAD (look at ControlFile_source: you would see that only the
new TLI matters, and that getTimelineHistory does not really need to
know the contents of the control file).
--
Michael
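
To make the checkpoint-free idea above concrete, here is a minimal sketch of the decision it describes. It is not pg_rewind code: `ControlFile`, `source_timeline` and `query_current_tli` are hypothetical names made up for illustration, and the callback stands in for the SQL function that does not exist yet.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ControlFile:
    """Toy stand-in for the one control-file field that matters here."""
    checkpoint_tli: int  # timeline recorded by the latest checkpoint


def source_timeline(
    source_online: bool,
    on_disk_control: ControlFile,
    query_current_tli: Optional[Callable[[], int]] = None,
) -> int:
    """Return the timeline the target should be rewound towards.

    Offline source: the shutdown checkpoint has updated the on-disk
    control file, so its timeline can be trusted.
    Online source: the control file may be stale until the next
    checkpoint, so ask the running server instead (hypothetical call).
    """
    if not source_online:
        return on_disk_control.checkpoint_tli
    if query_current_tli is None:
        raise ValueError("an online source needs a way to report its timeline")
    return query_current_tli()


# A source just promoted from timeline 1 to 2, with no checkpoint yet:
# the on-disk control file still says 1, while the server itself knows 2.
stale_control = ControlFile(checkpoint_tli=1)
new_tli = source_timeline(True, stale_control, query_current_tli=lambda: 2)
```

The only point of the sketch is that, for an online source, the timeline comes from the running server rather than from a control file that is waiting for a checkpoint.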