On 4 August 2018 at 07:56, Michael Paquier <michael@xxxxxxxxxxx> wrote:
> On Sat, Aug 04, 2018 at 07:44:59AM +0100, Simon Riggs wrote:
>> I think the problem is that writing the online checkpoint is deferred
>> after promotion, so this is a timing issue that probably doesn't show
>> in our regression tests.
>
> Somewhat. It is a performance improvement of 9.3 to let the startup
> request a checkpoint to the checkpointer process instead of doing it
> itself.

Yes, and so issuing a manual CHECKPOINT would remove that benefit.

>> Sounds like we should write a pending timeline change to the control
>> file and have pg_rewind check that instead.
>>
>> I'd call this a timing bug, not a doc issue.
>
> Well, having pg_rewind enforce a checkpoint on the promoted standby
> could cause a performance hit as well if we do it mandatorily, as if
> there is a delay between the promotion and the rewind, a checkpoint
> could have already happened. So it is for me a documentation
> bug first regarding the failover workflow, and potentially a patch for a
> new feature which makes pg_rewind trigger directly a checkpoint.

pg_rewind doesn't work correctly. Documenting a workaround doesn't change that.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services