Greetings, * Tiemen Ruiten (t.ruiten@xxxxxxxxxxx) wrote: > On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost <sfrost@xxxxxxxxxxx> wrote: > > Ok, so you want fewer checkpoints because you expect to failover to a > > replica rather than recover the primary on a failure. If you're doing > > synchronous replication, then that certainly makes sense. If you > > aren't, then you're deciding that you're alright with losing some number > > of writes by failing over rather than recovering the primary, which can > > also be acceptable but it's certainly much more questionable. > > Yes, in our setup that's the case: a few lost transactions will have a > negligible impact to the business. There's a risk here that the impact could be much higher than just 'a few'. Hopefully you're monitoring it carefully and have some plan of what to do if the difference grows very quickly. > > Alternatively, having a way to more easily make > > the primary to accepting new writes, flush everything to the replicas, > > report that it's completed doing so, to allow you to promote a replica > > without losing anything, and *then* go through the process on the > > primary of doing a checkpoint, would be kind of nice. > > I suppose that would require being able to demote a master to a slave > during runtime. That's.. not the case. If you're really not worried about losing transactions, then an immediate shutdown as suggested by Peter G would let you fail over to one of your not-completely-current replicas.. Or you could just do that as soon as you start shutting down the primary. Of course, you'll end up in a situation where the replica will have fork'ed off of the timeline from a point prior to where the primary is at, meaning you'd have to use pg_rewind (or do a restore from an earlier backup) to re-build the former-primary as a replica, if that's the goal. If the idea is that you don't want to lose any transactions during this process, and you want to be able to cleanly bring the former primary back, then I don't think we've really got a great solution where you can make sure that all the WAL has been flushed out to the replicas and the primary just finishes its checkpoint (and thinking about that a bit more, I'm not sure we could actually do it anyway since we're going to want to write out that shutdown record, making it a catch-22, but maybe there's something clever that could be done there..). Thanks, Stephen
Attachment:
signature.asc
Description: PGP signature