On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
Ok, so you want fewer checkpoints because you expect to failover to a
replica rather than recover the primary on a failure. If you're doing
synchronous replication, then that certainly makes sense. If you
aren't, then you're deciding that you're alright with losing some number
of writes by failing over rather than recovering the primary, which can
also be acceptable but it's certainly much more questionable.
Yes, in our setup that's the case: a few lost transactions will have a negligible impact on the business.
I'm getting the feeling that your replicas are async, but it sounds like
you'd be better off with having at least one sync replica, so that you
can flip to it quickly.
They are indeed async. We traded durability for performance here because we can accept some lost transactions.
Alternatively, having a way to more easily make
the primary stop accepting new writes, flush everything to the replicas,
report that it's completed doing so, to allow you to promote a replica
without losing anything, and *then* go through the process on the
primary of doing a checkpoint, would be kind of nice.
I suppose that would require being able to demote a master to a slave during runtime.
That would definitely be nice-to-have.
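To make the "report that it's completed doing so" step concrete, here is a rough sketch of the check one could run before promoting: compare the primary's WAL position with the position the replica has replayed. It assumes the two LSN strings were fetched separately, e.g. with pg_current_wal_lsn() on the primary and pg_last_wal_replay_lsn() on the replica (PostgreSQL 10 or later). The helper names are made up for illustration, and the check only means something once writes on the primary have actually stopped, which is the part PostgreSQL does not provide out of the box.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replica_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_lsn))


def safe_to_promote(primary_lsn: str, replica_lsn: str) -> bool:
    """Hypothetical check: True only if the replica has replayed everything.

    Assumes writes on the primary are already stopped; otherwise the
    primary's LSN keeps moving and the answer is stale immediately.
    """
    return replica_lag_bytes(primary_lsn, replica_lsn) == 0


if __name__ == "__main__":
    primary = "16/B374D848"   # example value, not from a real system
    replica = "16/B374D000"   # example value, not from a real system
    print("lag in bytes:", replica_lag_bytes(primary, replica))
    print("safe to promote:", safe_to_promote(primary, replica))
```

With async replicas, a non-zero lag at failover time is a rough upper bound on how much WAL (and therefore how many recent transactions) would be lost.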
Thanks,
Stephen