Re: checkpoints taking much longer than expected

Stephen Frost <sfrost@xxxxxxxxxxx> · Sun, 16 Jun 2019 13:30:28 -0400

Greetings,

* Tiemen Ruiten (t.ruiten@xxxxxxxxxxx) wrote:
> On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
> > * Tiemen Ruiten (t.ruiten@xxxxxxxxxxx) wrote:
> > > checkpoint_timeout = 60min
> >
> > That seems like a pretty long timeout.
> 
> My reasoning was that a longer recovery time to avoid writes would be
> acceptable because there are two more nodes in the cluster to fall back on
> in case of emergency.

Ok, so you want fewer checkpoints because you expect to fail over to a
replica rather than recover the primary on a failure.  If you're doing
synchronous replication, then that certainly makes sense.  If you
aren't, then you're deciding that you're alright with losing some number
of writes by failing over rather than recovering the primary, which can
also be acceptable but it's certainly much more questionable.

> > > My problem is that checkpoints are taking a long time. Even when I run a
> > > few manual checkpoints one after the other, they keep taking very long,
> > > up to 10 minutes:
> >
> > You haven't said *why* this is an issue...  Why are you concerned with
> > how long it takes to do a checkpoint?
> 
> During normal operation I don't mind that it takes a long time, but when
> performing maintenance I want to be able to gracefully bring down the
> master without long delays to promote one of the standby's.

I'm getting the feeling that your replicas are async, but it sounds like
you'd be better off with having at least one sync replica, so that you
can flip to it quickly.  Alternatively, having a way to more easily make
the primary stop accepting new writes, flush everything to the replicas,
and report that it's completed doing so, to allow you to promote a
replica without losing anything, and *then* go through the process on
the primary of doing a checkpoint, would be kind of nice.

Then again, you run into the issue that if your async replicas are very
far behind then you're still going to have a long period of time between
the "stop accepting new writes" and "finished flushing everything to the
replicas".
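
To get a rough idea of how far behind a replica is before you start, you
can compare the primary's current WAL position with the replica's replay
position.  Here is a minimal sketch in Python, assuming you've already
fetched both LSNs as text (for example from pg_current_wal_lsn() on the
primary and the replay_lsn column of pg_stat_replication); the function
names are just illustrative and not part of any PostgreSQL client
library:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN in 'X/Y' hex text form to a byte position.

    The part before the slash is the high 32 bits and the part after it
    is the low 32 bits of the 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replay_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay to reach the primary."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_replay_lsn)


# Hypothetical values, for illustration only.
lag = replay_lag_bytes("16/B374D848", "16/B3000000")
print(f"replica is {lag} bytes behind")
```

On the server side, pg_wal_lsn_diff() does the same subtraction for you,
so this is only worth doing client-side if you're collecting the LSNs
from separate connections anyway.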

> > The time information is all there and it tells you what it's doing and
> > how much had to be done... If you're unhappy with how long it takes to
> > write out gigabytes of data and fsync hundreds of files, talk to your
> > storage people...
> 
> I am the storage people too :)

Great!  Make it go faster. :)

Thanks,

Stephen