On Mon, Nov 8, 2021 at 6:03 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
> Okay.
> The default value for paxos_propose_interval seems to be "1.0", not
> "2.0". But anyway, reducing it to 0.25 seems to fix this issue on our
> testing cluster.
>
> I wanted to test some failure scenarios with this value and had a look
> at the osdmap epoch to check how many new maps would be created.
> On the corresponding graph I saw that since the update to Octopus
> (and in Nautilus too) the epoch has been continuously increasing (see my
> other mail). The diff between two maps is empty, except for the epoch
> and creation date.

That is concerning. Can you set debug_mon = 20 and capture a minute or
so of logs? (Enough to include a few osdmap epochs.) You can use
ceph-post-file to send it to us.

Thanks!
sage

>
> Manuel
>
> On Fri, 5 Nov 2021 18:33:58 -0500
> Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > Yeah, I think two different things are going on here.
> >
> > The read leases were new, and I think the way that OSDs are marked
> > down is the key thing that affects that behavior. I'm a bit
> > surprised that the _notify_mon option helps there, and will take a
> > closer look at that on Monday to make sure it's doing what it's
> > supposed to be doing.
> >
> > The paxos_propose_interval is an upper bound on how long the monitor
> > is allowed to batch updates before committing them. Many/most
> > changes are committed immediately, but the osdmap management tries to
> > batch things up so that a single osdmap epoch combines lots of
> > changes when they are happening quickly (there tend to be
> > mini-storms of updates when cluster changes happen). The default of
> > 2s might be too much for many environments, though... and we might
> > consider changing the default to something smaller (maybe more like
> > 250ms).
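[For anyone following the thread: the capture Sage asks for, and the interval tuning Manuel tested, can be sketched roughly as below. These are standard Ceph CLI commands run on a live cluster with an admin keyring; the log path and 60-second window are illustrative, not prescriptive.]

```shell
# Raise monitor debug logging at runtime (debug_mon defaults to 1/5)
ceph config set mon debug_mon 20

# Let it run long enough to cover a few osdmap epochs
sleep 60

# Restore the default log level
ceph config set mon debug_mon 1/5

# Upload the monitor log for the developers; ceph-post-file prints a
# tag that can then be shared on-list (log path varies per deployment)
ceph-post-file /var/log/ceph/ceph-mon.$(hostname -s).log

# The tuning discussed above: lower the max batching window for
# proposals from the 2s default to 250ms
ceph config set mon paxos_propose_interval 0.25
```

Note these commands act on a running cluster, so the effect depends on the deployment; the config changes take effect without a monitor restart.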
> >
> > sage
>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx