Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

Manuel Lausch <manuel.lausch@xxxxxxxx> · Mon, 8 Nov 2021 13:03:24 +0100

Okay.
The default value for paxos_propose_interval seems to be "1.0", not
"2.0". But anyway, reducing it to 0.25 seems to fix this issue on our
testing cluster.

I wanted to test some failure scenarios with this value and had a look
at the osdmap epoch to check how many new maps would be created.
On the corresponding graph I saw that since the update to Octopus
(and in Nautilus too) the epoch has been increasing continuously (see my
other mail). The diff between two maps is empty, except for the epoch
and creation date.
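A quick way to repeat the "empty diff" check is to compare JSON dumps of two osdmap epochs while ignoring the fields that always change. This is a minimal sketch, assuming the two epochs were saved with something like `ceph osd dump <epoch> --format json`; `osdmap_diff` is a hypothetical helper (not part of Ceph), and the ignored key names `epoch` and `modified` are assumptions about the dump's layout that may need adjusting.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def osdmap_diff(old: Dict[str, Any], new: Dict[str, Any],
                ignore: Iterable[str] = ("epoch", "modified")) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for top-level keys that differ.

    Hypothetical helper. The default ignored keys are assumptions about the
    JSON dump; nested changes show up as a difference of the whole value.
    """
    skip = set(ignore)
    keys = (set(old) | set(new)) - skip
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(keys)
        # A key present in only one dump counts as a change, even if its value is None.
        if (k in old) != (k in new) or old.get(k) != new.get(k)
    }


# Hypothetical, heavily trimmed dumps of two consecutive epochs.
old = json.loads('{"epoch": 100, "modified": "2021-11-08T12:00:00", "flags": "sortbitwise"}')
new = json.loads('{"epoch": 101, "modified": "2021-11-08T12:00:05", "flags": "sortbitwise"}')
print(osdmap_diff(old, new))  # {} : only the ignored fields changed
```

Comparing only top-level keys keeps the sketch short; a non-empty result names the sections that actually changed between the two epochs.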

Manuel

On Fri, 5 Nov 2021 18:33:58 -0500
Sage Weil <sage@xxxxxxxxxxxx> wrote:

> Yeah, I think two different things are going on here.
> 
> The read leases were new, and I think the way that OSDs are marked
> down is the key thing that affects that behavior. I'm a bit
> surprised that the _notify_mon option helps there, and will take a
> closer look at that Monday to make sure it's doing what it's supposed
> to be doing.
> 
> The paxos_propose_interval is an upper bound on how long the monitor
> is allowed to batch updates before committing them.  Many/most
> changes are committed immediately, but the osdmap management tries to
> batch things up so that a single osdmap epoch combines lots of
> changes when they are happening quickly (there tends to be mini
> storms up dates when cluster changes happen).  The default of 2s
> might be too much for many environments, though... and we might
> consider changing the default to something smaller (maybe more like
> 250ms).
> 
> sage
> 
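To illustrate the trade-off Sage describes (fewer epochs during an update storm versus a longer wait before a change is committed), here is a toy Python model. It is an assumption-laden simplification, not the monitor's actual batching logic: `batch_into_epochs` is a hypothetical function in which the first pending change opens a batch and everything arriving within `propose_interval_ms` of it lands in the same epoch.

```python
from typing import List


def batch_into_epochs(change_times_ms: List[int], propose_interval_ms: int) -> List[List[int]]:
    """Group change timestamps (ms) into simulated osdmap epochs.

    Toy model (an assumption, not the monitor's real code): the first pending
    change opens a batch, every change arriving before
    batch_start + propose_interval_ms joins it, and the batch is committed
    as one epoch when the interval expires.
    """
    epochs: List[List[int]] = []
    for t in sorted(change_times_ms):
        if epochs and t < epochs[-1][0] + propose_interval_ms:
            epochs[-1].append(t)  # still inside the open batch window
        else:
            epochs.append([t])  # no open window: start a new epoch
    return epochs


# A hypothetical "mini storm": 20 changes, one every 100 ms, over about 2 s.
storm = list(range(0, 2000, 100))
for interval in (2000, 1000, 250):
    epochs = batch_into_epochs(storm, interval)
    # The change that opens a batch waits the full interval before commit,
    # so in this model the interval is also the worst-case commit delay.
    print(f"interval={interval} ms: {len(epochs)} epoch(s), worst-case delay {interval} ms")
```

With these inputs the model produces 1, 2 and 7 epochs for 2000, 1000 and 250 ms respectively: a shorter interval creates more epochs during a storm, but no change (for example an OSD being marked down) waits longer than the interval to be committed.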

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx