Hi Sage,

The "osd_fast_shutdown" option is set to "false". When we upgraded to Luminous I also had blocked-I/O issues with this option enabled. Some weeks ago I tried out the options "osd_fast_shutdown" and "osd_fast_shutdown_notify_mon" and again got slow ops while stopping/starting OSDs. But I didn't check whether this triggered the problem with the read_leases or whether it triggered my old issue with the fast shutdown.

You write that if the OSD rejects messenger connections because it is stopped, the peering process will skip the read_lease timeout. If the OSD announces its shutdown, can we not skip this read_lease timeout as well?

In the next few days I will test the fast_shutdown switch again and will share the corresponding logs with you.

Best regards from Karlsruhe
Manuel

On Mon, 1 Nov 2021 15:55:35 -0500 Sage Weil <sage@xxxxxxxxxxxx> wrote:

> Hi Manuel,
>
> I'm looking at the ticket for this issue
> (https://tracker.ceph.com/issues/51463) and tried to reproduce. This
> was initially trivial to do with vstart (rados bench paused for many
> seconds after stopping an OSD), but it turns out that was because the
> vstart ceph.conf includes `osd_fast_shutdown = false`. Once I
> enabled that again (as it is by default on a normal cluster), I did
> not see any noticeable interruption when an OSD was stopped.
>
> Can you confirm what osd_fast_shutdown and
> osd_fast_shutdown_notify_mon are set to on your cluster?
>
> The intent is that when an OSD goes down, it will no longer accept
> messenger connection attempts, and peer OSDs will inform the monitor
> with a flag indicating the OSD is definitely dead (vs. slow or
> unresponsive). This allows the peering process to skip waiting
> for the read lease to time out. If you're seeing the laggy or
> 'waiting for readable' state, then that isn't happening, probably
> because the OSD shutdown isn't working as originally intended.
>
> If it's not one of those two options, maybe you can include a 'ceph
> config dump' (or just a list of the changed values, at least) so we
> can see what else might be affecting OSD shutdown...
>
> Thanks!
> sage
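
For reference, the two options under discussion can be checked on a live cluster roughly like this (a sketch; "osd.0" is just a placeholder daemon id, substitute one of your own OSDs):

    # value stored in the mon config database (cluster-wide default for OSDs)
    ceph config get osd osd_fast_shutdown
    ceph config get osd osd_fast_shutdown_notify_mon

    # value a specific running daemon is actually using
    ceph config show osd.0 osd_fast_shutdown
    # or via the admin socket on the OSD's host
    ceph daemon osd.0 config get osd_fast_shutdown

    # everything that differs from the defaults, as Sage asked for
    ceph config dump

Note that values overridden in a local ceph.conf only show up via "config show" / the admin socket, not in "ceph config dump", so it is worth checking both.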