Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

Sage Weil <sage@xxxxxxxxxxxx> · Tue, 2 Nov 2021 09:02:31 -0500

On Tue, Nov 2, 2021 at 8:29 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

> Hi Sage,
>
> The "osd_fast_shutdown" is set to "false"
> When we upgraded to luminous I also had blocked IO issues with this
> enabled.
>
> Some weeks ago I tried out the options "osd_fast_shutdown" and
> "osd_fast_shutdown_notify_mon" and also got slow ops while
> stopping/starting OSDs. But I didn't check whether this triggered the
> problem with the read_leases or my old issue with the fast shutdown.
>

Just to be clear, you should try
  osd_fast_shutdown = true
  osd_fast_shutdown_notify_mon = false

> You write that if the OSD rejects messenger connections because it is
> stopped, the peering process will skip the read_lease timeout. If the
> OSD announces its shutdown, can we not skip this read_lease timeout as
> well?
>

If memory serves, yes, but the notify_mon process can take more time than a
peer OSD getting ECONNREFUSED.  The combination above is the recommended
one (and the default).
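
To illustrate why a refused connection is such a quick signal, here is a
minimal Python sketch. It is not Ceph's messenger code, only a demonstration
of the general TCP behavior: connecting to a port with no listener fails
right away with ECONNREFUSED instead of waiting for a timeout. The helper
names probe() and closed_local_port() are made up for this example.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Attempt a TCP connect and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # The peer host is up but nothing listens on the port: the kernel
        # answers with a reset, so the caller learns this almost at once.
        outcome = "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: the caller has to wait out the full timeout.
        outcome = "timeout"
    except OSError:
        outcome = "error"
    return outcome, time.monotonic() - start


def closed_local_port():
    """Return a localhost port that currently has no listener (hypothetical helper)."""
    s = socket.socket()
    s.bind(("127.0.0.1", 0))  # let the kernel pick a free port
    port = s.getsockname()[1]
    s.close()  # released again, so connects to it get refused
    return port


if __name__ == "__main__":
    outcome, elapsed = probe("127.0.0.1", closed_local_port())
    print(outcome, f"{elapsed:.4f}s")
```

A stopped daemon looks like the "refused" case to its peers, while a hung or
unreachable one looks like the "timeout" case, which is the slower path.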

> In the next few days I will test the fast_shutdown switch again and will share the
> corresponding logs with you.
>

Thanks!
sage

>
>
> Best regards from Karlsruhe
> Manuel
>
>
> On Mon, 1 Nov 2021 15:55:35 -0500
> Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > Hi Manuel,
> >
> > I'm looking at the ticket for this issue (
> > https://tracker.ceph.com/issues/51463) and tried to reproduce.  This
> > was initially trivial to do with vstart (rados bench paused for many
> > seconds after stopping an osd) but it turns out that was because the
> > vstart ceph.conf includes `osd_fast_shutdown = false`.  Once I
> > enabled that again (as it is by default on a normal cluster) I did
> > not see any noticeable interruption when an OSD was stopped.
> >
> > Can you confirm what osd_fast_shutdown and
> > osd_fast_shutdown_notify_mon are set to on your cluster?
> >
> > The intent is that when an OSD goes down, it will no longer accept
> > messenger connection attempts, and peer OSDs will inform the monitor
> > with a flag indicating the OSD is definitely dead (vs slow or
> > unresponsive).  This will allow the peering process to skip waiting
> > for the read lease to time out.  If you're seeing the laggy or
> > 'waiting for readable' state, then that isn't happening.. probably
> > because the OSD shutdown isn't working as originally intended.
> >
> > If it's not one of those two options, maybe you can include a 'ceph
> > config dump' (or just a list of the changed values at least) so we
> > can see what else might be affecting OSD shutdown...
> >
> > Thanks!
> > sage
> >
>
>
>
>