Re: OSDs spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart


 



On Tue, Nov 2, 2021 at 7:03 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:

> On Tue, Nov 2, 2021 at 8:29 AM Manuel Lausch <manuel.lausch@xxxxxxxx>
> wrote:
>
> > Hi Sage,
> >
> > The "osd_fast_shutdown" is set to "false"
> > As we upgraded to luminous I also had blocked IO issuses with this
> > enabled.
> >
> > Some weeks ago I tried out the options "osd_fast_shutdown" and
> > "osd_fast_shutdown_notify_mon" and also got slow ops while
> > stopping/starting OSDs. But I didn't check if this triggered the
> > problem with the read_leases or if this triggered my old issue
> > with the fast shutdown.
> >
>
> Just to be clear, you should try
>   osd_fast_shutdown = true
>   osd_fast_shutdown_notify_mon = false
>
> > You write: if the osd rejects messenger connections, because it is
> > stopped, the peering process will skip the read_lease timeout. If the
> > OSD announces its shutdown, can we not skip this read_lease timeout as
> > well?
> >
>
> If memory serves, yes, but the notify_mon process can take more time than a
> peer OSD getting ECONNREFUSED.  The combination above is the recommended
> combination (and the default).
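
For reference, a minimal sketch of checking and applying the combination
recommended above, assuming a Mimic-or-later cluster where the centralized
config database is available (the same values can also be set in ceph.conf
under [osd] and picked up on daemon restart); osd.0 is a placeholder for any
OSD:

    # check what an OSD currently uses
    ceph config get osd.0 osd_fast_shutdown
    ceph config get osd.0 osd_fast_shutdown_notify_mon

    # apply the recommended combination cluster-wide
    ceph config set osd osd_fast_shutdown true
    ceph config set osd osd_fast_shutdown_notify_mon false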


Hmmm, if the OSDs are detecting shutdown based on networking error codes,
could a networking configuration or security switch prevent them from
seeing the “correct” failure result?
-Greg



>
>
> > In the next few days I will test the fast_shutdown switch again and will share the
> > corresponding logs with you.
> >
>
> Thanks!
> sage
>
>
>
> >
> >
> > Best regards from Karlsruhe
> > Manuel
> >
> >
> > On Mon, 1 Nov 2021 15:55:35 -0500
> > Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> > > Hi Manuel,
> > >
> > > I'm looking at the ticket for this issue (
> > > https://tracker.ceph.com/issues/51463) and tried to reproduce.  This
> > > was initially trivial to do with vstart (rados bench paused for many
> > > seconds after stopping an osd) but it turns out that was because the
> > > vstart ceph.conf includes `osd_fast_shutdown = false`.  Once I
> > > enabled that again (as it is by default on a normal cluster) I did
> > > not see any noticeable interruption when an OSD was stopped.
> > >
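
A rough sketch of the vstart reproduction described above, run from a source
build directory; the pool name 'testpool' and the pkill pattern are
placeholders rather than anything from the original report:

    # start a small test cluster from the build tree
    MON=1 OSD=3 MGR=1 ../src/vstart.sh -n -x -d

    # create a pool and keep some client I/O running
    ./bin/ceph osd pool create testpool 8
    ./bin/rados -p testpool bench 60 write &

    # stop one OSD cleanly (SIGTERM) and watch whether the bench output stalls
    pkill -f 'ceph-osd -i 0'
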
> > > Can you confirm what osd_fast_shutdown and
> > > osd_fast_shutdown_notify_mon are set to on your cluster?
> > >
> > > The intent is that when an OSD goes down, it will no longer accept
> > > messenger connection attempts, and peer OSDs will inform the monitor
> > > with a flag indicating the OSD is definitely dead (vs slow or
> > > unresponsive).  This will allow the peering process to skip waiting
> > > for the read lease to time out.  If you're seeing the laggy or
> > > 'waiting for readable' state, then that isn't happening... probably
> > > because the OSD shutdown isn't working as originally intended.
> > >
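
To check whether that skip is actually taking effect, a small sketch for
watching the state Sage describes while an OSD restarts; it assumes an
Octopus-or-later cluster where 'laggy' and 'wait' show up as PG states:

    # overall health, including slow ops and laggy PGs
    ceph health detail

    # list any PGs currently reported as laggy or waiting for readable
    ceph pg dump pgs_brief 2>/dev/null | grep -Ei 'laggy|wait'
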
> > > If it's not one of those two options, maybe you can include a 'ceph
> > > config dump' (or just a list of the changed values at least) so we
> > > can see what else might be affecting OSD shutdown...
> > >
> > > Thanks!
> > > sage
> > >
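
For completeness, a sketch of two ways to pull together that configuration
information; osd.0 stands in for any OSD with a reachable admin socket:

    # options overridden in the centralized config database (Mimic and later)
    ceph config dump

    # per-daemon view of every option that differs from the built-in default
    ceph daemon osd.0 config diff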
> >
> >
> >
> >



