There's a bit of discussion on this at the original PR:
https://github.com/ceph/ceph/pull/31677

Sage claims the IO interruption should be smaller with
osd_fast_shutdown than without.

-- dan

On Fri, Aug 14, 2020 at 10:08 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
>
> Hi Dan,
>
> Stopping a single OSD took mostly 1 to 2 seconds between the stop and
> the first report in ceph.log. Stopping a whole node, in this case 24
> OSDs, took 5 to 7 seconds in most cases. After the reporting, peering
> begins, but that is quite fast.
>
> Since I have fast shutdown disabled, the "reporting down by itself"
> messages appear more or less immediately; the cluster peers and
> everything works as expected and without trouble.
>
>
> Manuel
>
>
>
> On Thu, 13 Aug 2020 16:45:20 +0200
> Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> > OK, I just wanted to confirm you hadn't extended
> > osd_heartbeat_grace or similar.
> >
> > On your large cluster, what is the time from stopping an OSD (with
> > fast shutdown enabled) to:
> > cluster [DBG] osd.317 reported immediately failed by osd.202
> >
> > -- dan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
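
For anyone wanting to reproduce the measurement discussed in this
thread: assuming a release recent enough to have the option (it was
introduced via the PR linked above) and the centralized config store,
osd_fast_shutdown can be toggled with the standard ceph config
commands, and the stop-to-report delay can be read from the cluster
log on a mon host. A rough sketch; osd.317 is just the example id
from the thread, and the log path may differ on your deployment:

    # check and toggle the fast shutdown behaviour
    ceph config get osd osd_fast_shutdown
    ceph config set osd osd_fast_shutdown false

    # stop one OSD and note the wall-clock time
    date +%s; systemctl stop ceph-osd@317

    # on a mon host, find when the OSD's down state hit the cluster log:
    # "reported immediately failed by" with fast shutdown enabled,
    # "marked itself down" with it disabled
    grep 'osd.317' /var/log/ceph/ceph.log

Comparing the timestamp from date with the matching ceph.log entry
gives the same stop-to-report interval Manuel describes above.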