Re: osd fast shutdown provokes slow requests

Manuel Lausch <manuel.lausch@xxxxxxxx> · Fri, 14 Aug 2020 10:08:07 +0200

Hi Dan,

stopping a single OSD mostly took 1 to 2 seconds between the stop and the
first report in ceph.log. Stopping a whole node, in this case 24 OSDs,
took 5 to 7 seconds in most cases. After the report, peering begins,
but this is quite fast.

Since I disabled fast shutdown, the "reporting down by itself" messages
appear more or less immediately, the cluster peers, and everything
works as expected without trouble.

Manuel

On Thu, 13 Aug 2020 16:45:20 +0200
Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

> OK I just wanted to confirm you hadn't extended the
> osd_heartbeat_grace or similar.
> 
> On your large cluster, what is the time from stopping an osd (with
> fast shutdown enabled) to:
>    cluster [DBG] osd.317 reported immediately failed by osd.202
> 
> -- dan
> 
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx