There's a bit of discussion on this at the original PR:
https://github.com/ceph/ceph/pull/31677

Sage claims the IO interruption should be smaller with
osd_fast_shutdown than without.

-- dan

On Fri, Aug 14, 2020 at 10:08 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
>
> Hi Dan,
>
> Stopping a single OSD took mostly 1 to 2 seconds between the stop and
> the first report in ceph.log. Stopping a whole node, in this case 24
> OSDs, took 5 to 7 seconds in most cases. After the reporting, peering
> begins, but that is quite fast.
>
> Since I have fast shutdown disabled, the "reporting down by itself"
> messages appear more or less immediately; the cluster peers and
> everything works as expected and without trouble.
>
>
> Manuel
>
>
>
> On Thu, 13 Aug 2020 16:45:20 +0200
> Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> > OK, I just wanted to confirm you hadn't extended
> > osd_heartbeat_grace or similar.
> >
> > On your large cluster, what is the time from stopping an OSD (with
> > fast shutdown enabled) to:
> > cluster [DBG] osd.317 reported immediately failed by osd.202
> >
> > -- dan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
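
For anyone wanting to reproduce the measurement discussed in this
thread: assuming a release recent enough to have the option (it was
introduced via the PR linked above) and the centralized config store,
osd_fast_shutdown can be toggled with the standard ceph config
commands, and the stop-to-report delay can be read from the cluster
log on a mon host. A rough sketch; osd.317 is just the example id
from the thread, and the log path may differ on your deployment:

    # check and toggle the fast shutdown behaviour
    ceph config get osd osd_fast_shutdown
    ceph config set osd osd_fast_shutdown false

    # stop one OSD and note the wall-clock time
    date +%s; systemctl stop ceph-osd@317

    # on a mon host, find when the OSD's down state hit the cluster log:
    # "reported immediately failed by" with fast shutdown enabled,
    # "marked itself down" with it disabled
    grep 'osd.317' /var/log/ceph/ceph.log

Comparing the timestamp from date with the matching ceph.log entry
gives the same stop-to-report interval Manuel describes above.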