Nice. I'm building a 16.2.6 release with this patch right now and will test it.

Thanks,
Manuel

On Thu, 18 Nov 2021 15:02:38 -0600
Sage Weil <sage@xxxxxxxxxxxx> wrote:

> Okay, good news: on the osd start side, I identified the bug (and easily
> reproduced locally). The tracker and fix are:
>
> https://tracker.ceph.com/issues/53326
> https://github.com/ceph/ceph/pull/44015
>
> These will take a while to work through QA and get backported.
>
> Also, to reiterate what I said on the call earlier today about the osd
> stopping issues:
> - A key piece of the original problem you were seeing was because
> require_osd_release wasn't up to date, which meant that the dead_epoch
> metadata wasn't encoded in the OSDMap and we would basically *always* go
> into the read lease wait when an OSD stopped.
> - Now that that is fixed, it appears as though setting both
> osd_fast_shutdown *and* osd_fast_shutdown_notify_mon is the winning
> combination.
>
> I would be curious to hear if adjusting the icmp throttle kernel setting
> makes things behave better when osd_fast_shutdown_notify_mon=false (the
> default), but this is more out of curiosity--I think we've concluded that
> we should set this option to true by default.
>
> If I'm missing anything, please let me know!
>
> Thanks for your patience in tracking this down. It's always a bit tricky
> when there are multiple contributing factors (in this case, at least 3).
>
> sage