It looks like the bug has been there since the read leases were introduced,
which I believe was octopus (15.2.z).
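
For anyone who just wants to apply the settings discussed in the quoted
thread below, something along these lines should do it (a rough sketch, not
verified here -- double-check the option names against your release):

    # confirm require_osd_release was bumped after the upgrade; without it
    # the dead_epoch metadata is not encoded in the OSDMap and peering
    # always waits out the read lease when an OSD stops
    ceph osd dump | grep require_osd_release
    ceph osd require-osd-release octopus   # only if it still shows an older release

    # the combination that worked in this thread
    ceph config set osd osd_fast_shutdown true
    ceph config set osd osd_fast_shutdown_notify_mon true

The two osd_* options can equally be set in ceph.conf under [osd] instead of
the mon config database.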

s

On Thu, Nov 18, 2021 at 3:55 PM huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:

> May I ask which versions are affected by this bug, and which versions are
> going to receive backports?
>
> best regards,
>
> samuel
>
> ------------------------------
> huxiaoyu@xxxxxxxxxxxx
>
>
> *From:* Sage Weil <sage@xxxxxxxxxxxx>
> *Date:* 2021-11-18 22:02
> *To:* Manuel Lausch <manuel.lausch@xxxxxxxx>; ceph-users <ceph-users@xxxxxxx>
> *Subject:* Re: OSD spend too much time on "waiting for readable" ->
> slow ops -> laggy pg -> rgw stop -> worst case osd restart
>
> Okay, good news: on the osd start side, I identified the bug (and easily
> reproduced it locally). The tracker and fix are:
>
> https://tracker.ceph.com/issues/53326
> https://github.com/ceph/ceph/pull/44015
>
> These will take a while to work through QA and get backported.
>
> Also, to reiterate what I said on the call earlier today about the osd
> stopping issues:
> - A key piece of the original problem you were seeing was that
> require_osd_release wasn't up to date, which meant that the dead_epoch
> metadata wasn't encoded in the OSDMap and we would basically *always* go
> into the read lease wait when an OSD stopped.
> - Now that that is fixed, it appears as though setting both
> osd_fast_shutdown *and* osd_fast_shutdown_notify_mon is the winning
> combination.
>
> I would be curious to hear if adjusting the icmp throttle kernel setting
> makes things behave better when osd_fast_shutdown_notify_mon=false (the
> default), but this is more out of curiosity--I think we've concluded that
> we should set this option to true by default.
>
> If I'm missing anything, please let me know!
>
> Thanks for your patience in tracking this down. It's always a bit tricky
> when there are multiple contributing factors (in this case, at least 3).
>
> sage
>
>
> On Tue, Nov 16, 2021 at 9:42 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > On Tue, Nov 16, 2021 at 8:30 AM Manuel Lausch <manuel.lausch@xxxxxxxx>
> > wrote:
> >
> >> Hi Sage,
> >>
> >> it's still the same cluster we talked about. I only upgraded it from
> >> 16.2.5 to 16.2.6.
> >>
> >> I enabled fast shutdown again and did some tests with debug logging
> >> enabled:
> >>   osd_fast_shutdown true
> >>   osd_fast_shutdown_notify_mon false
> >>
> >> The logs are here:
> >> ceph-post-file: 59325568-719c-4ec9-b7ab-945244fcf8ae
> >>
> >> I ran 3 tests.
> >>
> >> First, I stopped OSD 122 again at 14:22:40 and started it again at
> >> 14:23:40. Stopping now worked without issue, but on starting I got 3
> >> slow ops.
> >>
> >> Then at 14:25:00 I stopped all OSDs (systemctl stop ceph-osd.target) on
> >> the host "csdeveubs-u02c01b01". Surprisingly there were no slow ops
> >> here either, but there still were on startup at 14:26:00.
> >>
> >> At 14:28:00 I again stopped all OSDs on host csdeveubs-u02c01b05. This
> >> time I got some slow ops while stopping too.
> >>
> >> As far as I understand it, Ceph skips the read lease wait if an OSD is
> >> "dead" but not if it is only down, because we do not know for sure
> >> whether a down OSD is really gone and can no longer answer reads.
> >> Right?
> >
> > Exactly.
> >
> >> If an OSD announces its shutdown to the mon, the cluster marks it as
> >> down. Can we not assume deadness in this case as well? Maybe this
> >> would help me in the stopping case.
> >
> > It could, but that's not how the shutdown process currently works. It
> > requests that the mon mark it down, but continues servicing IO until it
> > is actually marked down.
> >
> >> The starting case will still be an issue.
> >
> > Yes. I suspect the root cause(s) there are a bit more complicated--I'll
> > take a look at the logs today.
> >
> > Thanks!
> > sage
> >
> >>
> >> Thanks a lot
> >> Manuel
> >>
> >> On Mon, 15 Nov 2021 17:32:24 -0600
> >> Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >>
> >> > Okay, I traced one slow op through the logs, and the problem was that
> >> > the PG was laggy. That happened because of the osd.122 that you
> >> > stopped, which was marked down in the OSDMap but *not* dead. It
> >> > looks like that happened because the OSD took the 'clean shutdown'
> >> > path instead of the fast stop.
> >> >
> >> > Have you tried enabling osd_fast_shutdown = true *after* you fixed
> >> > require_osd_release to octopus? It would have led to slow requests
> >> > when you tested before because the new dead_epoch field in the OSDMap
> >> > that the read leases rely on was not being encoded, making peering
> >> > wait for the read lease to time out even though the stopped osd
> >> > really died.
> >> >
> >> > I'm not entirely sure whether this is the same cluster as the earlier
> >> > one, but given the logs you sent, my suggestion is to enable
> >> > osd_fast_shutdown and try again. If you still get slow requests, can
> >> > you capture the logs again?
> >> >
> >> > Thanks!
> >> > sage
> >>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
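
The "icmp throttle kernel setting" mentioned above presumably refers to the
kernel's ICMP rate-limiting sysctls; a minimal sketch for inspecting and
temporarily loosening them during such a shutdown test (the sysctl names are
an assumption based on the kernel's ip-sysctl documentation, not something
spelled out in the thread):

    # current ICMP rate limiting: per-destination interval (ms) plus the
    # global messages-per-second cap and burst
    sysctl net.ipv4.icmp_ratelimit net.ipv4.icmp_ratemask
    sysctl net.ipv4.icmp_msgs_per_sec net.ipv4.icmp_msgs_burst

    # temporarily raise the global cap for a shutdown test
    sysctl -w net.ipv4.icmp_msgs_per_sec=10000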