Will it be available in 15.2.16?

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx<mailto:istvan.szabo@xxxxxxxxx>
---------------------------------------------------

On 2021. Nov 18., at 23:12, Sage Weil <sage@xxxxxxxxxxxx> wrote:

It looks like the bug has been there since the read leases were introduced, which I believe was octopus (15.2.z).

s

On Thu, Nov 18, 2021 at 3:55 PM huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:

May I ask, which versions are affected by this bug? And which versions are going to receive backports?

best regards,
samuel

------------------------------
huxiaoyu@xxxxxxxxxxxx

*From:* Sage Weil <sage@xxxxxxxxxxxx>
*Date:* 2021-11-18 22:02
*To:* Manuel Lausch <manuel.lausch@xxxxxxxx>; ceph-users <ceph-users@xxxxxxx>
*Subject:* Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

Okay, good news: on the OSD start side, I identified the bug (and easily reproduced it locally). The tracker and fix are:

https://tracker.ceph.com/issues/53326
https://github.com/ceph/ceph/pull/44015

These will take a while to work through QA and get backported.

Also, to reiterate what I said on the call earlier today about the OSD stopping issues:

- A key piece of the original problem you were seeing was that require_osd_release wasn't up to date, which meant that the dead_epoch metadata wasn't encoded in the OSDMap and we would basically *always* go into the read lease wait when an OSD stopped.
- Now that that is fixed, it appears as though setting both osd_fast_shutdown *and* osd_fast_shutdown_notify_mon is the winning combination. I would be curious to hear if adjusting the icmp throttle kernel setting makes things behave better when osd_fast_shutdown_notify_mon=false (the default), but this is more out of curiosity--I think we've concluded that we should set this option to true by default.

If I'm missing anything, please let me know! Thanks for your patience in tracking this down. It's always a bit tricky when there are multiple contributing factors (in this case, at least 3).

sage

On Tue, Nov 16, 2021 at 9:42 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:

On Tue, Nov 16, 2021 at 8:30 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

> Hi Sage,
>
> It's still the same cluster we talked about. I only upgraded it from 16.2.5 to 16.2.6.
>
> I enabled fast shutdown again and did some tests with debug logging enabled:
>
> osd_fast_shutdown             true
> osd_fast_shutdown_notify_mon  false
>
> The logs are here:
> ceph-post-file: 59325568-719c-4ec9-b7ab-945244fcf8ae
>
> I ran three tests. First I stopped OSD 122 again at 14:22:40 and started it again at 14:23:40. Stopping now worked without issue, but on starting I got 3 slow ops.
>
> Then at 14:25:00 I stopped all OSDs (systemctl stop ceph-osd.target) on the host "csdeveubs-u02c01b01". Surprisingly there were no slow ops there either, but there still were on startup at 14:26:00.
>
> At 14:28:00 I stopped all OSDs again, this time on host csdeveubs-u02c01b05. This time I got some slow ops while stopping too.
>
> As far as I understand, Ceph skips the read lease time if an OSD is "dead" but not if it is only down. This is because we do not know for sure whether a down OSD is really gone and can no longer answer reads, right?

Exactly.

> If an OSD announces its shutdown to the mon, the cluster marks it as down.
> Can we not assume the deadness in this case as well? Maybe this would help me in the stopping case.

It could, but that's not how the shutdown process currently works. It requests that the mon mark it down, but continues servicing IO until it is actually marked down.

> The starting case will still be an issue.

Yes. I suspect the root cause(s) there are a bit more complicated--I'll take a look at the logs today.

Thanks!
sage

> Thanks a lot,
> Manuel

On Mon, 15 Nov 2021 17:32:24 -0600 Sage Weil <sage@xxxxxxxxxxxx> wrote:

Okay, I traced one slow op through the logs, and the problem was that the PG was laggy. That happened because of the osd.122 that you stopped, which was marked down in the OSDMap but *not* dead. It looks like that happened because the OSD took the 'clean shutdown' path instead of the fast stop.

Have you tried enabling osd_fast_shutdown = true *after* you fixed require_osd_release to octopus? It would have led to slow requests when you tested before, because the new dead_epoch field in the OSDMap that the read leases rely on was not being encoded, making peering wait for the read lease to time out even though the stopped OSD really died.

I'm not entirely sure whether this is the same cluster as the earlier one, but given the logs you sent, my suggestion is to enable osd_fast_shutdown and try again. If you still get slow requests, can you capture the logs again?

Thanks!
sage

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
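For anyone wanting to apply the combination discussed in this thread, here is a minimal command sketch. It assumes a recent ceph CLI; the exact "ceph osd require-osd-release" and "ceph config set" invocations are standard usage and are not spelled out anywhere in the thread itself:

    # Check the cluster-wide feature gate; if it still reports a release
    # older than the one the cluster runs, the dead_epoch metadata is not
    # encoded in the OSDMap and stopped OSDs always trigger the read lease wait.
    ceph osd dump | grep require_osd_release

    # Bump it to match the running release (octopus shown as an example).
    ceph osd require-osd-release octopus

    # The combination suggested above as the "winning" one.
    ceph config set osd osd_fast_shutdown true
    ceph config set osd osd_fast_shutdown_notify_mon true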
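The "icmp throttle kernel setting" mentioned above is not named in this excerpt. The usual Linux knobs for ICMP rate limiting are net.ipv4.icmp_ratelimit and net.ipv4.icmp_ratemask; they are shown here purely as an assumption about what is meant, with illustrative values only:

    # Assumed sysctls for kernel ICMP rate limiting (not confirmed by the
    # thread; values are illustrative, not a recommendation).
    sysctl net.ipv4.icmp_ratelimit net.ipv4.icmp_ratemask

    # Setting the rate limit to 0 disables ICMP rate limiting entirely.
    sysctl -w net.ipv4.icmp_ratelimit=0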