Hi everyone,

Just to make sure everyone reading this thread gets the info: setting
osd_scrub_disable_reservation_queuing to 'true' is a temporary workaround, as
confirmed by Laimis on the tracker [1]. A rough sketch of the commands is
further down in this mail.

Cheers,
Frédéric.

[1] https://tracker.ceph.com/issues/69078

----- On 5 Dec 24, at 23:09, Laimis Juzeliūnas laimis.juzeliunas@xxxxxxxxxx wrote:

> Hi all,
>
> Just came back from this year's Cephalocon and managed to get a quick chat
> with Ronen regarding this issue. He had a great presentation [1, 2] on the
> upcoming changes to scrubbing in Tentacle, as well as some changes already
> made in the Squid release.
>
> The primary suspect here is the mclock scheduler and the way replica
> reservations are made since 19.2.0. Regular scrubs begin with the primary
> asking all acting-set replicas to allow the scrub to continue; each replica
> either grants the request immediately or queues it. As I understand it,
> previous releases would instead send a simple deny on the spot when
> resources were thin (that happens when the scrub map is requested from the
> acting-set members, but I might be wrong). For some reason, with mclock this
> can lead to acting sets constantly queuing these scrub requests and never
> actually completing them.
>
> As for the configuration: in Squid the osd_scrub_cost option has been
> increased to 52428800 for some reason. I'm having a hard time finding
> previous values, but the Red Hat docs [3] have it set at 50 << 20. Unless
> the whole logic/calculation has changed, such an abysmally high value will
> simply never allow resources to be granted with mclock.
> Another suspect is osd_scrub_event_cost, which has been set to 4096. Once
> again, I'm having a hard time finding any previous-version values for it to
> compare against.
>
> One thing we've found is that there is now a config option
> osd_scrub_disable_reservation_queuing (default: false): "When set - scrub
> replica reservations are responded to immediately, with either success or
> failure (the pre-Squid version behaviour). This configuration option is
> introduced to support mixed-version clusters and debugging, and will be
> removed in the next release." My guess is that setting this to true would
> simply return scrub reservation behaviour to that of Reef and earlier
> releases.
>
> To keep all the work done on the scrubbing changes in place, we will try
> reducing osd_scrub_cost to a much lower value (50 or even less) and check
> whether that helps our case. If not, we will reduce osd_scrub_event_cost as
> well, as we're not sure at this point which of the two has the direct
> impact.
> If that doesn't help, we will have to set
> osd_scrub_disable_reservation_queuing to true, but that will leave us with
> the old way scrubs are done (not cool - we want the fancy new way). If even
> that doesn't help, we will have to start thinking about switching to wpq
> instead of mclock, which is also not that cool looking at the future of
> Ceph.
>
> I'll keep the mailing list (and tracker) updated with our findings.
>
> Best,
> Laimis J.
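For anyone who wants to try either route, this is roughly what the commands
would look like (a sketch only, not something I've validated here; the 50 for
osd_scrub_cost is just the value Laimis mentioned trying, and osd.0 stands in
for whichever of your OSDs you want to inspect):

    # Temporary workaround: answer scrub replica reservations immediately,
    # with success or failure, as in pre-Squid releases
    ceph config set osd osd_scrub_disable_reservation_queuing true

    # Alternative experiment: check the current mclock scrub costs and lower them
    ceph config get osd osd_scrub_cost
    ceph config get osd osd_scrub_event_cost
    ceph config set osd osd_scrub_cost 50

    # Verify what a running daemon actually uses
    ceph config show osd.0 osd_scrub_cost

And to undo the workaround later (the option is documented as slated for
removal in the next release anyway):

    ceph config rm osd osd_scrub_disable_reservation_queuing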
> 1 - https://ceph2024.sched.com/event/1ktWh/the-scrub-type-to-limitations-matrix-ronen-friedman-ibm
> 2 - https://static.sched.com/hosted_files/ceph2024/08/ceph24_main%20%284%29.pdf
> 3 - https://docs.redhat.com/en/documentation/red_hat_ceph_storage/2/html/configuration_guide/osd_configuration_reference#scrubbing

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx