[ceph-users] Re: RFC: (deep-)scrub manager module

Rasha Shoaib <rshoaib@xxxxxxxxxxx> · Wed, 6 Jul 2022 14:42:37 +0200

Hi all,

Thanks for opening this discussion.
Let me share some thoughts.

We discussed this in the PetaSAN project a while ago, after getting 
complaints about PGs not being deep scrubbed in time.

The main question was whether Ceph should be responsible for finishing 
scrubbing within the specified interval (or at least for trying to), or 
whether it should just deep scrub when possible according to the 
configured settings and raise a warning when PGs do not finish scrubbing 
in that time.

Ceph currently takes the second approach, leaving it to the user to 
choose (or guess) the best values for options like osd_scrub_sleep, 
randomize_ratio and load_threshold, which is quite tricky!

Consequently, we chose to write a daemon that tries to use all 
available resources to finish deep scrubbing in time. It keeps one deep 
scrub running per OSD (only a few OSDs may be left idle if 
num of osds % replicas != 0, as Josh explained) and starts with the PGs 
that have the oldest deep-scrub timestamp. It also raises and lowers 
osd_scrub_sleep on the fly, based on statistics from the last batch of 
scrubbed PGs, so that we neither overload the cluster while plenty of 
time is left nor fall behind schedule.
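
As a rough illustration of the two ideas above (this is not the actual 
PetaSAN code), here is a minimal Python sketch. The PG record, the 
function names, the step factor and the 0..5 second bounds are all 
assumptions made for the example. It assumes an OSD counts as busy if it 
appears in the acting set of a PG that is already being deep scrubbed.

```python
from dataclasses import dataclass


@dataclass
class PG:
    """Hypothetical PG record; field names are illustrative only."""
    pgid: str
    acting: tuple            # OSD ids in the acting set
    last_deep_scrub: float   # time of the last deep scrub (epoch seconds)


def pick_batch(pgs, busy_osds=()):
    """Greedy pick, oldest deep scrub first, such that no OSD takes
    part in more than one deep scrub at a time."""
    busy = set(busy_osds)
    batch = []
    for pg in sorted(pgs, key=lambda p: p.last_deep_scrub):
        if busy.isdisjoint(pg.acting):
            batch.append(pg)
            busy.update(pg.acting)
    return batch


def next_sleep(current, remaining_pgs, secs_per_pg, parallel, secs_left,
               step=1.5, lo=0.0, hi=5.0, floor=0.01):
    """Return a new scrub sleep value based on recent statistics.

    secs_per_pg is the average duration of the last batch of scrubs and
    parallel is how many scrubs run at once. The sleep is lowered when
    the projected finish time overruns the deadline and raised when
    less than half of the remaining time is needed (assumed thresholds).
    """
    if remaining_pgs == 0:
        return current
    projected = remaining_pgs * secs_per_pg / max(parallel, 1)
    if projected > secs_left:
        new = current / step                  # behind schedule: speed up
    elif projected < 0.5 * secs_left:
        new = max(current, floor) * step      # plenty of slack: slow down
    else:
        new = current
    return min(max(new, lo), hi)
```

A daemon built this way would call pick_batch() whenever a scrub 
finishes and next_sleep() after each batch, then apply the result to 
osd_scrub_sleep through whatever configuration interface it uses.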

What I would like to see in the new version is scrub sleep made 
configurable per pool, just like deep_scrub_interval, scrub_priority and 
other scrub options. I think this is reasonable, as we may want 
different pools to scrub at different speeds, and the scrub sleep is the 
only way to slow down scrubbing and give client I/O more room during the 
scrub of a single PG.
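
To make the request concrete, the lookup could work like other pool 
overrides: use the pool value if one is set, otherwise fall back to the 
global one. The sketch below is hypothetical. A pool-level scrub_sleep 
option does not exist today, and the dictionaries and the 0.1 default 
are stand-ins chosen for the example.

```python
# Stand-in for the OSD-wide osd_scrub_sleep value (illustrative number).
GLOBAL_DEFAULTS = {"scrub_sleep": 0.1}


def effective_scrub_sleep(pool_opts, defaults=GLOBAL_DEFAULTS):
    """Return the pool's scrub sleep if set, else the global value.

    pool_opts is a hypothetical dict of per-pool options; the key
    "scrub_sleep" is the proposed option, not an existing one.
    """
    value = pool_opts.get("scrub_sleep")
    return defaults["scrub_sleep"] if value is None else float(value)


# Hypothetical pools: a busy RBD pool scrubs gently, an archive pool
# scrubs at full speed, and a third pool keeps the global setting.
pools = {
    "rbd-fast": {"scrub_sleep": 0.5},
    "archive": {"scrub_sleep": 0.0},
    "default-pool": {},
}
sleeps = {name: effective_scrub_sleep(opts) for name, opts in pools.items()}
```

Note that an explicit 0.0 on a pool is kept as a real setting and is not 
treated as "unset".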

I would also prefer to have this scheduling functionality built into 
Ceph, and I believe it should sit at a level above the OSD (the mgr 
would be a good place, as Stefan suggested), so that it can adjust the 
scrub settings according to the scrub status per pool/cluster.

I'm afraid you are adding many arguments and special cases/workarounds 
to handle issues that would be solved automatically if scheduling were 
delegated to a central module. For example, the need for << marking 
"urgent" scrubs in the "replica - I need your resources" >>, as Ronen 
mentioned, would not exist if the module requesting scrubs were aware 
of the big picture and processed requests in priority order.
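
A central module could keep all pending scrub requests in one priority 
queue, so urgency becomes part of the ordering and not a flag passed 
between OSDs. The following is only a sketch of that idea. The class, 
its method names and the ordering rule (urgent first, then oldest deep 
scrub) are assumptions for illustration, not an existing Ceph interface.

```python
import heapq
import itertools


class ScrubQueue:
    """Hypothetical cluster-wide queue of pending deep-scrub requests."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: keeps insertion order

    def request(self, pgid, last_deep_scrub, urgent=False):
        # Urgent requests sort first; otherwise oldest deep scrub first.
        key = (0 if urgent else 1, last_deep_scrub, next(self._seq))
        heapq.heappush(self._heap, (key, pgid))

    def pop(self):
        """Remove and return the pgid that should be scrubbed next."""
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)


queue = ScrubQueue()
queue.request("2.a", last_deep_scrub=500.0)
queue.request("2.b", last_deep_scrub=100.0)
queue.request("2.c", last_deep_scrub=900.0, urgent=True)
order = [queue.pop() for _ in range(len(queue))]
```

Here the urgent PG 2.c is served first, followed by 2.b and 2.a in order 
of their deep-scrub age.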

Please reconsider the "community request" for a dedicated scrubbing 
module/daemon. It may take some effort, but it is worth it.

Thanks and Regards,

--
Rasha Shoaib
Software Architect
PetaSAN
www.petasan.org
