Hi all,
Thanks for opening this discussion.
Let me share some thoughts.
We discussed this in the PetaSAN project a while ago, after getting
complaints about PGs not being deep scrubbed in time.
The main question was whether Ceph should be responsible for finishing
scrubbing within the specified interval (or at least try to),
or just deep scrub when possible according to the specified settings and
give a warning when PGs don’t finish scrubbing in that time.
Ceph currently adopts the second option, leaving it up to the user to
choose/guess the best values for options like osd_scrub_sleep,
randomize_ratio and load_threshold, which is quite tricky!
Consequently, we chose to build a daemon that tries to use all
available resources to finish deep scrubbing in time. This includes ensuring
that one deep scrub is running on each OSD (only a few OSDs may be
idle if num of osds % replicas != 0, as Josh explained) and starting
with the oldest deep-scrub timestamp. It also dynamically increases
and decreases osd_scrub_sleep at runtime, based on the
statistics from the last batch of scrubbed PGs, to make sure we are not
overloading the cluster while we have plenty of time left, and that we are
not too slow either.
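
To make the idea concrete, here is a minimal Python sketch of the two decisions described above: picking PGs oldest-first with at most one deep scrub per OSD, and nudging the scrub sleep up or down from recent statistics. This is not the actual PetaSAN code. The names (PG, pick_pgs, next_sleep) and the step/slack/min/max values are made up for illustration, and the sketch only computes decisions; applying them (for example with "ceph pg deep-scrub <pgid>" and "ceph config set osd osd_scrub_sleep <value>") is left out.

```python
from dataclasses import dataclass


@dataclass
class PG:
    pgid: str
    acting: tuple           # ids of the OSDs holding this PG
    last_deep_scrub: float  # last deep-scrub stamp, epoch seconds


def pick_pgs(pgs, busy_osds=()):
    """Pick PGs oldest-first so that no OSD takes part in two deep scrubs."""
    busy = set(busy_osds)
    chosen = []
    for pg in sorted(pgs, key=lambda p: p.last_deep_scrub):
        if busy.isdisjoint(pg.acting):
            chosen.append(pg)
            busy.update(pg.acting)
    return chosen


def next_sleep(current_sleep, pgs_left, secs_per_pg, concurrency, secs_left,
               step=1.25, slack=0.8, min_sleep=0.0, max_sleep=1.0):
    """Lower the sleep when behind schedule, raise it when comfortably ahead.

    secs_per_pg is the average from the last batch of scrubbed PGs;
    step, slack, min_sleep and max_sleep are illustrative values only.
    """
    projected = pgs_left * secs_per_pg / max(concurrency, 1)
    if projected > secs_left:
        new_sleep = current_sleep / step              # behind: scrub faster
    elif projected < slack * secs_left:
        new_sleep = max(current_sleep, 0.01) * step   # ahead: be gentler
    else:
        new_sleep = current_sleep                     # on track: no change
    return min(max_sleep, max(min_sleep, new_sleep))


if __name__ == "__main__":
    pgs = [
        PG("1.0", (0, 1, 2), 100.0),
        PG("1.1", (2, 3, 4), 50.0),
        PG("1.2", (3, 4, 5), 200.0),
        PG("1.3", (5, 6, 7), 10.0),
    ]
    print([pg.pgid for pg in pick_pgs(pgs)])
    print(next_sleep(0.1, pgs_left=100, secs_per_pg=60,
                     concurrency=2, secs_left=1000))
```

In the example, PGs 1.0 and 1.2 are skipped in this round because they share OSDs with older PGs that were picked first; they would be picked up in a later round.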
What I would like to see in the new version is scrub sleep made
configurable per pool, just like deep_scrub_interval, scrub_priority and
other scrub options. I think this is reasonable, as we may want different
pools to be scrubbed at different speeds, and the scrub sleep is the only
way to slow down the scrubbing process and give client I/O more of a
chance during the scrub of a single PG.
I would also prefer to have this scheduling functionality embedded in Ceph, and
I believe it should sit at a level higher than the OSD (ideally in
the mgr, as Stefan suggested), so that it can adjust the scrub settings
depending on the scrub status per pool/cluster.
I'm afraid you are adding many arguments and special cases/workarounds
to handle issues that would be solved automatically if scheduling were
delegated to a central module. For example, the need for << marking
"urgent" scrubs in the "replica - I need your resources" >>, as Ronen
mentioned, would not exist if the module requesting scrubs were aware
of the big picture and processed requests in priority order.
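
As a rough illustration of what "processing requests in priority order" could look like in a central module, here is a small Python sketch. It is only an assumption about how such a module might be shaped, not a proposal for the actual mgr code. The class name ScrubQueue and the ordering rule (higher priority first, then oldest deep-scrub stamp, then arrival order) are hypothetical.

```python
import heapq
import itertools


class ScrubQueue:
    """Central queue: the scheduler sees every request and orders them itself."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: arrival order

    def request(self, pgid, priority, last_deep_scrub):
        # heapq is a min-heap, so the priority is negated to pop the
        # highest priority first; older stamps win among equal priorities.
        heapq.heappush(
            self._heap, (-priority, last_deep_scrub, next(self._seq), pgid)
        )

    def pop(self):
        """Return the next pgid to scrub, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[-1]


if __name__ == "__main__":
    q = ScrubQueue()
    q.request("1.a", priority=5, last_deep_scrub=300.0)
    q.request("1.b", priority=5, last_deep_scrub=100.0)
    q.request("2.c", priority=10, last_deep_scrub=900.0)  # e.g. an urgent scrub
    print(q.pop(), q.pop(), q.pop())
```

Since the ordering is decided in one place, an urgent scrub simply enters the queue with a higher priority, and no extra flag has to be passed along in the replica reservation request.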
Please reconsider the "community request" to have a dedicated scrubbing
module/daemon. It may need some effort, but it is worth it.
Thanks and Regards,
--
Rasha Shoaib
Software Architect
PetaSAN
www.petasan.org