[ceph-users] Re: RFC: (deep-)scrub manager module

Hi all,

Thanks for opening this discussion.
Let me share some thoughts.

We discussed this in the PetaSAN project a while ago, after getting complaints about PGs not being deep scrubbed in time.

The main question was whether Ceph should be responsible for finishing scrubbing within the specified interval (or at least try to), or should just deep scrub when possible according to the configured settings and give a warning when PGs don't finish scrubbing in that time.

Ceph currently adopts the second option, leaving it to the user to choose (or guess) the best values for options like osd_scrub_sleep, randomize_ratio and load_threshold, which is tricky.


Consequently, we chose to write a daemon that tries to use all available resources to finish deep scrubbing in time. This includes ensuring that one deep scrub is running on each OSD (only a few OSDs may be left idle if num of osds % replicas != 0, as Josh explained) and starting with the PGs that have the oldest deep-scrub timestamp. It also dynamically increases and decreases osd_scrub_sleep at runtime, depending on statistics from the last batch of scrubbed PGs, to make sure we are not overloading the cluster while plenty of time is left, and not running too slowly either.
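
To make the idea concrete, here is a rough Python sketch of the selection and throttling logic described above. It is not PetaSAN's actual code: the PG class, pick_pgs, adjust_sleep and all thresholds are hypothetical names and values chosen for illustration, and it assumes that a deep scrub of a PG occupies every OSD in its acting set.

```python
from dataclasses import dataclass


@dataclass
class PG:
    pgid: str
    acting: tuple           # OSD ids in the acting set (assumed all busy during a scrub)
    last_deep_scrub: float  # epoch seconds of the last completed deep scrub


def pick_pgs(pgs, busy_osds=()):
    """Pick PGs to deep scrub: oldest timestamp first, at most one scrub per OSD."""
    busy = set(busy_osds)
    chosen = []
    for pg in sorted(pgs, key=lambda p: p.last_deep_scrub):
        # Skip the PG if any OSD in its acting set is already scrubbing.
        if busy.isdisjoint(pg.acting):
            chosen.append(pg)
            busy.update(pg.acting)
    return chosen


def adjust_sleep(sleep, scrubbed_per_hour, pgs_left, hours_left,
                 step=1.5, min_sleep=0.0, max_sleep=5.0):
    """Return a new scrub sleep value based on the recent scrub rate."""
    if hours_left <= 0:
        return min_sleep  # out of time: scrub as fast as allowed
    required_rate = pgs_left / hours_left
    if scrubbed_per_hour < required_rate:
        new_sleep = sleep / step  # behind schedule: sleep less, scrub faster
    else:
        # Ahead of schedule: sleep more to leave room for client I/O.
        new_sleep = sleep * step if sleep > 0 else 0.1
    return max(min_sleep, min(max_sleep, new_sleep))


pgs = [
    PG("1.0", (0, 1, 2), 300.0),
    PG("1.1", (2, 3, 4), 100.0),
    PG("1.2", (3, 4, 5), 200.0),
    PG("1.3", (5, 6, 7), 400.0),
]
selected = [pg.pgid for pg in pick_pgs(pgs)]
```

In a real daemon the selected PGs could be scrubbed with "ceph pg deep-scrub <pgid>" and the new sleep value applied with "ceph config set osd osd_scrub_sleep <value>"; how often to re-evaluate, and what step size to use, would need tuning per cluster.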

What I would like to see in the new version is scrub sleep being configurable per pool, just like deep_scrub_interval, scrub_priority and other scrub options. I think this is reasonable, as we may want different pools to scrub at different speeds, and the scrub sleep is the only way to slow down the scrubbing process and give client I/O more of a chance during the scrub of a single PG.

I would also prefer to have this scheduling functionality embedded in Ceph, and I believe it should sit at a level higher than the OSD (the mgr would be a good place, as Stefan suggested), so that it can adjust the scrub settings depending on the scrub status per pool/cluster.

I'm afraid you are adding many arguments and special cases/workarounds to handle issues that would be solved automatically if scheduling were delegated to a central module. For example, the need for << marking "urgent" scrubs in the "replica - I need your resources" >>, as Ronen mentioned, would not exist if the module requesting scrubs were aware of the big picture and processed requests in priority order.
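
As a small illustration of the point above, a central module could keep all pending scrub requests in a single priority queue, so that "urgent" needs no special handling on the replicas. This is only a sketch under my own assumptions: ScrubQueue and its methods are hypothetical and do not correspond to any existing Ceph interface.

```python
import heapq


class ScrubQueue:
    """Central queue of scrub requests, ordered by priority, then by age."""

    def __init__(self):
        self._heap = []

    def request(self, pgid, priority, last_deep_scrub):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        # Ties are broken by the oldest deep-scrub timestamp.
        heapq.heappush(self._heap, (-priority, last_deep_scrub, pgid))

    def next(self):
        """Return the pgid to scrub next, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ScrubQueue()
queue.request("2.a", priority=5, last_deep_scrub=100.0)
queue.request("2.b", priority=10, last_deep_scrub=500.0)  # e.g. an operator-requested scrub
queue.request("2.c", priority=5, last_deep_scrub=50.0)
order = [queue.next(), queue.next(), queue.next(), queue.next()]
```

Because the ordering is decided in one place, an urgent scrub is simply a request with a higher priority value.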

Please reconsider the "community request" to have a dedicated scrubbing module/daemon. It may need some effort, but it is worth it.


Thanks and Regards,

--
Rasha Shoaib
Software Architect
PetaSAN
www.petasan.org




