RFC: (deep-)scrub manager module

Stefan Kooman <stefan@xxxxxx> · Fri, 17 Jun 2022 11:02:51 +0200

Hi All,

I would like to hear your views, comments and ideas on the need for a 
(deep-)scrub module in the Ceph manager (mgr). Do we need such a 
module at all?

What could such a scrub manager bring to the table?

- Centrally manage / coordinate scrubs, without removing the current 
logic in the OSDs themselves. That logic can then act as a fallback when 
the manager is not working for prolonged periods of time, in case of 
bugs, etc. Bonus: very cephalopod-like: the "arms" take control when 
needed.
- Have PGs (deep-)scrubbed in a logical order (oldest timestamp gets 
(deep-)scrubbed first)
- Throttling: manage the number of (deep-)scrubs that are allowed to 
take place at any given time
- Possibility of multiple time windows where (deep-)scrubs are allowed 
to take place (instead of only one as of now)
- Since Quincy [1], extra scrub-related state information is available:

LAST_SCRUB_DURATION
SCRUB_SCHEDULING
OBJECTS_SCRUBBED

Together with the existing PG information, this makes it possible to 
plan scrubs more accurately with some basic math, which can help reduce 
the impact on performance. The scheduling algorithm in this manager 
could warn the operator in time when scrub deadlines will not be met, 
and suggest adjustments the operator can make, such as widening the 
scrub window, raising the maximum number of scrubs per OSD, or 
decreasing osd_scrub_sleep. In a "hands off" mode the cluster could make 
these adjustments all by itself, which would suit environments that do 
not have a (dedicated) Ceph operator to take care of these operational 
tasks. 
Ideally (utopia?) the manager would be aware of the impact of the 
(deep-)scrubs on client IO latency and act accordingly. But I am not 
sure that is even needed when the new dmClock QoS scheduler [2] is 
active, so it would probably be wise not to optimize too early.
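
To make the "basic math" a bit more concrete, here is a rough Python 
sketch of what such a scheduler could do: pick PGs oldest-first, throttle 
per OSD and in total, allow several time windows, and compare the 
estimated scrub time with the time available. This is only an 
illustration. The PG class, the function names and the max_per_osd / 
max_total knobs are made up for this example and are not existing Ceph 
or mgr APIs; last_scrub_duration stands in for LAST_SCRUB_DURATION.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class PG:
    pgid: str
    last_deep_scrub: datetime   # existing PG information
    last_scrub_duration: float  # seconds; stands in for LAST_SCRUB_DURATION
    acting: tuple               # ids of the OSDs serving this PG


def in_any_window(now, windows):
    """True if `now` falls in one of several (start, end) daily windows."""
    t = now.time()
    for start, end in windows:
        if start <= end:
            if start <= t < end:
                return True
        elif t >= start or t < end:  # window wraps past midnight
            return True
    return False


def pick_scrubs(pgs, now, windows, max_per_osd=1, max_total=4):
    """Oldest-first selection with a per-OSD and a total throttle."""
    if not in_any_window(now, windows):
        return []
    busy = {}    # OSD id -> scrubs already assigned in this round
    chosen = []
    for pg in sorted(pgs, key=lambda p: p.last_deep_scrub):
        if len(chosen) >= max_total:
            break
        if any(busy.get(o, 0) >= max_per_osd for o in pg.acting):
            continue  # one of its OSDs is already at its limit
        for o in pg.acting:
            busy[o] = busy.get(o, 0) + 1
        chosen.append(pg.pgid)
    return chosen


def hours_needed(pgs, max_total=4):
    """Optimistic estimate: total scrub time / allowed parallelism."""
    return sum(p.last_scrub_duration for p in pgs) / max_total / 3600.0


def window_hours(windows, days):
    """Scrub hours available over `days` days given the daily windows."""
    total = 0.0
    for start, end in windows:
        s = start.hour * 60 + start.minute
        e = end.hour * 60 + end.minute
        total += ((e - s) % (24 * 60)) / 60.0  # handles midnight wrap
    return total * days


# Made-up example data: two windows (one wrapping midnight), three PGs.
now = datetime(2022, 6, 17, 23, 0)
windows = [(time(22, 0), time(6, 0)), (time(12, 0), time(14, 0))]
pgs = [
    PG("1.0", datetime(2022, 6, 10), 600.0, (0, 1, 2)),
    PG("1.1", datetime(2022, 6, 1), 900.0, (0, 3, 4)),
    PG("1.2", datetime(2022, 6, 5), 300.0, (5, 6, 7)),
]
print(pick_scrubs(pgs, now, windows))  # ['1.1', '1.2']; 1.0 shares OSD 0
if hours_needed(pgs) > window_hours(windows, days=7):
    print("deadline at risk: widen window or raise scrub limits")
```

If hours_needed() exceeds window_hours() for the deadline period, that 
would be the moment to warn the operator or, in "hands off" mode, to 
adjust the settings. The estimate is deliberately optimistic: it assumes 
max_total scrubs can always run in parallel, which the per-OSD limit may 
prevent.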

Please let me know what you think of this.

Gr. Stefan

[1]: https://docs.ceph.com/en/quincy/releases/quincy/
[2]: https://github.com/ceph/dmclock
