I would like to have your views, comments and ideas on the need for a
(deep-)scrub module in the Ceph manager (mgr). Do we need such a
module at all?
What could such a scrub manager bring to the table?
- Centrally manage / coordinate scrubs, without removing the current
logic in the OSDs themselves. The OSD logic can then act as a fallback
when the manager is not working for prolonged periods of time, in case
of bugs, etc. Bonus: very Cephalopod-like: "arms" take control when needed.
- Have PGs (deep-)scrubbed in a logical order (oldest timestamp gets
scrubbed first)
- Throttling: manage the number of (deep-)scrubs that are allowed to
take place at any given time
- Possibility of multiple time windows where (deep-)scrubs are allowed
to take place (instead of only one as of now)
- Since Quincy, extra scrub-related state information is available.
Together with existing PG information this opens the possibility to plan
the scrubs more accurately with some basic math. This can
help reduce the impact on performance. The scheduling algorithm in this
manager could inform the operator in time when the scrub deadlines would
not be met, and suggest possible adjustment(s) the operator can make,
such as widening the scrub window, raising the maximum number of scrubs
per OSD, or decreasing osd_scrub_sleep. In a "hands off" mode the cluster
could make these adjustments all by itself, which would suit
environments that do not have a
(dedicated) Ceph operator to take care of these operational tasks.
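
To make the ordering, throttling, time window and "basic math" ideas in
the list above a bit more concrete, here is a rough Python sketch. It is
not based on any existing mgr module API: the PG class, pick_scrubs(),
deadline_feasible() and all parameter names (max_per_osd, max_total,
windows, ...) are hypothetical and only meant for illustration. It
assumes we already know each PG's last deep-scrub timestamp and acting
set, and an average scrub duration.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class PG:
    pgid: str
    last_deep_scrub: datetime
    acting: tuple  # OSD ids that hold this PG (hypothetical field)


def in_any_window(now, windows):
    """True if `now` falls in at least one (start, end) window.

    A window whose start is later than its end wraps past midnight.
    """
    t = now.time()
    for start, end in windows:
        if start <= end:
            if start <= t < end:
                return True
        elif t >= start or t < end:
            return True
    return False


def pick_scrubs(pgs, now, windows, max_per_osd=1, max_total=4):
    """Pick PGs to scrub next: oldest timestamp first, throttled."""
    if not in_any_window(now, windows):
        return []
    busy = {}    # OSD id -> scrubs already planned on that OSD
    chosen = []
    for pg in sorted(pgs, key=lambda p: p.last_deep_scrub):
        if len(chosen) >= max_total:
            break
        # Skip this PG if any OSD in its acting set is at its limit.
        if any(busy.get(o, 0) >= max_per_osd for o in pg.acting):
            continue
        for o in pg.acting:
            busy[o] = busy.get(o, 0) + 1
        chosen.append(pg.pgid)
    return chosen


def deadline_feasible(num_pgs, avg_scrub_seconds, concurrent,
                      window_hours_per_day, interval_days):
    """Rough check: can all PGs be scrubbed once within the interval?"""
    needed = num_pgs * avg_scrub_seconds / concurrent
    available = window_hours_per_day * 3600 * interval_days
    return needed <= available
```

For example, with windows 01:00-05:00 and 22:00-02:00, a call at 23:00
would return the PGs with the oldest timestamps whose OSDs are not
already busy, and a call at noon would return nothing. If
deadline_feasible() returns False, the manager could warn the operator
or, in "hands off" mode, widen a window or raise a limit itself.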
Ideally (utopia?) the manager would be aware of the impact of the
(deep-)scrubs on client IO latency and act accordingly. But I am not sure
if that is even needed when the new dmClock QoS scheduler is active. So
it would probably be wise not to optimize too early.
Please let me know what you think of this.
Dev mailing list -- dev@xxxxxxx