Re: RFC: (deep-)scrub manager module

I have one comment - I wouldn't use the word throttling but rather scheduling, since we don't just want to limit scrubs; we need some other policy as well. As long as we have a single PG scrub executing, we can run scrubs on all the non-scrubbed OSDs "for free" (since we already pay for the performance degradation), so we need a plan for executing as many scrubs simultaneously as possible while all the OSDs stay evenly loaded. For example, assume we have 100 OSDs and replica 3: when scrub runs we would like 33 PGs to be scrubbed simultaneously, as long as no OSD appears in more than one of those PGs, so from the OSD perspective 99 OSDs will be scrubbing simultaneously (we can't get to 100 with 1 scrub per OSD, only with 3 simultaneous scrubs per OSD).
Such a scheduler, combined with the other policies described (starting with the OSDs that were scrubbed longest ago), should produce an optimal plan when all the OSDs are symmetrical (same capacity and technology). Improving it for mixed capacities and technologies is an interesting exercise for future phases.
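
To make the greedy idea concrete, something along these lines could work (just a Python sketch; PGInfo and its fields are placeholders for whatever the mgr module would actually get from the cluster, not an existing interface):

from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class PGInfo:
    pgid: str                 # e.g. "2.1f"
    acting: Set[int]          # OSD ids in the acting set
    last_deep_scrub: float    # unix timestamp of the last deep scrub

def pick_scrub_batch(pgs: List[PGInfo],
                     max_scrubs_per_osd: int = 1) -> List[PGInfo]:
    """Greedily pick PGs for simultaneous scrubbing, oldest-scrubbed
    first, while no OSD takes part in more than max_scrubs_per_osd
    of the selected scrubs."""
    load: Dict[int, int] = {}          # OSD id -> scrubs assigned so far
    batch: List[PGInfo] = []
    for pg in sorted(pgs, key=lambda p: p.last_deep_scrub):
        if all(load.get(osd, 0) < max_scrubs_per_osd for osd in pg.acting):
            batch.append(pg)
            for osd in pg.acting:
                load[osd] = load.get(osd, 0) + 1
    return batch

With max_scrubs_per_osd=1 and the 100 OSD / replica 3 example above, a pass like this should land close to the 33-PG batch; raising the limit to 3 would let it cover all 100 OSDs.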
One last point - we may want different priorities per pool (one pool may require weekly scrubs while another only needs monthly scrubs); this should also be part of the scheduling algorithm.
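
The pool-specific interval could simply replace the "oldest first" sort key in the sketch above - again only an illustration, the per-pool intervals here are hypothetical and not an existing Ceph option:

import time

# Hypothetical per-pool target deep-scrub intervals, in seconds.
POOL_INTERVAL = {"fast-pool": 7 * 86400, "archive-pool": 30 * 86400}
DEFAULT_INTERVAL = 7 * 86400

def urgency(last_deep_scrub: float, pool_name: str,
            now: float = None) -> float:
    """>1.0 means the PG is overdue relative to its pool's target interval."""
    now = time.time() if now is None else now
    interval = POOL_INTERVAL.get(pool_name, DEFAULT_INTERVAL)
    return (now - last_deep_scrub) / interval

# Sorting the candidate PGs by descending urgency (instead of by the raw
# last_deep_scrub timestamp) makes a weekly pool outrank a monthly pool
# once both are overdue, with no other change to the selection logic.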
Regards,

Josh


On Fri, Jun 17, 2022 at 12:05 PM Stefan Kooman <stefan@xxxxxx> wrote:
Hi All,

I would like to have your views, comments and ideas on the need for a
(deep-)scrub module in the Ceph manager (mgr). Do we need such a
module at all?

What could such a scrub manager bring to the table?

- Centrally manage / coordinate scrubs, without removing the current
logic in the OSDs themselves, so that logic can act as a fallback when
the manager is not working for prolonged periods of time, in case of
bugs, etc. Bonus: very cephalopod-like: the "arms" take control when
needed.
- Have PGs (deep-)scrubbed in a logical order (oldest timestamp gets
(deep-)scrubbed first)
- Throttling: manage the number of (deep-)scrubs that are allowed to
take place at any given time
- Possibility of multiple time windows where (deep-)scrubs are allowed
to take place (instead of only one as of now)
- Since Quincy [1], extra scrub-related state information is available:

LAST_SCRUB_DURATION
SCRUB_SCHEDULING
OBJECTS_SCRUBBED

Together with existing PG information this opens up the possibility of
planning the scrubs more accurately with some basic math. This can help
reduce the impact on performance. The scheduling algorithm in this
manager could inform the operator in time when the scrub deadlines
would not be met, and suggest possible adjustment(s) the operator can
make, such as increasing the scrub window, raising the max number of
scrubs per OSD, or decreasing osd_scrub_sleep. In a "hands off" mode
the cluster could make these adjustments all by itself: suitable for
environments that do not have a (dedicated) Ceph operator to take care
of these operational tasks. Ideally (utopia?) the manager would be
aware of the impact of the (deep-)scrubs on client IO latency and act
accordingly. But I'm not sure that is even needed once the new dmClock
QoS scheduler [2] is active, so it would probably be wise not to
optimize too early.
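
To give an idea of the "basic math", a first-order feasibility check
could look like the sketch below. The function and parameter names are
placeholders; the only ties to real Ceph data are that the average
duration could be derived from LAST_SCRUB_DURATION and the deadline
from osd_deep_scrub_interval:

def scrub_capacity_ok(num_pgs: int,
                      avg_scrub_duration_s: float,  # e.g. mean LAST_SCRUB_DURATION
                      deadline_days: float,         # e.g. osd_deep_scrub_interval
                      window_hours_per_day: float,  # allowed scrub-window time per day
                      max_parallel_pgs: int) -> bool:
    """Rough check: can every PG be deep-scrubbed before the deadline,
    given the allowed time windows and the concurrency limit?"""
    required_s = num_pgs * avg_scrub_duration_s / max_parallel_pgs
    available_s = deadline_days * window_hours_per_day * 3600
    return required_s <= available_s

# Example: 4096 PGs, 20 min per deep scrub, 7-day deadline,
# an 8-hour nightly window and 33 PGs scrubbing in parallel:
#   required  = 4096 * 1200 / 33 ~= 149,000 s (~41 h)
#   available = 7 * 8 * 3600      = 201,600 s  (56 h)  -> feasible
print(scrub_capacity_ok(4096, 1200, 7, 8, 33))   # True

When the check fails, the gap between the required and available time
also tells you roughly how much the window, the parallelism, or the
per-PG duration (via osd_scrub_sleep) would need to change.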

Please let me know what you think of this.

Gr. Stefan

[1]: https://docs.ceph.com/en/quincy/releases/quincy/
[2]: https://github.com/ceph/dmclock

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
