Re: RFC: (deep-)scrub manager module

I have one comment - I wouldn't use the word throttling but rather scheduling, since we don't just want to limit scrubs; we need some other policy as well. As long as we have a single PG scrub executing, we can run scrubs on all the non-scrubbed OSDs "for free" (since we already pay for the performance degradation), so we need a plan for executing as many scrubs simultaneously as possible while all the OSDs stay evenly loaded. For example, assume we have 100 OSDs and replica 3: when scrub runs we would like 33 PGs to be scrubbed simultaneously, as long as no OSD appears in more than one of those PGs, so from the OSD perspective 99 OSDs will be scrubbing simultaneously (we can't get to 100 with 1 scrub per OSD, only with 3 simultaneous scrubs per OSD).
Such a scheduler, combined with the other policies described (starting with the OSDs that were scrubbed longest ago), should produce an optimal plan when all the OSDs are symmetrical (same capacity and technology). Improving it for mixed capacities and technologies is an interesting exercise for future phases.
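
To make the greedy idea concrete, something along these lines could work (just a Python sketch; PGInfo and its fields are placeholders for whatever the mgr module would actually get from the cluster, not an existing interface):

from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class PGInfo:
    pgid: str                 # e.g. "2.1f"
    acting: Set[int]          # OSD ids in the acting set
    last_deep_scrub: float    # unix timestamp of the last deep scrub

def pick_scrub_batch(pgs: List[PGInfo],
                     max_scrubs_per_osd: int = 1) -> List[PGInfo]:
    """Greedily pick PGs for simultaneous scrubbing, oldest-scrubbed
    first, while no OSD takes part in more than max_scrubs_per_osd
    of the selected scrubs."""
    load: Dict[int, int] = {}          # OSD id -> scrubs assigned so far
    batch: List[PGInfo] = []
    for pg in sorted(pgs, key=lambda p: p.last_deep_scrub):
        if all(load.get(osd, 0) < max_scrubs_per_osd for osd in pg.acting):
            batch.append(pg)
            for osd in pg.acting:
                load[osd] = load.get(osd, 0) + 1
    return batch

With max_scrubs_per_osd=1 and the 100 OSD / replica 3 example above, a pass like this should land close to the 33-PG batch; raising the limit to 3 would let it cover all 100 OSDs.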
One last point - we may want different priorities per pool (one pool may require weekly scrubs while another only needs monthly scrubs); this should also be part of the scheduling algorithm.
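
The pool-specific interval could simply replace the "oldest first" sort key in the sketch above - again only an illustration, the per-pool intervals here are hypothetical and not an existing Ceph option:

import time

# Hypothetical per-pool target deep-scrub intervals, in seconds.
POOL_INTERVAL = {"fast-pool": 7 * 86400, "archive-pool": 30 * 86400}
DEFAULT_INTERVAL = 7 * 86400

def urgency(last_deep_scrub: float, pool_name: str,
            now: float = None) -> float:
    """>1.0 means the PG is overdue relative to its pool's target interval."""
    now = time.time() if now is None else now
    interval = POOL_INTERVAL.get(pool_name, DEFAULT_INTERVAL)
    return (now - last_deep_scrub) / interval

# Sorting the candidate PGs by descending urgency (instead of by the raw
# last_deep_scrub timestamp) makes a weekly pool outrank a monthly pool
# once both are overdue, with no other change to the selection logic.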
Regards,

Josh


On Fri, Jun 17, 2022 at 12:05 PM Stefan Kooman <stefan@xxxxxx> wrote:
Hi All,

I would like to have your views, comments and ideas on the need for a
(deep-)scrub module in the Ceph manager (mgr). Do we need such a
module at all?

What could such a scrub manager bring to the table?

- Centrally manage / coordinate scrubs, without removing the current
logic in the OSDs themselves, so that logic can act as a fallback when
the manager is not working for prolonged periods of time, in case of
bugs, etc. Bonus: very cephalopod-like: the "arms" take control when
needed.
- Have PGs (deep-)scrubbed in a logical order (oldest timestamp gets
(deep-)scrubbed first)
- Throttling: manage the number of (deep-)scrubs that are allowed to
take place at any given time
- Possibility of multiple time windows where (deep-)scrubs are allowed
to take place (instead of only one as of now)
- Since Quincy [1], extra scrub-related state information is available:

LAST_SCRUB_DURATION
SCRUB_SCHEDULING
OBJECTS_SCRUBBED

Together with existing PG information this opens up the possibility of
planning the scrubs more accurately with some basic math. This can help
reduce the impact on performance. The scheduling algorithm in this
manager could inform the operator in time when the scrub deadlines
would not be met, and suggest possible adjustment(s) the operator can
make, such as increasing the scrub window, raising the max number of
scrubs per OSD, or decreasing osd_scrub_sleep. In a "hands off" mode
the cluster could make these adjustments all by itself: suitable for
environments that do not have a (dedicated) Ceph operator to take care
of these operational tasks. Ideally (utopia?) the manager would be
aware of the impact of the (deep-)scrubs on client IO latency and act
accordingly. But I'm not sure that is even needed once the new dmClock
QoS scheduler [2] is active, so it would probably be wise not to
optimize too early.
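
To give an idea of the "basic math", a first-order feasibility check
could look like the sketch below. The function and parameter names are
placeholders; the only ties to real Ceph data are that the average
duration could be derived from LAST_SCRUB_DURATION and the deadline
from osd_deep_scrub_interval:

def scrub_capacity_ok(num_pgs: int,
                      avg_scrub_duration_s: float,  # e.g. mean LAST_SCRUB_DURATION
                      deadline_days: float,         # e.g. osd_deep_scrub_interval
                      window_hours_per_day: float,  # allowed scrub-window time per day
                      max_parallel_pgs: int) -> bool:
    """Rough check: can every PG be deep-scrubbed before the deadline,
    given the allowed time windows and the concurrency limit?"""
    required_s = num_pgs * avg_scrub_duration_s / max_parallel_pgs
    available_s = deadline_days * window_hours_per_day * 3600
    return required_s <= available_s

# Example: 4096 PGs, 20 min per deep scrub, 7-day deadline,
# an 8-hour nightly window and 33 PGs scrubbing in parallel:
#   required  = 4096 * 1200 / 33 ~= 149,000 s (~41 h)
#   available = 7 * 8 * 3600      = 201,600 s  (56 h)  -> feasible
print(scrub_capacity_ok(4096, 1200, 7, 8, 33))   # True

When the check fails, the gap between the required and available time
also tells you roughly how much the window, the parallelism, or the
per-PG duration (via osd_scrub_sleep) would need to change.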

Please let me know what you think of this.

Gr. Stefan

[1]: https://docs.ceph.com/en/quincy/releases/quincy/
[2]: https://github.com/ceph/dmclock

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
