Re: how to list and reset the scrub schedules

On Tue, Jul 18, 2017 at 1:19 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
On Fri, Jul 14, 2017 at 10:40 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Fri, Jul 14, 2017 at 5:41 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> Occasionally we want to change the scrub schedule for a pool or whole
>> cluster, but we want to do this by injecting new settings without
>> restarting every daemon.
>>
>> I've noticed that in jewel, changes to scrub_min/max_interval and
>> deep_scrub_interval do not take immediate effect, presumably because
>> the scrub schedules are calculated in advance for all the PGs on an
>> OSD.
>>
>> Does anyone know how to list that scrub schedule for a given OSD?
>
>
> I'm not aware of any "scrub schedule" as such, just the constraints around
> when new scrubbing happens. What exactly were you doing previously that
> isn't working now?

Take this for example:

2017-07-18 10:03:30.600486 7f02f7a54700 20 osd.1 123582 scrub_random_backoff lost coin flip, randomly backing off
2017-07-18 10:03:31.600558 7f02f7a54700 20 osd.1 123582 can_inc_scrubs_pending0 -> 1 (max 1, active 0)
2017-07-18 10:03:31.600565 7f02f7a54700 20 osd.1 123582 scrub_time_permit should run between 0 - 24 now 10 = yes
2017-07-18 10:03:31.600592 7f02f7a54700 20 osd.1 123582 scrub_load_below_threshold loadavg 0.85 < max 5 = yes
2017-07-18 10:03:31.600603 7f02f7a54700 20 osd.1 123582 sched_scrub load_is_low=1
2017-07-18 10:03:31.600605 7f02f7a54700 30 osd.1 123582 sched_scrub examine 38.127 at 2017-07-18 10:08:01.148612
2017-07-18 10:03:31.600608 7f02f7a54700 10 osd.1 123582 sched_scrub 38.127 scheduled at 2017-07-18 10:08:01.148612 > 2017-07-18 10:03:31.600562
2017-07-18 10:03:31.600611 7f02f7a54700 20 osd.1 123582 sched_scrub done

PG 38.127 is the next registered scrub on osd.1. AFAICT, "registered"
means that there exists a ScrubJob for this PG, with a sched_time
(time of the last scrub + a random interval) and a deadline (time of
the last scrub + scrub max interval).
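
To make the description above concrete, here is a minimal Python sketch of how such a job's times could be derived. It is a toy model, not the OSD's code: the names `ScrubJob` fields aside, `make_scrub_job`, `randomize_ratio` and the exact form of the random interval (min interval plus a random fraction of it) are assumptions made for illustration. The point it shows is that the job stores computed timestamps, so later changes to the intervals do not alter a job that already exists.

```python
import random
from dataclasses import dataclass


@dataclass
class ScrubJob:
    pgid: str
    sched_time: float  # earliest time the scrub may start (epoch seconds)
    deadline: float    # time by which the scrub should have run


def make_scrub_job(pgid, last_scrub, min_interval, max_interval,
                   randomize_ratio=0.5, rng=random.random):
    """Build a job from the last scrub time and the intervals in effect now.

    Assumption: the "random interval" is min_interval plus a random
    fraction (up to randomize_ratio) of min_interval.
    """
    sched_time = last_scrub + min_interval + min_interval * randomize_ratio * rng()
    deadline = last_scrub + max_interval
    return ScrubJob(pgid, sched_time, deadline)


if __name__ == "__main__":
    job = make_scrub_job("38.127", last_scrub=0.0,
                         min_interval=86400.0, max_interval=604800.0)
    # The intervals are not stored in the job, only the resulting times.
    print(job)
```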

(Question: how many scrubs are registered at a given time on an OSD?
Just this one that is printed in the tick loop, or several?)

Anyway, I decrease the min and max scrub intervals for that pool,
hoping to make it scrub right away:

# ceph osd pool set testing-images scrub_min_interval 60
set pool 38 scrub_min_interval to 60
# ceph osd pool set testing-images scrub_max_interval 86400
set pool 38 scrub_max_interval to 86400


But the registered ScrubJob doesn't change -- what I called the "scrub
schedule" doesn't change:

2017-07-18 10:06:53.622286 7f02f7a54700 20 osd.1 123584 scrub_random_backoff lost coin flip, randomly backing off
2017-07-18 10:06:54.622403 7f02f7a54700 20 osd.1 123584 can_inc_scrubs_pending0 -> 1 (max 1, active 0)
2017-07-18 10:06:54.622409 7f02f7a54700 20 osd.1 123584 scrub_time_permit should run between 0 - 24 now 10 = yes
2017-07-18 10:06:54.622436 7f02f7a54700 20 osd.1 123584 scrub_load_below_threshold loadavg 1.16 < max 5 = yes
2017-07-18 10:06:54.622446 7f02f7a54700 20 osd.1 123584 sched_scrub load_is_low=1
2017-07-18 10:06:54.622449 7f02f7a54700 30 osd.1 123584 sched_scrub examine 38.127 at 2017-07-18 10:08:01.148612
2017-07-18 10:06:54.622452 7f02f7a54700 10 osd.1 123584 sched_scrub 38.127 scheduled at 2017-07-18 10:08:01.148612 > 2017-07-18 10:06:54.622408
2017-07-18 10:06:54.622455 7f02f7a54700 20 osd.1 123584 sched_scrub done
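
For comparing excerpts like the two above, a small Python sketch can pull the "scheduled at" time out of an OSD log. It assumes each log entry sits on a single line (as in the OSD's own log file, not as wrapped by a mail client) and that the debug level is high enough for the `sched_scrub ... scheduled at` message to be written. The function name `next_scheduled_scrubs` and the regular expression are illustrative, derived only from the lines quoted here.

```python
import re
from datetime import datetime

# Matches e.g. "osd.1 123582 sched_scrub 38.127 scheduled at 2017-07-18 10:08:01.148612"
SCHED_RE = re.compile(
    r"osd\.(?P<osd>\d+) \d+ sched_scrub (?P<pgid>\S+) scheduled at "
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
)


def next_scheduled_scrubs(lines):
    """Return {(osd_id, pgid): scheduled datetime}, keeping the last value seen."""
    latest = {}
    for line in lines:
        m = SCHED_RE.search(line)
        if m:
            when = datetime.strptime(m.group("when"), "%Y-%m-%d %H:%M:%S.%f")
            latest[(int(m.group("osd")), m.group("pgid"))] = when
    return latest


if __name__ == "__main__":
    import sys
    # Usage (hypothetical path): python sched.py < /var/log/ceph/ceph-osd.1.log
    for (osd, pgid), when in sorted(next_scheduled_scrubs(sys.stdin).items()):
        print(f"osd.{osd} pg {pgid} scheduled at {when}")
```

This only reports the job the tick loop happens to print, so it does not answer how many jobs are registered on the OSD.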


I'm looking for a way to reset those registered scrubs, so that the
new intervals can take effect (without restarting OSDs).


Unfortunately there's not a good way to manually reschedule scrubbing that I can see. That would be a good ticket!

The OSD *does* unregister the existing ScrubJob when the PG starts peering, and registers a new ScrubJob when the PG goes active. So if you have a good way to induce a new peering interval, you don't technically need to restart OSDs. Offhand, though, I can't think of a way to do that which is less disruptive than a polite OSD restart.
-Greg
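
The unregister-on-peering, register-on-active behaviour described above can be sketched as a toy model in Python. Everything here is hypothetical and for illustration only: `ScrubRegistry`, `repeer`, and the formula for the scheduled time are not Ceph code. It shows why a changed interval is picked up only after the PG goes through a peering cycle.

```python
import random


class ScrubRegistry:
    """Toy model: one scheduled scrub time per PG, fixed when registered."""

    def __init__(self, rng=random.random):
        self.jobs = {}
        self.rng = rng

    def register(self, pgid, last_scrub, min_interval):
        # Assumed formula: last scrub + min interval + up to 50% random extra.
        self.jobs[pgid] = last_scrub + min_interval * (1 + 0.5 * self.rng())

    def unregister(self, pgid):
        self.jobs.pop(pgid, None)


def repeer(registry, pgid, last_scrub, current_min_interval):
    """Model one peering cycle for a PG."""
    registry.unregister(pgid)  # PG starts peering: old job is dropped
    registry.register(pgid, last_scrub, current_min_interval)  # PG goes active


if __name__ == "__main__":
    reg = ScrubRegistry()
    reg.register("38.127", last_scrub=0.0, min_interval=86400.0)
    before = reg.jobs["38.127"]
    # Lowering the pool's interval to 60 does nothing until the PG re-peers.
    repeer(reg, "38.127", last_scrub=0.0, current_min_interval=60.0)
    print(before, "->", reg.jobs["38.127"])
```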

 

Cheers, Dan

>
>>
>>
>> And better yet, does anyone know a way to reset that schedule, so that
>> the OSD generates a new one with the new configuration?
>>
>> (I've noticed that by chance setting sortbitwise triggers many scrubs
>> -- maybe a new peering interval resets the scrub schedules?) Any
>> non-destructive way to trigger a new peering interval on demand?
>>
>> Cheers,
>>
>> Dan
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
