Re: Deep Scrub distribution

I'm pretty sure I put up one of those scripts in the past.  Basically what we did was set our scrub cycle to something like 40 days, then sort all PGs by the last time they were deep scrubbed.  We grab the oldest 1/30 of those PGs and tell them to deep-scrub manually, and the next day we do it again.  After a month or so, your PGs should be fairly evenly spaced out over 30 days.  With those numbers you could disable the cron that runs the deep scrubs for a maintenance window of up to 10 days every 40 days and still scrub all of your PGs during that time.
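
Something along these lines, run nightly from cron, is the gist of it -- a rough sketch, not the original script.  It assumes the pgid is column 1 and the LAST_DEEP_SCRUB_STAMP date/time are columns 20/21 of `ceph pg dump` on your release, so check those against your output first:

#!/bin/bash
# Rough sketch of the smearing approach: deep-scrub the oldest ~1/30 of PGs.
# Column numbers ($1 = pgid, $20/$21 = LAST_DEEP_SCRUB_STAMP date/time) are an
# assumption -- verify them against `ceph pg dump` on your version.
DIVISOR=30

# How many active PGs are there, and how big is tonight's batch?
total=$(ceph pg dump 2>/dev/null | awk '/active/{n++} END{print n}')
batch=$(( (total + DIVISOR - 1) / DIVISOR ))

# Sort PGs oldest-deep-scrub-first and kick off a deep scrub on the oldest batch.
ceph pg dump 2>/dev/null \
    | awk '/active/{print $20"T"$21, $1}' \
    | sort \
    | head -n "$batch" \
    | awk '{print $2}' \
    | while read -r pgid; do
          ceph pg deep-scrub "$pgid"
      done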

On Mon, Mar 5, 2018 at 2:00 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
On Mon, Mar 5, 2018 at 9:56 AM Jonathan D. Proulx <jon@xxxxxxxxxxxxx> wrote:
Hi All,

I've recently noticed my deep scrubs are EXTREMELY poorly
distributed.  They are starting within the 18->06 local time start/stop
window, but they are not distributed over enough days, nor well
distributed over the range of days they do cover.

root@ceph-mon0:~# for date in `ceph pg dump | awk '/active/{print $20}'`; do date +%D -d $date; done | sort | uniq -c
dumped all
      1 03/01/18
      6 03/03/18
   8358 03/04/18
   1875 03/05/18

So very nearly all 10240 PGs scrubbed last night / this morning.  I've
been kicking this around for a while, since I noticed poor distribution
over a 7 day range back when I was really pretty sure I'd changed that
from the 7d default to 28d.

Tried kicking it out to 42 days about a week ago with:

ceph tell osd.* injectargs '--osd_deep_scrub_interval 3628800'


There were many errors suggesting it could not reread the change and I'd
need to restart the OSDs, but 'ceph daemon osd.0 config show |grep
osd_deep_scrub_interval' showed the right value, so I let it roll for a
week.  The scrubs did not spread out, though.

So Friday I set that value in ceph.conf and did rolling restarts of
all OSDs, then double-checked the running value on all daemons.
Checking Sunday, the nightly deep scrubs (based on the LAST_DEEP_SCRUB
voodoo above) showed near enough 1/42nd of the PGs scrubbed Saturday
night that I thought this was working.
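
For reference, the persistent change amounts to nothing more than the interval in seconds in ceph.conf (shown under [osd] here, though [global] works too):

[osd]
# 42 days, i.e. 42 * 86400 seconds
osd_deep_scrub_interval = 3628800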

This morning I checked again and got the results above.

I would expect that after changing to a 42d scrub cycle I'd see approximately
1/42 of the PGs deep scrub each night until there was a roughly even
distribution over the past 42 days.

So which thing is broken my config or my expectations?

Sadly, changing the interval settings does not directly change the scheduling of deep scrubs. Instead, it merely influences whether a PG will get queued for scrub when it is examined as a candidate, based on how out-of-date its scrub is. (That is, nothing holistically goes "I need to scrub 1/n of these PGs every night"; there's a simple task that says "is this PG's last scrub more than n days old?")
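
As a rough shell rendering of that per-PG check (the real logic lives inside the OSD; the jq path into `ceph pg query` output here is an assumption, so treat this as illustrative only):

#!/bin/bash
# Illustrative only: the question the scrub scheduler effectively asks per PG.
# Assumes jq is installed and that .info.stats.last_deep_scrub_stamp is where
# your release reports the stamp in `ceph pg <pgid> query` output.
pgid="1.2a"        # hypothetical PG id
interval=3628800   # osd_deep_scrub_interval, 42 days in seconds

stamp=$(ceph pg "$pgid" query | jq -r '.info.stats.last_deep_scrub_stamp')
age=$(( $(date +%s) - $(date +%s -d "${stamp%.*}") ))

if [ "$age" -gt "$interval" ]; then
    echo "$pgid is overdue: last deep scrub was ${age}s ago"
else
    echo "$pgid is not due yet"
fi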

Users have shared various scripts on the list for setting up a more even scrub distribution by fiddling with the settings and poking at specific PGs to try and smear them out over the whole time period; I'd check archives or google for those. :)
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
