Re: auto stop of scrubbing and deep scrubbing while backfilling or recovering

> On 8 November 2016 at 15:19, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> 
> 
> On Tue, 8 Nov 2016, Wido den Hollander wrote:
> > > On 8 November 2016 at 9:35, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
> > > 
> > > 
> > > Hello,
> > > 
> > > I'm wondering whether anybody has already thought about automatically
> > > stopping scrub and deep-scrub during backfilling or recovery. I've
> > > seen several situations where scrubbing massively raises latency
> > > while backfilling or recovery is in progress.
> > > 
> > 
> > Seems like a sane change to me, but maybe a dev has a better option. I 
> > don't think a stop is easy, but a 'noscrub' flag could be set inside the 
> > OSD.
> > 
> > Maybe a config option: osd_scrub_during_recovery
> > 
> > Defaults to true, but can be set to false by the admin.
> > 
> > Before a scrub starts the OSD will check if there is recovery / 
> > backfilling active on the OSD and if so it will not initiate the scrub.
> 
> Yeah, it seems reasonable.  I think there are two basic options:
> 
> - Disable scrubbing locally on each OSD if it has backfilling PGs.  Two 
> unrelated OSDs would be free to scrub and backfill at the same time.
> 
> - Disable scrubbing globally if any pgs are backfilling.  The reasoning 
> here is that if backfilling is increasing the latency on some PGs, we 
> don't want to increase the latency on others (by scrubbing) too.
> 
> The other consideration is that if backfill is happening, we probably 
> don't want to prevent scrubbing indefinitely.  Instead, I'd 
> suggest increasing the scrub intervals by some factor (e.g., 2x).
> 
> The first option would probably be a change in the scrub scheduling in 
> the OSD.
> 

I would go for the first one. Imagine a large cluster where a single backfill is running: the global option would halt all scrubs even though only a few OSDs are involved.

I don't think option one is that hard to implement either.
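
To make the idea concrete, here is a rough sketch of the local check, not actual Ceph code. The option name osd_scrub_during_recovery is the one proposed above; the OsdConfig and OsdState structs and the may_start_scrub() helper are hypothetical stand-ins for whatever the OSD's scrub scheduler really uses.

```cpp
// Hypothetical, simplified stand-ins; these are not Ceph's real classes.
struct OsdConfig {
    // Option name proposed in this thread; true keeps the current behaviour.
    bool osd_scrub_during_recovery = true;
};

struct OsdState {
    int recovering_pgs = 0;   // PGs on this OSD that are currently recovering
    int backfilling_pgs = 0;  // PGs on this OSD that are currently backfilling
};

// Returns true if this OSD may initiate a new scrub right now.
// Only new scrubs are gated; a scrub already in progress is not stopped.
bool may_start_scrub(const OsdConfig& conf, const OsdState& state) {
    if (conf.osd_scrub_during_recovery) {
        return true;  // admin has not opted out, so scrub as usual
    }
    // Opted out: scrub only when nothing is recovering or backfilling here.
    return state.recovering_pgs == 0 && state.backfilling_pgs == 0;
}
```

The scheduler would call this before picking a PG to scrub, so an OSD that is not involved in the backfill keeps scrubbing normally.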

Wido

> The latter could be accomplished by having the mon publish a scrub 
> interval scaling factor in the OSDMap...
> 
> sage
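
For comparison, the scaling-factor variant could look roughly like the sketch below. This is only an illustration under assumptions: the OsdMapView struct, its scrub_interval_scale field, and effective_scrub_interval() are hypothetical names, since the thread only says the mon would publish some factor in the OSDMap.

```cpp
#include <chrono>

// Hypothetical slice of an OSDMap; the field name is an assumption.
struct OsdMapView {
    double scrub_interval_scale = 1.0;  // e.g. 2.0 while PGs are backfilling
};

using Seconds = std::chrono::seconds;

// Stretch the configured scrub interval by the published factor.
// Factors below 1.0 are ignored, so the map can only slow scrubbing down.
Seconds effective_scrub_interval(Seconds configured, const OsdMapView& map) {
    const double scale =
        map.scrub_interval_scale < 1.0 ? 1.0 : map.scrub_interval_scale;
    return Seconds(static_cast<Seconds::rep>(configured.count() * scale));
}
```

With a factor of 2, a one-day scrub interval becomes two days while the backfill lasts, so scrubbing is delayed rather than blocked indefinitely.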
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html