Re: auto stop of scrubbing and deep scrubbing while backfilling or recovering

Wido den Hollander <wido@xxxxxxxx> · Sat, 12 Nov 2016 15:40:03 +0100 (CET)

> On 9 November 2016 at 16:19, Wido den Hollander <wido@xxxxxxxx> wrote:
> 
> 
> 
> > On 8 November 2016 at 21:30, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > 
> > 
> > On Tue, 8 Nov 2016, Wido den Hollander wrote:
> > > > On 8 November 2016 at 15:19, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > > > 
> > > > 
> > > > On Tue, 8 Nov 2016, Wido den Hollander wrote:
> > > > > > On 8 November 2016 at 9:35, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
> > > > > > 
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > I'm wondering if anybody has already thought about automatically
> > > > > > stopping scrub and deep-scrub during backfilling or recovery. I've
> > > > > > seen several situations where scrubbing massively raises the latency
> > > > > > while backfilling or recovery is going on.
> > > > > > 
> > > > > 
> > > > > Seems like a sane change to me, but maybe a dev has a better option. I 
> > > > > don't think a stop is easy, but a 'noscrub' flag could be set inside the 
> > > > > OSD.
> > > > > 
> > > > > Maybe a config option: osd_scrub_during_recovery
> > > > > 
> > > > > Defaults to true, but can be set to false by the admin.
> > > > > 
> > > > > Before a scrub starts the OSD will check if there is recovery / 
> > > > > backfilling active on the OSD and if so it will not initiate the scrub.
> > > > 
> > > > Yeah, it seems reasonable.  I think there are two basic options:
> > > > 
> > > > - Disable scrubbing locally on each OSD if it has backfilling PGs.  Two 
> > > > unrelated OSDs would be free to scrub and backfill at the same time.
> > > > 
> > > > - Disable scrubbing globally if any pgs are backfilling.  The reasoning 
> > > > here is that if backfilling is increasing the latency on some PGs, we 
> > > > don't want to increase the latency on others (by scrubbing) too.
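The two options above can be contrasted in a minimal Python sketch. The `OSD` class, its `backfilling_pgs` counter, and the two `may_scrub_*` functions are hypothetical and do not come from the Ceph code base; they only model the decision each option makes.

```python
from dataclasses import dataclass


@dataclass
class OSD:
    # Hypothetical model: only the state needed for the scrub decision.
    osd_id: int
    backfilling_pgs: int = 0  # PGs on this OSD that are backfilling/recovering


def may_scrub_local(osd: OSD) -> bool:
    """Option 1: an OSD holds off only if it has backfilling PGs of its own."""
    return osd.backfilling_pgs == 0


def may_scrub_global(cluster: list[OSD]) -> bool:
    """Option 2: no OSD scrubs while any PG in the cluster is backfilling."""
    return all(o.backfilling_pgs == 0 for o in cluster)


if __name__ == "__main__":
    cluster = [OSD(0, backfilling_pgs=3), OSD(1), OSD(2)]
    print([may_scrub_local(o) for o in cluster])  # [False, True, True]
    print(may_scrub_global(cluster))              # False
```

With option 1, OSDs 1 and 2 keep scrubbing while OSD 0 backfills; with option 2 a single backfilling OSD stops scrubs everywhere, which is the concern for large clusters.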
> > > > 
> > > > The other consideration is that if backfill is happening it probably 
> > > > doesn't mean we want to prevent scrubbing indefinitely.  Instead, I'd 
> > > > suggest increasing the scrub intervals by some factor (e.g., 2x).
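Stretching the interval rather than blocking scrubs outright can be sketched as follows. The function names, the one-day base interval, and the default factor of 2 are assumptions for illustration only; Ceph's real scrub intervals come from its own config options (such as osd_scrub_min_interval), and this is not how the OSD computes them.

```python
DAY = 24 * 60 * 60  # seconds


def effective_interval(base_s: float, backfilling: bool, factor: float = 2.0) -> float:
    """Stretch the scrub interval by `factor` while backfill is active."""
    return base_s * factor if backfilling else base_s


def scrub_due(last_scrub: float, now: float, base_s: float,
              backfilling: bool, factor: float = 2.0) -> bool:
    """A PG is due once the (possibly stretched) interval has elapsed."""
    return now - last_scrub >= effective_interval(base_s, backfilling, factor)


if __name__ == "__main__":
    last = 0.0
    print(scrub_due(last, 1.5 * DAY, DAY, backfilling=False))  # True
    print(scrub_due(last, 1.5 * DAY, DAY, backfilling=True))   # False: deferred
    print(scrub_due(last, 2.5 * DAY, DAY, backfilling=True))   # True: not blocked forever
```

A long-running backfill therefore only delays a scrub; once the stretched interval has passed, the PG is scrubbed anyway.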
> > > > 
> > > > The first option would probably be a change in the scrub scheduling in 
> > > > the OSD.
> > > > 
> > > 
> > > I would go for the first one. Imagine a large cluster where a single backfill is running: the second option would halt all scrubs even though only a few OSDs are involved.
> > > 
> > > Option one isn't that hard to implement either I think.
> > 
> > I added a card to trello: https://trello.com/b/ugTc2QFH/ceph-backlog
> > 
> 
> Wouldn't this be enough?
> 
> https://github.com/ceph/ceph/pull/11874

Merged! See: https://github.com/ceph/ceph/commit/9100d3362a9ffdd26afc7ee1962c013931cc9e58

By setting osd_scrub_during_recovery to false (defaults to true) you can disable scrubs while recovery threads are active.

This means no NEW scrub will be initiated by the OSD while there are recovery ops active. Any running scrubs will keep running.
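The semantics described above, where only new scrubs are refused, can be modeled in a few lines of Python. `ScrubScheduler`, its methods, and the PG ids are hypothetical; this is not the code from the linked commit, only a sketch of the behaviour as described here.

```python
from dataclasses import dataclass, field


@dataclass
class ScrubScheduler:
    # Hypothetical model of one OSD's scrub gating; not Ceph's implementation.
    scrub_during_recovery: bool = True  # mirrors the option's default of true
    active_recovery_ops: int = 0
    running: set[str] = field(default_factory=set)

    def try_start(self, pgid: str) -> bool:
        """Gate only NEW scrubs; return True if the scrub was started."""
        if not self.scrub_during_recovery and self.active_recovery_ops > 0:
            return False
        self.running.add(pgid)
        return True

    def recovery_started(self) -> None:
        # Deliberately leaves self.running alone: in-flight scrubs continue.
        self.active_recovery_ops += 1

    def recovery_finished(self) -> None:
        self.active_recovery_ops = max(0, self.active_recovery_ops - 1)


if __name__ == "__main__":
    s = ScrubScheduler(scrub_during_recovery=False)
    print(s.try_start("1.0"))   # True: no recovery yet
    s.recovery_started()
    print(s.try_start("1.1"))   # False: new scrub refused during recovery
    print("1.0" in s.running)   # True: the running scrub is untouched
    s.recovery_finished()
    print(s.try_start("1.1"))   # True: recovery done, scrubs resume
```

With the default of true, `try_start` never refuses a scrub, so existing clusters keep their current behaviour unless the admin opts in.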

Wido 

> 
> Wido
> 
> > Thanks!
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html