Re: new scrub and repair discussion

On Tue, 7 Jun 2016, kefu chai wrote:
> Dan, your comments are more like feature requests for the current
> scrub than feedback on the new scrub/repair feature design. replies
> inlined.
> 
> On Fri, May 27, 2016 at 8:03 PM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > Hi all,
> >
> > I have some high-level feedback for scrub/repair. Apologies if some of
> > these are already taken into account.
> >
> > 1. ceph pg cancel-scrub <pgid>: For a variety of reasons it would be
> > useful to be able to cancel an ongoing (deep-)scrub on a PG. The
> > no(deep-)scrub flags work more like a pause, but today if I want to
> > stop a scrub it requires an OSD to be restarted.
> 
> it's a feature request. we did have this feature in the rados API before,
> but it was not exposed by the rados cli, and hence was removed. if you'd
> like to get it back, maybe you could file an issue on the tracker?
> 
> > 2. ceph pg scrub/deep-scrub/repair often do not start because the
> > master OSD cannot get a reservation on all the replica/EC-part OSDs
> > (due to osd max scrubs). It is possible using some strange gymnastics
> > to force PG to start repairing/scrubbing immediately, but those are
> > not intuitive. IMHO, ceph pg scrub/deep-scrub/repair <pgid> should
> > start immediately regardless of the 'osd max scrubs' value.
> 
> i think it's more a design decision.

My concern with this one is that lots of people have written their own 
scrub scheduling scripts (e.g., because of scheduling problems in the 
past).  I'd favor adding a --force-now option or separate command for an 
immediate scrub.
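
To make the trade-off concrete, here is a toy model of the reservation
behaviour Dan describes. It is only a sketch and not Ceph code: the names
`Osd`, `try_reserve`, `start_scrub` and the `force` parameter are
hypothetical, `max_scrubs` merely stands in for 'osd max scrubs', and
`force=True` plays the role of the proposed --force-now option.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Osd:
    """Toy OSD that counts the scrubs it is currently taking part in."""
    osd_id: int
    max_scrubs: int = 1      # stands in for 'osd max scrubs' (assumption)
    active_scrubs: int = 0

    def try_reserve(self, force: bool = False) -> bool:
        # A forced request ignores the limit; a normal one must fit under it.
        if force or self.active_scrubs < self.max_scrubs:
            self.active_scrubs += 1
            return True
        return False

    def release(self) -> None:
        self.active_scrubs -= 1


def start_scrub(acting_set: list[Osd], force: bool = False) -> bool:
    """Reserve a scrub slot on every OSD of the PG, or on none of them."""
    reserved = []
    for osd in acting_set:
        if osd.try_reserve(force):
            reserved.append(osd)
        else:
            # Roll back partial reservations so other scrubs are not blocked.
            for r in reserved:
                r.release()
            return False
    return True


osds = [Osd(0), Osd(1), Osd(2)]
osds[1].active_scrubs = 1               # osd.1 is already busy scrubbing
print(start_scrub(osds))                # False: osd.1 is at its limit
print(start_scrub(osds, force=True))    # True: the limit is bypassed
```

In this model an existing scheduling script that calls the unforced path
keeps its current behaviour, and only callers that explicitly ask for it
get an immediate scrub.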

> > 5. Do we even need the shallow scrub functionality? I'm very curious
> > how many problems shallow scrubbing finds IRL compared with
> > deep-scrubbing. Does ceph track these stats independently?
> 
> i don't have any numbers to support or refute your theory. but i think
> having a light-weight scrub is necessary.

The lightweight scrub mostly catches replication/recovery bugs.  It's 
useful enough just as a testing/development tool.  I'm not sure that it is 
as useful for users, though...

sage