Re: A design for CephFS forward scrub with multiple MDS

John Spray <jspray@xxxxxxxxxx> · Wed, 21 Sep 2016 14:45:01 +0100

On Wed, Sep 21, 2016 at 2:25 PM, Douglas Fuller <dfuller@xxxxxxxxxx> wrote:
>
>>> Outbound scrub requests will need to be tracked and restarted in the case of MDS failure.
>>
>> One thing we didn't discuss was the backwards case, where I (an MDS)
>> am told by another MDS to scrub a subtree, but he fails before I can
>> tell him the result of my scrub.  Simplest thing seems to be to abort
>> scrubs in this case, and say that (for the moment) a scrub is only
>> guaranteed to complete if the MDS where it was initiated stays online?
>
> That makes sense as a first pass. For the future, we could resend the completion to the new subtree root owner after reconnecting and at least update the rstats. The scrub may even complete in that case.

Yes, although for the completely general case (including all MDSs
offline simultaneously) we would need to start persisting something.
Not sure if we'd ever want to do that silently inside the MDSs though,
as spending IOPS on scrub is probably not desired behaviour fresh from
a power cycle.  #futurework
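
To make the first-pass policy concrete, here is a rough sketch of the bookkeeping, not a proposal for the actual MDS code. It assumes each MDS keeps two maps: scrubs it has handed to other ranks (restarted on failure) and scrubs it was handed by other ranks (aborted if the requester fails). ScrubTracker and all of its method names are hypothetical, made up for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ScrubTracker:
    """Per-MDS bookkeeping for remote scrub requests (hypothetical sketch)."""

    # subtree path -> rank we asked to scrub that subtree
    outbound: dict = field(default_factory=dict)
    # subtree path -> rank that asked us to scrub that subtree
    inbound: dict = field(default_factory=dict)

    def delegate(self, subtree, to_rank):
        """Record that we asked another MDS to scrub a subtree."""
        self.outbound[subtree] = to_rank

    def accept(self, subtree, from_rank):
        """Record that another MDS asked us to scrub a subtree."""
        self.inbound[subtree] = from_rank

    def complete_outbound(self, subtree):
        """The remote MDS reported its result; stop tracking the request."""
        self.outbound.pop(subtree, None)

    def handle_mds_failure(self, failed_rank):
        """Return (to_restart, aborted) subtree lists for a failed rank.

        Outbound requests sent to the failed rank stay tracked so they can
        be resent. Inbound requests from the failed rank are dropped, since
        there is nobody left to report the result to.
        """
        to_restart = sorted(
            s for s, r in self.outbound.items() if r == failed_rank
        )
        aborted = sorted(
            s for s, r in self.inbound.items() if r == failed_rank
        )
        for subtree in aborted:
            del self.inbound[subtree]
        return to_restart, aborted
```

Nothing here is persisted, so this only covers the case where the initiating MDS stays online; the all-MDSs-offline case would need the on-disk state mentioned above.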

John