Re: A design for CephFS forward scrub with multiple MDS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 21, 2016 at 3:58 AM, Douglas Fuller <dfuller@xxxxxxxxxx> wrote:
>
>> On Sep 21, 2016, at 2:29 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>
>> On Tue, Sep 20, 2016 at 10:16 AM, Douglas Fuller <dfuller@xxxxxxxxxx> wrote:
>>>
>>> When popping an inode from the scrub stack, it’s important to note that its authority may have been changed by some intervening export. The scrubbing MDS will drop any file inode for which it is no longer authoritative, assuming this would be handled by the correct MDS. For directory inodes, forward a request to the authoritative MDS to scrub the directory. This may result in attempts to scrub the same inodes more than once (though we track this and can avoid most of the work), it seems necessary in order to guarantee no directories are missed due to splits or exports (NB: this is correct, right?).
>>
>> I think we need to spell this out a little more. Some thoughts:
>> * right now, the ScrubStack is just a CInode*. This needs to turn into
>> a two-way reference.
>
> I wasn’t at the datatype level of detail here. I agree it can’t be a CInode* anymore, and figured it’d have to be something we could fetch if it is exported while on the stack.
>
>> * When we freeze a tree for export, we need a new step that removes it
>> from the ScrubStack and sets up the "remote scrub" state we'd have if
>> it were a freshly-encountered subtree boundary
>>  * this may involve some delayed execution of remote scrub requests,
>> or of bundling up the need for a scrub in the exported state
>
> Directories don’t know where their subtree roots are, so I’m not sure how we would remove subdirectories and their contained files from the stack if one of their parents were exported. I think the stack could be “dumb” in some sense and not care what happens to the items on it. If we pop a file inode for which we are not authoritative, we drop it on the floor, assuming its parent directory will cause it to be scrubbed elsewhere. If we pop a directory inode for which we are not authoritative, we send a request to the authoritative MDS to scrub it.

Well, we can't keep these inodes around the way the code currently
works: unless I'm much mistaken, nothing is keeping them updated so
they're just out-of-date copies of metadata. PIN_SCRUBQUEUE exists to
keep inodes in-memory once on the scrub queue but it was explicitly
never designed to interact with multi-mds systems and needs to be
cleaned up; the current behavior is just broken. Luckily, there are
some not-too-ridiculous solutions.
* We already freeze a subtree before exporting it. I don't remember if
that involves actually touching every in-memory CDentry/CInode
underneath, or marking a flag on the root?
* We obviously walk our way through the whole subtree when bundling it
up for export
So, in one of those passes (or in a new one tacked on to freezing), we
can detect that an inode is on the scrub queue and remove it.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux