On Wed, Sep 21, 2016 at 3:58 AM, Douglas Fuller <dfuller@xxxxxxxxxx> wrote: > >> On Sep 21, 2016, at 2:29 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote: >> >> On Tue, Sep 20, 2016 at 10:16 AM, Douglas Fuller <dfuller@xxxxxxxxxx> wrote: >>> >>> When popping an inode from the scrub stack, it’s important to note that its authority may have been changed by some intervening export. The scrubbing MDS will drop any file inode for which it is no longer authoritative, assuming this would be handled by the correct MDS. For directory inodes, forward a request to the authoritative MDS to scrub the directory. This may result in attempts to scrub the same inodes more than once (though we track this and can avoid most of the work), it seems necessary in order to guarantee no directories are missed due to splits or exports (NB: this is correct, right?). >> >> I think we need to spell this out a little more. Some thoughts: >> * right now, the ScrubStack is just a CInode*. This needs to turn into >> a two-way reference. > > I wasn’t at the datatype level of detail here. I agree it can’t be a CInode* anymore, and figured it’d have to be something we could fetch if it is exported while on the stack. > >> * When we freeze a tree for export, we need a new step that removes it >> from the ScrubStack and sets up the "remote scrub" state we'd have if >> it were a freshly-encountered subtree boundary >> * this may involve some delayed execution of remote scrub requests, >> or of bundling up the need for a scrub in the exported state > > Directories don’t know where their subtree roots are, so I’m not sure how we would remove subdirectories and their contained files from the stack if one of their parents were exported. I think the stack could be “dumb” in some sense and not care what happens to the items on it. If we pop a file inode for which we are not authoritative, we drop it on the floor, assuming its parent directory will cause it to be scrubbed elsewhere. If we pop a directory inode for which we are not authoritative, we send a request to the authoritative MDS to scrub it. Well, we can't keep these inodes around the way the code currently works: unless I'm much mistaken, nothing is keeping them updated so they're just out-of-date copies of metadata. PIN_SCRUBQUEUE exists to keep inodes in-memory once on the scrub queue but it was explicitly never designed to interact with multi-mds systems and needs to be cleaned up; the current behavior is just broken. Luckily, there are some not-too-ridiculous solutions. * We already freeze a subtree before exporting it. I don't remember if that involves actually touching every in-memory CDentry/CInode underneath, or marking a flag on the root? * We obviously walk our way through the whole subtree when bundling it up for export So, in one of those passes (or in a new one tacked on to freezing), we can detect that an inode is on the scrub queue and remove it. -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html