[folks involved in d_invalidate()/submount eviction stuff Cc'd]

On Sun, Nov 26, 2023 at 04:52:19AM +0000, Al Viro wrote:

> PS: as the matter of fact, it might be a good idea to pass the parent
> as explicit argument to ->d_revalidate(), now that we are passing the
> name as well.  Look at the boilerplate in the instances; all that
>         parent = READ_ONCE(dentry->d_parent);
>         dir = d_inode_rcu(parent);
>         if (!dir)
>                 return -ECHILD;
>         ...
> on the RCU side combined with
>         parent = dget_parent(dentry);
>         dir = d_inode(parent);
>         ...
>         dput(dir);
> stuff.
>
> It's needed only because the caller had not told us which directory
> is that thing supposed to be in; in non-RCU mode the parent is
> explicitly pinned down, no need to play those games.  All we need
> is
>         dir = d_inode_rcu(parent);
>         if (!dir) // could happen only in RCU mode
>                 return -ECHILD;
> assuming we need the parent inode, that is.
>
> So... how about
>         int (*d_revalidate)(struct dentry *dentry, struct dentry *parent,
>                             const struct qstr *name, unsigned int flags);
> since we are touching all instances anyway?

OK, it's definitely a good idea for simplifying ->d_revalidate()
instances and I think we should go for it on these grounds alone.
I'll do that.

d_invalidate() situation is more subtle - we need to sort out its
interplay with d_splice_alias().

More concise variant of the scenario in question:
* we have /mnt/foo/bar and a lot of its descendants in dcache on client
* server does a rename, after which what used to be /mnt/foo/bar is
  /mnt/foo/baz
* somebody on the client does a lookup of /mnt/foo/bar and gets told
  by the server that there's no directory with that name anymore.
* that somebody hits d_invalidate(), unhashes /mnt/foo/bar and starts
  evicting its descendants
* We try to mount something on /mnt/foo/baz/blah.  We look up baz, get
  an fhandle and notice that there's a directory inode for it
  (/mnt/foo/bar).
  d_splice_alias() picks the bugger and moves it to /mnt/foo/baz,
  rehashing it in process, as it ought to.  Then we find
  /mnt/foo/baz/blah in dcache and mount on top of it.
* d_invalidate() finishes shrink_dcache_parent() and starts hunting
  for submounts to dissolve.  And finds the mount we'd done.  Which
  mount quietly disappears.

Note that from the server POV the thing had been moved quite a while
ago.  No server-side races involved - all it sees is a couple of
LOOKUPs in the same directory, one for the old name, one for the new.

On the client on the mounter side we have an uneventful mount on
/mnt/foo/baz, which had been there on server at the time we started
and which remains in place after the mount we'd created suddenly
disappears.

For the thread that ended up calling d_invalidate(), they'd been doing
e.g. stat on a pathname that used to be there a while ago, but
currently isn't.  They get -ENOENT and no indication that something
odd might have happened.

From ->d_revalidate() point of view there's also nothing odd
happening - dentry is not a mountpoint, it stays in place until we
return and there's no directory entry with that name in its parent.
It's as clear-cut as it gets - dentry is stale.

The only overlap happening there is d_splice_alias() hitting in the
middle of already started d_invalidate().

For a while I thought that ff17fa561a04 "d_invalidate(): unhash
immediately" and 3a8e3611e0ba "d_walk(): kill 'finish' callback" might
have something to do with it, but the same problem existed prior to
that.

FWIW, I suspect that the right answer would be along the lines of
* if d_splice_alias() does move an existing (attached) alias in
  place, it ought to dissolve all mountpoints in subtree being moved.
  There might be subtleties, but in case when that __d_unalias()
  happens due to rename on server this is definitely the right thing
  to do.
* d_invalidate() should *NOT* do anything with dentry that got moved
  (including moved by d_splice_alias()) from the place we'd found it
  in dcache.  At least d_invalidate() done due to having
  ->d_revalidate() return 0.
* d_invalidate() should dissolve all mountpoints in the subtree that
  existed when it got started (and found the victim still unmoved,
  that is).  It should (as it does) prevent any new mountpoints from
  being added in that subtree, unless the mountpoint-to-be had been
  moved (spliced) out.  What it really shouldn't do is touch the
  mountpoints that are currently outside of it due to moves.

I'm going to look around and see if we have any weird cases where
d_splice_alias() is used for things like "correct the case of dentry
name on a case-mangled filesystem" - that would presumably not want
to dissolve any submounts.  I seem to recall seeing some shite of that
sort, but that was a long time ago.

Eric, Miklos - it might be a good idea if you at least took a look at
whatever comes out of that (sub)thread; I'm trying to reconstruct the
picture, but the last round of serious reworking of that area had been
almost 10 years ago and your recollections of the considerations back
then might help.  I realize that they are probably rather fragmentary
(mine definitely are) and any analysis will need to be redone on the
current tree, but...
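
For the ->d_revalidate() change proposed at the top, here is a minimal
userspace sketch of what an instance might look like once the parent
is passed in explicitly.  It is not kernel code: the structs are cut
down to the fields used here, d_inode_rcu() is a plain mock of the
kernel helper, the LOOKUP_RCU value is a placeholder, and
mock_server_has() and example_d_revalidate() are hypothetical names
made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct inode { unsigned long i_ino; };
struct qstr { const char *name; unsigned int len; };
struct dentry {
	struct dentry *d_parent;
	struct inode *d_inode;
	struct qstr d_name;
};

#define LOOKUP_RCU 0x0040	/* placeholder value, for illustration only */

/* Mock of d_inode_rcu(); the real helper does a READ_ONCE() of ->d_inode. */
static struct inode *d_inode_rcu(const struct dentry *dentry)
{
	return dentry->d_inode;
}

/* Hypothetical server: directory 1 currently contains only "baz". */
static int mock_server_has(const struct inode *dir, const struct qstr *name)
{
	return dir->i_ino == 1 && name->len == 3 &&
	       memcmp(name->name, "baz", 3) == 0;
}

/*
 * Sketch of an instance with the proposed signature.  The caller tells
 * us the parent, so there is no READ_ONCE(dentry->d_parent) on the RCU
 * side and no dget_parent()/dput() pair on the non-RCU side.
 */
static int example_d_revalidate(struct dentry *dentry, struct dentry *parent,
				const struct qstr *name, unsigned int flags)
{
	struct inode *dir;

	(void)dentry;	/* unused in this sketch */
	(void)flags;	/* unused in this sketch */
	assert(parent && name);

	dir = d_inode_rcu(parent);
	if (!dir)	/* could happen only in RCU mode */
		return -ECHILD;

	/* 1: still valid, 0: stale (the caller would go on to d_invalidate()) */
	return mock_server_has(dir, name) ? 1 : 0;
}
```

With the mock server above, revalidating "bar" in that directory
returns 0 (stale), "baz" returns 1, and a parent that has no inode
returns -ECHILD, which is the only case where the instance still has
to care about RCU mode.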