On Mon, Nov 27, 2023 at 06:38:42AM +0000, Al Viro wrote: > On Sun, Nov 26, 2023 at 06:41:41PM +0000, Al Viro wrote: > > > d_invalidate() situation is more subtle - we need to sort out its interplay > > with d_splice_alias(). > > > > More concise variant of the scenario in question: > > * we have /mnt/foo/bar and a lot of its descendents in dcache on client > > * server does a rename, after which what used to be /mnt/foo/bar is /mnt/foo/baz > > * somebody on the client does a lookup of /mnt/foo/bar and gets told by > > the server that there's no directory with that name anymore. > > * that somebody hits d_invalidate(), unhashes /mnt/foo/bar and starts > > evicting its descendents > > * We try to mount something on /mnt/foo/baz/blah. We look up baz, get > > an fhandle and notice that there's a directory inode for it (/mnt/foo/bar). > > d_splice_alias() picks the bugger and moves it to /mnt/foo/baz, rehashing > > it in process, as it ought to. Then we find /mnt/foo/baz/blah in dcache and > > mount on top of it. > > * d_invalidate() finishes shrink_dcache_parent() and starts hunting for > > submounts to dissolve. And finds the mount we'd done. Which mount quietly > > disappears. > > > > Note that from the server POV the thing had been moved quite a while ago. > > No server-side races involved - all it seeem is a couple of LOOKUP in the > > same directory, one for the old name, one for the new. > > > > On the client on the mounter side we have an uneventful mount on /mnt/foo/baz, > > which had been there on server at the time we started and which remains in > > place after mount we'd created suddenly disappears. > > > > For the thread that ended up calling d_invalidate(), they'd been doing e.g. > > stat on a pathname that used to be there a while ago, but currently isn't. > > They get -ENOENT and no indication that something odd might have happened. > > > > >From ->d_revalidate() point of view there's also nothing odd happening - > > dentry is not a mountpoint, it stays in place until we return and there's > > no directory entry with that name on in its parent. It's as clear-cut > > as it gets - dentry is stale. > > > > The only overlap happening there is d_splice_alias() hitting in the middle > > of already started d_invalidate(). > > > > For a while I thought that ff17fa561a04 "d_invalidate(): unhash immediately" > > and 3a8e3611e0ba "d_walk(): kill 'finish' callback" might have something > > to do with it, but the same problem existed prior to that. > > > > FWIW, I suspect that the right answer would be along the lines of > > * if d_splice_alias() does move an exsiting (attached) alias in > > place, it ought to dissolve all mountpoints in subtree being moved. > > There might be subtleties, but in case when that __d_unalias() happens > > due to rename on server this is definitely the right thing to do. > > * d_invalidate() should *NOT* do anything with dentry that > > got moved (including moved by d_splice_alias()) from the place we'd > > found it in dcache. At least d_invalidate() done due to having > > ->d_revalidate() return 0. > > * d_invalidate() should dissolve all mountpoints in the > > subtree that existed when it got started (and found the victim > > still unmoved, that is). It should (as it does) prevent any > > new mountpoints added in that subtree, unless the mountpoint > > to be had been moved (spliced) out. What it really shouldn't > > do is touch the mountpoints that are currently outside of it > > due to moves. > > > > I'm going to look around and see if we have any weird cases where > > d_splice_alias() is used for things like "correct the case of > > dentry name on a case-mangled filesystem" - that would presumably > > not want to dissolve any submounts. I seem to recall seeing > > some shite of that sort, but that was a long time ago. > > > > Eric, Miklos - it might be a good idea if you at least took a > > look at whatever comes out of that (sub)thread; I'm trying to > > reconstruct the picture, but the last round of serious reworking > > of that area had been almost 10 years ago and your recollections > > of the considerations back then might help. I realize that they > > are probably rather fragmentary (mine definitely are) and any > > analysis will need to be redone on the current tree, but... > > TBH, I wonder if we ought to have d_invalidate() variant that would > unhash the dentry in question, do a variant of shrink_dcache_parent() > that would report if there had been any mountpoints and if there > had been any, do namespace_lock() and go hunting for mounts in that > subtree, moving corresponding struct mountpoint to a private list > as we go (removing them from mountpoint hash chains, that it). Then > have them all evicted after we'd finished walking the subtree... That sounds reasonable. > > The tricky part will be lock ordering - right now we have the > mountpoint hash protected by mount_lock (same as mount hash, probably > worth splitting anyway) and that nests outside of ->d_lock. > > Note that we don't do mountpoint hash lookups on mountpoint crossing > - it's nowhere near the really hot paths. What we have is > lookup_mountpoint() - plain hash lookup. Always > under namespace_lock() and mount_lock. > get_mountpoint() - there's an insertion into hash chain, > with dentry passed through the d_set_mounted(), which would > fail if we have d_invalidate() on the subtree. > Also always under namespace_lock() and mount_lock. > __put_mountpoint() - removal from the hash chain. > We remove from hash chain after having cleared DCACHE_MOUNTED. > _That_ can happen under mount_lock alone (taking out the stuck > submounts on final mntput()). > > So convert the mountpoint hash chains to hlist_bl, bitlocks nesting under > ->d_lock. Introduce a new dentry flag (DCHACE_MOUNT_INVALIDATION?) > In d_walk() callback we would > * do nothing if DCACHE_MOUNT is not set or DCACHE_MOUNT_INVALIDATION > is. > * otherwise set DCACHE_MOUNT_INVALIDATION, grab the bitlock on the > mountpoint hash chain matching that dentry, find struct mountpoint in it, > remove it from the chain and insert into a separate "collector" chain, all > without messing with refcount. Ok. > In lookup_mountpoint() and get_mountpoint() take the bitlock on chain. > In __put_mountpoint(), once it has grabbed ->d_lock > * check if it has DCACHE_MOUNT_INVALIDATION, use that to > decide which chain we are locking - the normal one or the collector > * clear both DCACHE_MOUNT and DCACHE_MOUNT_INVALIDATION > * remove from chain > * unlock the chain > * drop ->d_lock. > > Once we are finished walking the tree, go over the collector list > and do what __detach_mount() guts do. We are no longer under > any ->d_lock, so locking is not a problem. namespace_unlock() will > flush them all, same as it does for __detach_mount(). Ok.