On Sat, 11 May 2024 at 09:13, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> So better batching, or maybe just walking the negative child dentry
> list after having marked the parent dead and then released the lock,
> might also be the solution.

IOW, maybe the solution is something as simple as this UNTESTED patch
instead?

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4207,16 +4207,19 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
         if (error)
                 goto out;
 
-        shrink_dcache_parent(dentry);
         dentry->d_inode->i_flags |= S_DEAD;
         dont_mount(dentry);
         detach_mounts(dentry);
+        inode_unlock(dentry->d_inode);
+
+        shrink_dcache_parent(dentry);
+        dput(dentry);
+        d_delete_notify(dir, dentry);
+        return 0;
 
 out:
         inode_unlock(dentry->d_inode);
         dput(dentry);
-        if (!error)
-                d_delete_notify(dir, dentry);
         return error;
 }
 EXPORT_SYMBOL(vfs_rmdir);

where that "shrink_dcache_parent()" will still be quite expensive if
the directory has a ton of negative dentries, but because we now free
them after we've marked the dentry dead and released the inode, nobody
much cares any more?

Note (as usual) that this is untested, and maybe I haven't thought
everything through and I might be missing some important detail.

But I think shrinking the children later should be just fine - there's
nothing people can *do* with them. At worst they are reachable for
some lookup that doesn't take the locks, but that will just result in
a negative dentry which is all good, since they cannot become positive
dentries any more at this point.

So I think the inode lock is irrelevant for negative dentries, and
shrinking them outside the lock feels like a very natural thing to do.

Again: this is more of a "brainstorming patch" than an actual
suggestion. Yafang - it might be worth testing for your load, but
please do so knowing that it might have some consistency issues, so
don't test it on any production machinery, please ;)

Can anybody point out why I'm being silly, and the above change is
completely broken garbage? Please use small words to point out my
mental deficiencies to make sure I get it.

Just to clarify: this obviously will *not* speed up the actual rmdir
itself. *ALL* it does is to avoid holding the lock over the
potentially long cleanup operation.

So the latency of the rmdir is still the same, but now it should no
longer matter for anything else.

              Linus
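
[Illustrative aside, not part of the original post: for readers following
the locking argument rather than the VFS details, below is a minimal
userspace sketch of the same idea. It is not kernel code and every
identifier in it is invented for the demo. An "rmdir" thread either runs
its expensive cleanup while still holding a shared lock (the old ordering)
or after dropping it (the patched ordering), and a concurrent "lookup"
thread reports how long it had to wait for that lock. The cleanup takes
just as long either way; only the waiter's latency changes.]

/*
 * Toy userspace sketch, not kernel code: all names are made up for the
 * demo.  It only illustrates the latency point above: the expensive
 * cleanup costs the same either way, but doing it after dropping the
 * lock means a concurrent lock taker no longer stalls behind it.
 * Build with: gcc -O2 -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t dir_lock = PTHREAD_MUTEX_INITIALIZER;

static void sleep_ms(long ms)
{
        struct timespec ts = {
                .tv_sec  = ms / 1000,
                .tv_nsec = (ms % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
}

/* stand-in for shrink_dcache_parent() walking a ton of negative dentries */
static void expensive_cleanup(void)
{
        sleep_ms(500);
}

/* old ordering: the cleanup runs with the lock still held */
static void *rmdir_old(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&dir_lock);
        expensive_cleanup();            /* everyone else waits for this */
        /* mark dead, unhash, ... */
        pthread_mutex_unlock(&dir_lock);
        return NULL;
}

/* patched ordering: mark dead under the lock, clean up after dropping it */
static void *rmdir_new(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&dir_lock);
        /* mark dead, unhash, ... */
        pthread_mutex_unlock(&dir_lock);
        expensive_cleanup();            /* same cost, but nobody is blocked on it */
        return NULL;
}

/* a concurrent "lookup" that only needs the lock briefly */
static void *lookup(void *arg)
{
        struct timespec a, b;

        (void)arg;
        clock_gettime(CLOCK_MONOTONIC, &a);
        pthread_mutex_lock(&dir_lock);
        pthread_mutex_unlock(&dir_lock);
        clock_gettime(CLOCK_MONOTONIC, &b);
        printf("  lookup waited %.0f ms for the lock\n",
               (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6);
        return NULL;
}

static void run(const char *name, void *(*rmdir_fn)(void *))
{
        pthread_t r, l;

        printf("%s:\n", name);
        pthread_create(&r, NULL, rmdir_fn, NULL);
        sleep_ms(10);                   /* let the "rmdir" grab the lock first */
        pthread_create(&l, NULL, lookup, NULL);
        pthread_join(r, NULL);
        pthread_join(l, NULL);
}

int main(void)
{
        run("cleanup inside the lock (old ordering)", rmdir_old);
        run("cleanup after dropping the lock (patched ordering)", rmdir_new);
        return 0;
}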