Yafang Shao reports that he has seen loads that generate billions of negative dentries in a directory, which then when the directory is removed causes excessive latencies for other users because the dentry shrinking is done under the directory inode lock. There seems to be no actual reason for holding the inode lock any more by the time we get rid of the now uninteresting negative dentries, and it's an effect of the calling convention. Split the 'vfs_rmdir()' function into two separate phases: - 'vfs_rmdir_raw()' does the actual main rmdir heavy lifting - 'vfs_rmdir_cleanup()' needs to be run by the caller after a successful raw call, after the caller has dropped the inode locks. We leave the 'vfs_rmdir()' function around, since it has multiple callers, and only convert the main system call path to the new two-phase model. The other uses will be left as an exercise for the reader for when people find they care. [ Side note: I think the 'dget()/dput()' pair in vfs_rmdir_raw() is superfluous, since callers always have to have a dentry reference over the call anyway. That's a separate issue. - Linus ] Reported-by: Yafang Shao <laoar.shao@xxxxxxxxx> Link: https://lore.kernel.org/all/20240511022729.35144-1-laoar.shao@xxxxxxxxx/ Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> Cc: Christian Brauner <brauner@xxxxxxxxxx> Cc: Jan Kara <jack@xxxxxxx> Cc: Waiman Long <longman@xxxxxxxxxx> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> --- Second version - this time doing the dentry pruning even later, after releasing the parent inode lock too. I did the same amount of "extensive testing" on this one as the previous one. IOW, little-to-none. fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 41 insertions(+), 20 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 28e62238346e..15b4ff6ed1e5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4176,21 +4176,7 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode) return do_mkdirat(AT_FDCWD, getname(pathname), mode); } -/** - * vfs_rmdir - remove directory - * @idmap: idmap of the mount the inode was found from - * @dir: inode of @dentry - * @dentry: pointer to dentry of the base directory - * - * Remove a directory. - * - * If the inode has been found through an idmapped mount the idmap of - * the vfsmount must be passed through @idmap. This function will then take - * care to map the inode according to @idmap before checking permissions. - * On non-idmapped mounts or if permission checking is to be performed on the - * raw inode simply pass @nop_mnt_idmap. - */ -int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, +static int vfs_rmdir_raw(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry) { int error = may_delete(idmap, dir, dentry, 1); @@ -4217,18 +4203,43 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, if (error) goto out; - shrink_dcache_parent(dentry); dentry->d_inode->i_flags |= S_DEAD; dont_mount(dentry); detach_mounts(dentry); - out: inode_unlock(dentry->d_inode); dput(dentry); - if (!error) - d_delete_notify(dir, dentry); return error; } + +static inline void vfs_rmdir_cleanup(struct inode *dir, struct dentry *dentry) +{ + shrink_dcache_parent(dentry); + d_delete_notify(dir, dentry); +} + +/** + * vfs_rmdir - remove directory + * @idmap: idmap of the mount the inode was found from + * @dir: inode of @dentry + * @dentry: pointer to dentry of the base directory + * + * Remove a directory. + * + * If the inode has been found through an idmapped mount the idmap of + * the vfsmount must be passed through @idmap. This function will then take + * care to map the inode according to @idmap before checking permissions. + * On non-idmapped mounts or if permission checking is to be performed on the + * raw inode simply pass @nop_mnt_idmap. + */ +int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, + struct dentry *dentry) +{ + int retval = vfs_rmdir_raw(idmap, dir, dentry); + if (!retval) + vfs_rmdir_cleanup(dir, dentry); + return retval; +} EXPORT_SYMBOL(vfs_rmdir); int do_rmdir(int dfd, struct filename *name) @@ -4272,7 +4283,17 @@ int do_rmdir(int dfd, struct filename *name) error = security_path_rmdir(&path, dentry); if (error) goto exit4; - error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry); + error = vfs_rmdir_raw(mnt_idmap(path.mnt), path.dentry->d_inode, dentry); + if (error) + goto exit4; + inode_unlock(path.dentry->d_inode); + mnt_drop_write(path.mnt); + vfs_rmdir_cleanup(path.dentry->d_inode, dentry); + dput(dentry); + path_put(&path); + putname(name); + return 0; + exit4: dput(dentry); exit3: -- 2.44.0.330.g4d18c88175