On Sun, May 12, 2024 at 4:03 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > Yafang Shao reports that he has seen loads that generate billions of > negative dentries in a directory, which then when the directory is > removed causes excessive latencies for other users because the dentry > shrinking is done under the directory inode lock. > > There seems to be no actual reason for holding the inode lock any more > by the time we get rid of the now uninteresting negative dentries, and > it's an effect of the calling convention. > > Split the 'vfs_rmdir()' function into two separate phases: > > - 'vfs_rmdir_raw()' does the actual main rmdir heavy lifting > > - 'vfs_rmdir_cleanup()' needs to be run by the caller after a > successful raw call, after the caller has dropped the inode locks. > > We leave the 'vfs_rmdir()' function around, since it has multiple > callers, and only convert the main system call path to the new two-phase > model. The other uses will be left as an exercise for the reader for > when people find they care. > > [ Side note: I think the 'dget()/dput()' pair in vfs_rmdir_raw() is > superfluous, since callers always have to have a dentry reference over > the call anyway. That's a separate issue. - Linus ] > > Reported-by: Yafang Shao <laoar.shao@xxxxxxxxx> > Link: https://lore.kernel.org/all/20240511022729.35144-1-laoar.shao@xxxxxxxxx/ > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > Cc: Christian Brauner <brauner@xxxxxxxxxx> > Cc: Jan Kara <jack@xxxxxxx> > Cc: Waiman Long <longman@xxxxxxxxxx> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> This could resolve the secondary concern. Tested-by: Yafang Shao <laoar.shao@xxxxxxxxx> Might it be feasible to execute the vfs_rmdir_cleanup() within a kwoker? Such an approach could potentially mitigate the initial concern as well. > --- > > Second version - this time doing the dentry pruning even later, after > releasing the parent inode lock too. > > I did the same amount of "extensive testing" on this one as the previous > one. IOW, little-to-none. > > fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++------------------ > 1 file changed, 41 insertions(+), 20 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 28e62238346e..15b4ff6ed1e5 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -4176,21 +4176,7 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode) > return do_mkdirat(AT_FDCWD, getname(pathname), mode); > } > > -/** > - * vfs_rmdir - remove directory > - * @idmap: idmap of the mount the inode was found from > - * @dir: inode of @dentry > - * @dentry: pointer to dentry of the base directory > - * > - * Remove a directory. > - * > - * If the inode has been found through an idmapped mount the idmap of > - * the vfsmount must be passed through @idmap. This function will then take > - * care to map the inode according to @idmap before checking permissions. > - * On non-idmapped mounts or if permission checking is to be performed on the > - * raw inode simply pass @nop_mnt_idmap. > - */ > -int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > +static int vfs_rmdir_raw(struct mnt_idmap *idmap, struct inode *dir, > struct dentry *dentry) > { > int error = may_delete(idmap, dir, dentry, 1); > @@ -4217,18 +4203,43 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > if (error) > goto out; > > - shrink_dcache_parent(dentry); > dentry->d_inode->i_flags |= S_DEAD; > dont_mount(dentry); > detach_mounts(dentry); > - > out: > inode_unlock(dentry->d_inode); > dput(dentry); > - if (!error) > - d_delete_notify(dir, dentry); > return error; > } > + > +static inline void vfs_rmdir_cleanup(struct inode *dir, struct dentry *dentry) > +{ > + shrink_dcache_parent(dentry); > + d_delete_notify(dir, dentry); > +} > + > +/** > + * vfs_rmdir - remove directory > + * @idmap: idmap of the mount the inode was found from > + * @dir: inode of @dentry > + * @dentry: pointer to dentry of the base directory > + * > + * Remove a directory. > + * > + * If the inode has been found through an idmapped mount the idmap of > + * the vfsmount must be passed through @idmap. This function will then take > + * care to map the inode according to @idmap before checking permissions. > + * On non-idmapped mounts or if permission checking is to be performed on the > + * raw inode simply pass @nop_mnt_idmap. > + */ > +int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > + struct dentry *dentry) > +{ > + int retval = vfs_rmdir_raw(idmap, dir, dentry); > + if (!retval) > + vfs_rmdir_cleanup(dir, dentry); > + return retval; > +} > EXPORT_SYMBOL(vfs_rmdir); > > int do_rmdir(int dfd, struct filename *name) > @@ -4272,7 +4283,17 @@ int do_rmdir(int dfd, struct filename *name) > error = security_path_rmdir(&path, dentry); > if (error) > goto exit4; > - error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry); > + error = vfs_rmdir_raw(mnt_idmap(path.mnt), path.dentry->d_inode, dentry); > + if (error) > + goto exit4; > + inode_unlock(path.dentry->d_inode); > + mnt_drop_write(path.mnt); > + vfs_rmdir_cleanup(path.dentry->d_inode, dentry); > + dput(dentry); > + path_put(&path); > + putname(name); > + return 0; > + > exit4: > dput(dentry); > exit3: > -- > 2.44.0.330.g4d18c88175 > -- Regards Yafang