Re: [PATCH v2] vfs: move dentry shrinking outside the inode lock in 'rmdir()'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, May 12, 2024 at 4:03 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Yafang Shao reports that he has seen loads that generate billions of
> negative dentries in a directory, which then when the directory is
> removed causes excessive latencies for other users because the dentry
> shrinking is done under the directory inode lock.
>
> There seems to be no actual reason for holding the inode lock any more
> by the time we get rid of the now uninteresting negative dentries, and
> it's an effect of the calling convention.
>
> Split the 'vfs_rmdir()' function into two separate phases:
>
>  - 'vfs_rmdir_raw()' does the actual main rmdir heavy lifting
>
>  - 'vfs_rmdir_cleanup()' needs to be run by the caller after a
>    successful raw call, after the caller has dropped the inode locks.
>
> We leave the 'vfs_rmdir()' function around, since it has multiple
> callers, and only convert the main system call path to the new two-phase
> model.  The other uses will be left as an exercise for the reader for
> when people find they care.
>
> [ Side note: I think the 'dget()/dput()' pair in vfs_rmdir_raw() is
>   superfluous, since callers always have to have a dentry reference over
>   the call anyway. That's a separate issue.    - Linus ]
>
> Reported-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> Link: https://lore.kernel.org/all/20240511022729.35144-1-laoar.shao@xxxxxxxxx/
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Christian Brauner <brauner@xxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Waiman Long <longman@xxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

This could resolve the secondary concern.
Tested-by: Yafang Shao <laoar.shao@xxxxxxxxx>

Might it be feasible to execute the vfs_rmdir_cleanup() within a
kwoker? Such an approach could potentially mitigate the initial
concern as well.

> ---
>
> Second version - this time doing the dentry pruning even later, after
> releasing the parent inode lock too.
>
> I did the same amount of "extensive testing" on this one as the previous
> one.  IOW, little-to-none.
>
>  fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 41 insertions(+), 20 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 28e62238346e..15b4ff6ed1e5 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4176,21 +4176,7 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
>         return do_mkdirat(AT_FDCWD, getname(pathname), mode);
>  }
>
> -/**
> - * vfs_rmdir - remove directory
> - * @idmap:     idmap of the mount the inode was found from
> - * @dir:       inode of @dentry
> - * @dentry:    pointer to dentry of the base directory
> - *
> - * Remove a directory.
> - *
> - * If the inode has been found through an idmapped mount the idmap of
> - * the vfsmount must be passed through @idmap. This function will then take
> - * care to map the inode according to @idmap before checking permissions.
> - * On non-idmapped mounts or if permission checking is to be performed on the
> - * raw inode simply pass @nop_mnt_idmap.
> - */
> -int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
> +static int vfs_rmdir_raw(struct mnt_idmap *idmap, struct inode *dir,
>                      struct dentry *dentry)
>  {
>         int error = may_delete(idmap, dir, dentry, 1);
> @@ -4217,18 +4203,43 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
>         if (error)
>                 goto out;
>
> -       shrink_dcache_parent(dentry);
>         dentry->d_inode->i_flags |= S_DEAD;
>         dont_mount(dentry);
>         detach_mounts(dentry);
> -
>  out:
>         inode_unlock(dentry->d_inode);
>         dput(dentry);
> -       if (!error)
> -               d_delete_notify(dir, dentry);
>         return error;
>  }
> +
> +static inline void vfs_rmdir_cleanup(struct inode *dir, struct dentry *dentry)
> +{
> +       shrink_dcache_parent(dentry);
> +       d_delete_notify(dir, dentry);
> +}
> +
> +/**
> + * vfs_rmdir - remove directory
> + * @idmap:     idmap of the mount the inode was found from
> + * @dir:       inode of @dentry
> + * @dentry:    pointer to dentry of the base directory
> + *
> + * Remove a directory.
> + *
> + * If the inode has been found through an idmapped mount the idmap of
> + * the vfsmount must be passed through @idmap. This function will then take
> + * care to map the inode according to @idmap before checking permissions.
> + * On non-idmapped mounts or if permission checking is to be performed on the
> + * raw inode simply pass @nop_mnt_idmap.
> + */
> +int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
> +                    struct dentry *dentry)
> +{
> +       int retval = vfs_rmdir_raw(idmap, dir, dentry);
> +       if (!retval)
> +               vfs_rmdir_cleanup(dir, dentry);
> +       return retval;
> +}
>  EXPORT_SYMBOL(vfs_rmdir);
>
>  int do_rmdir(int dfd, struct filename *name)
> @@ -4272,7 +4283,17 @@ int do_rmdir(int dfd, struct filename *name)
>         error = security_path_rmdir(&path, dentry);
>         if (error)
>                 goto exit4;
> -       error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> +       error = vfs_rmdir_raw(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> +       if (error)
> +               goto exit4;
> +       inode_unlock(path.dentry->d_inode);
> +       mnt_drop_write(path.mnt);
> +       vfs_rmdir_cleanup(path.dentry->d_inode, dentry);
> +       dput(dentry);
> +       path_put(&path);
> +       putname(name);
> +       return 0;
> +
>  exit4:
>         dput(dentry);
>  exit3:
> --
> 2.44.0.330.g4d18c88175
>


-- 
Regards
Yafang





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux