On Sat, 11 May 2024 at 09:13, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> So better batching, or maybe just walking the negative child dentry
> list after having marked the parent dead and then released the lock,
> might also be the solution.

IOW, maybe the solution is something as simple as this UNTESTED patch
instead?

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4207,16 +4207,19 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
         if (error)
                 goto out;
 
-        shrink_dcache_parent(dentry);
         dentry->d_inode->i_flags |= S_DEAD;
         dont_mount(dentry);
         detach_mounts(dentry);
+        inode_unlock(dentry->d_inode);
+
+        shrink_dcache_parent(dentry);
+        dput(dentry);
+        d_delete_notify(dir, dentry);
+        return 0;
 
 out:
         inode_unlock(dentry->d_inode);
         dput(dentry);
-        if (!error)
-                d_delete_notify(dir, dentry);
         return error;
 }
 EXPORT_SYMBOL(vfs_rmdir);

where that "shrink_dcache_parent()" will still be quite expensive if
the directory has a ton of negative dentries, but because we now free
them after we've marked the dentry dead and released the inode, nobody
much cares any more?

Note (as usual) that this is untested, and maybe I haven't thought
everything through and I might be missing some important detail.

But I think shrinking the children later should be just fine - there's
nothing people can *do* with them. At worst they are reachable for
some lookup that doesn't take the locks, but that will just result in
a negative dentry which is all good, since they cannot become positive
dentries any more at this point.

So I think the inode lock is irrelevant for negative dentries, and
shrinking them outside the lock feels like a very natural thing to do.

Again: this is more of a "brainstorming patch" than an actual
suggestion. Yafang - it might be worth testing for your load, but
please do so knowing that it might have some consistency issues, so
don't test it on any production machinery, please ;)

Can anybody point out why I'm being silly, and the above change is
completely broken garbage? Please use small words to point out my
mental deficiencies to make sure I get it.

Just to clarify: this obviously will *not* speed up the actual rmdir
itself. *ALL* it does is to avoid holding the lock over the
potentially long cleanup operation.

So the latency of the rmdir is still the same, but now it should no
longer matter for anything else.

              Linus
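
[Illustrative aside, not part of the original post: for readers following
the locking argument rather than the VFS details, below is a minimal
userspace sketch of the same idea. It is not kernel code and every
identifier in it is invented for the demo. An "rmdir" thread either runs
its expensive cleanup while still holding a shared lock (the old ordering)
or after dropping it (the patched ordering), and a concurrent "lookup"
thread reports how long it had to wait for that lock. The cleanup takes
just as long either way; only the waiter's latency changes.]

/*
 * Toy userspace sketch, not kernel code: all names are made up for the
 * demo.  It only illustrates the latency point above: the expensive
 * cleanup costs the same either way, but doing it after dropping the
 * lock means a concurrent lock taker no longer stalls behind it.
 * Build with: gcc -O2 -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t dir_lock = PTHREAD_MUTEX_INITIALIZER;

static void sleep_ms(long ms)
{
        struct timespec ts = {
                .tv_sec  = ms / 1000,
                .tv_nsec = (ms % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
}

/* stand-in for shrink_dcache_parent() walking a ton of negative dentries */
static void expensive_cleanup(void)
{
        sleep_ms(500);
}

/* old ordering: the cleanup runs with the lock still held */
static void *rmdir_old(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&dir_lock);
        expensive_cleanup();            /* everyone else waits for this */
        /* mark dead, unhash, ... */
        pthread_mutex_unlock(&dir_lock);
        return NULL;
}

/* patched ordering: mark dead under the lock, clean up after dropping it */
static void *rmdir_new(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&dir_lock);
        /* mark dead, unhash, ... */
        pthread_mutex_unlock(&dir_lock);
        expensive_cleanup();            /* same cost, but nobody is blocked on it */
        return NULL;
}

/* a concurrent "lookup" that only needs the lock briefly */
static void *lookup(void *arg)
{
        struct timespec a, b;

        (void)arg;
        clock_gettime(CLOCK_MONOTONIC, &a);
        pthread_mutex_lock(&dir_lock);
        pthread_mutex_unlock(&dir_lock);
        clock_gettime(CLOCK_MONOTONIC, &b);
        printf("  lookup waited %.0f ms for the lock\n",
               (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6);
        return NULL;
}

static void run(const char *name, void *(*rmdir_fn)(void *))
{
        pthread_t r, l;

        printf("%s:\n", name);
        pthread_create(&r, NULL, rmdir_fn, NULL);
        sleep_ms(10);                   /* let the "rmdir" grab the lock first */
        pthread_create(&l, NULL, lookup, NULL);
        pthread_join(r, NULL);
        pthread_join(l, NULL);
}

int main(void)
{
        run("cleanup inside the lock (old ordering)", rmdir_old);
        run("cleanup after dropping the lock (patched ordering)", rmdir_new);
        return 0;
}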