Re: [PATCH] eventfs: Have inodes have unique inode numbers

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 28 Jan 2024 14:25:46 -0800

On Sun, 28 Jan 2024 at 14:17, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> The original code just used the mutex, but then we were hitting
> deadlocks because we used the mutex in the iput() logic. But this could
> have been due to the readdir logic causing the deadlocks.

I agree that it's likely that the readdir logic caused the deadlocks
on the eventfs_mutex, but at the same time it really does tend to be a
good idea to drop any locks when dealing with readdir().

The issue with the readdir iterator is that it needs to access user
space memory for every dirent it fills in, and any time you hold a
lock across a user space access, you are now opening yourself up to
having the lock have really long hold times. It's basically a great
way to cauise a mini-DoS.

And even if you now can do without the mutex in the release paths etc
by just using refcounts, and even if you thus get rid of the deadlock
itself, it's typically a very good idea to have the 'iterate_shared()'
function drop all locks before writing to user space.

The same is obviously true of 'read()' etc that also writes to user
space, but you tend to see the issue much more with the readdir code,
because it iterates over all these small things, and readdir()
typically wants the lock more too (because there's all that directory
metadata).

So dropping the lock might not be something you *have* to do in
iterate_shared, but it's often a good idea.

But dropping the lock also doesn't tend to be a big problem, if you
just have refcounted data structures. Sure, the data might change
(because you dropped the lock), but at least the data structure itself
is still there when you get the lock, so at most it might be a "I will
need to re-check that the entry hasn't been removed" kind of thing.

                 Linus