On Sun, 28 Jan 2024 at 14:17, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > The original code just used the mutex, but then we were hitting > deadlocks because we used the mutex in the iput() logic. But this could > have been due to the readdir logic causing the deadlocks. I agree that it's likely that the readdir logic caused the deadlocks on the eventfs_mutex, but at the same time it really does tend to be a good idea to drop any locks when dealing with readdir(). The issue with the readdir iterator is that it needs to access user space memory for every dirent it fills in, and any time you hold a lock across a user space access, you are now opening yourself up to having the lock have really long hold times. It's basically a great way to cauise a mini-DoS. And even if you now can do without the mutex in the release paths etc by just using refcounts, and even if you thus get rid of the deadlock itself, it's typically a very good idea to have the 'iterate_shared()' function drop all locks before writing to user space. The same is obviously true of 'read()' etc that also writes to user space, but you tend to see the issue much more with the readdir code, because it iterates over all these small things, and readdir() typically wants the lock more too (because there's all that directory metadata). So dropping the lock might not be something you *have* to do in iterate_shared, but it's often a good idea. But dropping the lock also doesn't tend to be a big problem, if you just have refcounted data structures. Sure, the data might change (because you dropped the lock), but at least the data structure itself is still there when you get the lock, so at most it might be a "I will need to re-check that the entry hasn't been removed" kind of thing. Linus