Re: [PATCH 4/5] fs: honor LOOKUP_NONBLOCK for the last part of file open

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 12 Dec 2020 14:03:02 -0800

On Sat, Dec 12, 2020 at 1:25 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> Do we ever do long term IO _while_ holding the directory inode lock? If
> we don't, then we can probably just ignore that side altogether.

The inode lock is all kinds of messy. Part of it is that we have these
helper functions for taking it ("inode_lock_shared()" and friends).
Part of it is that some codepaths do *not* use those helpers and use
"inode->i_rwsem" directly. And part of it is that our comments
sometimes talk about the old name ("i_mutex").

The inode lock *can* be a problem. The historical problem point is
actually readdir(), which takes the lock for reading, but holds it
across not just the IO but also the user space accesses (copying the
entries out to the user buffer).

That used to be a huge problem when it was a mutex, not an rwlock. But
I think it can still be a problem (a) for filesystems that haven't
been converted to 'iterate_shared', or (b) when a slow readdir holds
the lock and an O_CREAT comes in: the O_CREAT waits for the lock
exclusively, and new readers then block behind it too.

Honestly, the inode lock is nasty and broken. It's made worse by the
fact that it really doesn't have great semantics: filesystems use it
randomly for internal "lock this inode" too.

A lot of inode lock users don't actually do any IO at all. The
messiness of that lock comes literally from the fact that it was this
random per-inode lock that just grew a lot of random uses. Many of
them aren't particularly relevant for directories, though.

It's one of my least favorite locks in the kernel, but practically
speaking it seldom causes problems.

But if you haven't figured out the pattern by now, let's just say that
"it's completely random".

It would be interesting to see whether it causes actual problems,
because that might push us towards fixing some of them.

               Linus