Re: [GIT PULL] bcachefs changes for 6.12-rc1

On Mon, 23 Sept 2024 at 20:55, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> That's effectively what the patch did - it added a spinlock per hash
> list.

Yeah, no, I like your patches, they all seem to be doing sane things
and improving the code.

The part I don't like is that the basic model your patches are
improving seems quite a bit broken.

For example, that whole superblock list lock contention just makes me
go "Christ, that's stupid". Not because your patches to fix it with
Waiman's fancy list would be wrong or bad, but because the whole pain
is self-inflicted garbage.

And it's all historically very very understandable. It wasn't broken
when it was written.

That singly linked list is 100% sane in the historical context of "we
really don't want anything fancy for this". The whole list of inodes
for a superblock is basically unimportant, and its main use (only
use?) is for the final "check that we don't have busy inodes at umount
time, remove any pending stuff".

So using a stupid linked list was absolutely the right thing to do,
but now that just the *locking* for that list is a pain, it turns out
that we probably shouldn't use a list at all. Not because the list was
wrong, but because a flat list is such a pain for locking, since
there's no hierarchy to spread the locking out over.
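
To make "flat list plus one lock" concrete, here's a userspace
miniature of that model (made-up names and pthread locks, not the
actual VFS code, lock init omitted): every inode create and evict on
the whole filesystem funnels through the same lock, purely so that a
rare umount-time walk can work.

#include <pthread.h>

struct demo_inode {
	struct demo_inode *i_next;	/* singly linked, nothing fancy */
	int i_busy;
};

struct demo_sb {
	struct demo_inode *s_inodes;	/* flat list of every inode */
	pthread_mutex_t s_inodes_lock;	/* the one lock everybody hits */
};

/* Every inode creation on the whole filesystem takes the same lock... */
static void demo_sb_add_inode(struct demo_sb *sb, struct demo_inode *inode)
{
	pthread_mutex_lock(&sb->s_inodes_lock);
	inode->i_next = sb->s_inodes;
	sb->s_inodes = inode;
	pthread_mutex_unlock(&sb->s_inodes_lock);
}

/* ...even though the walk it exists for basically only runs at umount. */
static int demo_sb_check_busy(struct demo_sb *sb)
{
	struct demo_inode *inode;
	int busy = 0;

	pthread_mutex_lock(&sb->s_inodes_lock);
	for (inode = sb->s_inodes; inode; inode = inode->i_next)
		busy |= inode->i_busy;
	pthread_mutex_unlock(&sb->s_inodes_lock);
	return busy;
}

Fine while nobody hammers it, painful once every core is creating and
evicting inodes at the same time.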

(We used to have that kind of "flat lock" for the dcache too, but
"dcache_lock" went away many moons ago, and good riddance - but the
only reason it could go away is that the dcache has a hierarchy, so
that you can always lock just the local dentry (or the parent dentry)
if you are just careful).
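
The difference is easy to see in another made-up miniature (not the
real dcache code, which uses per-dentry spinlocks and a lot more
care): with per-node locks on a hierarchy, an operation only takes the
locks of the nodes it actually touches, parent before child.

#include <pthread.h>
#include <string.h>

struct demo_dentry {
	pthread_mutex_t d_lock;		/* one lock per node */
	struct demo_dentry *d_parent;
	char d_name[32];
};

/* Rename within one directory: lock parent, then child - nothing global. */
static void demo_rename(struct demo_dentry *parent, struct demo_dentry *child,
			const char *newname)
{
	pthread_mutex_lock(&parent->d_lock);
	pthread_mutex_lock(&child->d_lock);

	strncpy(child->d_name, newname, sizeof(child->d_name) - 1);
	child->d_name[sizeof(child->d_name) - 1] = '\0';

	pthread_mutex_unlock(&child->d_lock);
	pthread_mutex_unlock(&parent->d_lock);
}

Two renames in different directories never touch the same lock; the
flat superblock list has no equivalent trick available to it.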

> [ filesystems doing their own optimized thing ]
>
> IOWs, it's not clear to me that this is a caching model we really
> want to pursue in general because of a) the code duplication and b)
> the issues such an inode life-cycle model has interacting with the
> VFS life-cycle expectations...

No, I'm sure you are right, and I'm just frustrated with this code
that probably _should_ look a lot more like the dcache code, but
doesn't.

I get the very strong feeling that we should have a per-superblock
hash table that we could also traverse the entries of. That way the
superblock inode list would get some structure (the hash buckets) that
would allow the locking to be distributed (and we'd only need one lock
and just share it between the "hash inode" and "add it to the
superblock list").

But that would require something much more involved than "improve the
current code".

               Linus



