Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

On Sun, 25 Feb 2024 at 17:58, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> According to my reading just now, ext4 and btrfs (as well as bcachefs)
> also don't take the inode lock in the read path - xfs is the only one
> that does.

Yeah, I should have remembered that detail - DaveC has pointed out at
some point that other filesystems don't actually honor the whole
"all or nothing visible to read" rule.

And I was actually wrong about the common cases like ext2 - they use
generic_file_write_iter(), which does take that inode lock, and I was
confused with generic_perform_write() (which does not).

It was always the read side that didn't care, as you point out. It's
been some time since I looked at that.

But as mentioned, nobody has ever actually shown any real interest in
that technical lack of POSIX conformance.

> I think write vs. write consistency is the more interesting case; the
> question there is does falling back to the inode lock when we can't lock
> all the folios simultaneously work.

I really don't think the write-write consistency is all that
interesting either, and enforcing it really does hurt.

If you're some toy database that would love to use buffered writes on
just a DB file, that "no concurrent writes" can hurt a lot. So then
people say "use DIO", but that has its own issues...

There is one obvious special case, and I think it's the primary reason
we end up having that inode_lock: O_APPEND or any other write
extending the size of the file.

THAT one obviously has to work right, and that's the case when
multiple writers actually do want to get write-write consistency, and
where it makes total sense to serialize them all.

That's the one case that even DIO cares about.
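
To make the append case concrete, here is a minimal userspace sketch
(not kernel code). Several threads append fixed-size records through
their own O_APPEND descriptors, and the file is then read back to count
the records that survived intact. The names (struct appender,
count_intact_appends) and the sizes are made up for illustration, and a
passing run only shows what the filesystem you ran it on did, it does
not prove anything in general.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define NWRITERS 4      /* concurrent appenders (arbitrary) */
#define NRECORDS 256    /* records per appender (arbitrary) */
#define RECLEN   64     /* bytes per record (arbitrary) */

struct appender {
	const char *path;
	char tag;       /* byte value that fills this writer's records */
	int failed;     /* set if open() or write() misbehaves */
};

/* Append NRECORDS records; with O_APPEND the kernel picks the offset. */
static void *append_records(void *arg)
{
	struct appender *a = arg;
	char rec[RECLEN];
	int fd = open(a->path, O_WRONLY | O_APPEND);

	if (fd < 0) {
		a->failed = 1;
		return NULL;
	}
	memset(rec, a->tag, sizeof(rec));
	for (int i = 0; i < NRECORDS; i++)
		if (write(fd, rec, sizeof(rec)) != (ssize_t)sizeof(rec))
			a->failed = 1;
	close(fd);
	return NULL;
}

/*
 * Run NWRITERS appenders against 'path' and return the number of intact
 * records found afterwards, or -1 on any error. The file is removed again.
 */
long count_intact_appends(const char *path)
{
	struct appender a[NWRITERS];
	pthread_t tid[NWRITERS];
	char rec[RECLEN];
	long intact = 0;
	ssize_t n;
	int started = 0, bad = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	close(fd);

	for (; started < NWRITERS; started++) {
		a[started] = (struct appender){ .path = path,
						.tag = (char)('A' + started) };
		if (pthread_create(&tid[started], NULL, append_records,
				   &a[started]) != 0) {
			bad = 1;
			break;
		}
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		bad |= a[i].failed;
	}
	if (bad)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* A record is intact if all of its bytes carry the same tag. */
	while ((n = read(fd, rec, sizeof(rec))) == (ssize_t)sizeof(rec)) {
		int same = 1;

		for (size_t i = 1; i < sizeof(rec); i++)
			same &= (rec[i] == rec[0]);
		intact += same;
	}
	close(fd);
	unlink(path);
	return n == 0 ? intact : -1;
}
```

If the appends are serialized the way they have to be, the count comes
back as NWRITERS * NRECORDS. A lost or torn append would show up as a
smaller count or as a -1 from the trailing partial record.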

In the other cases, it's hard to argue that "one or the other wins the
whole range" is seriously hugely better than "one or the other wins at
some granularity". What the hell are you doing overlapping write
ranges for if you have a "one or the other" mentality?

Of course, maybe in practice it would be fine to do the "lock all the
folios, with the fallback being the inode lock" - and we could even
start with "all" being a pretty small number (perhaps starting with
"one" ;^).

