Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 27 Feb 2024 14:46:11 -0800

On Tue, 27 Feb 2024 at 14:21, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> just like everyone else.

Not for dio, it doesn't.

> > The real question is how much of userspace will that break, because
> > of implicit assumptions that the kernel has always serialised
> > buffered writes?
>
> What would break?

Well, at least in theory you could have concurrent overlapping writes
of folio-crossing records, and currently you do get the guarantee that
one or the other record is written, but relying just on page locking
would mean that you might get a mix of them at page boundaries.
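
To make that failure mode concrete, here is a toy userspace sketch in
Python (not kernel code). It assumes a hypothetical 4096-byte page and
models each write as a sequence of per-page chunks that are each applied
atomically, which is roughly what "relying just on page locking" would
give you. The names page_chunks and apply_chunk are made up for the
illustration. One legal interleaving of two overlapping 16-byte records
that straddle a page boundary leaves half of each on disk:

```python
PAGE_SIZE = 4096  # illustrative page size, an assumption of this sketch


def page_chunks(pos, data):
    """Split a write into (page_index, offset_in_page, chunk) pieces."""
    done = 0
    while done < len(data):
        idx, off = divmod(pos + done, PAGE_SIZE)
        n = min(PAGE_SIZE - off, len(data) - done)
        yield idx, off, data[done:done + n]
        done += n


def apply_chunk(pages, chunk):
    """Apply one per-page chunk; stands in for 'lock page, copy, unlock'."""
    idx, off, buf = chunk
    pages[idx][off:off + len(buf)] = buf


pages = [bytearray(PAGE_SIZE) for _ in range(2)]
pos, length = PAGE_SIZE - 8, 16  # record straddles the page 0/1 boundary

a = list(page_chunks(pos, b"A" * length))  # writer A: two chunks
b = list(page_chunks(pos, b"B" * length))  # writer B: two chunks

# One possible schedule when only individual pages are locked:
# A does page 0, B does page 0 and page 1, then A does page 1.
for chunk in (a[0], b[0], b[1], a[1]):
    apply_chunk(pages, chunk)

record = bytes(pages[0][-8:] + pages[1][:8])
print(record)  # b'BBBBBBBBAAAAAAAA'
```

Neither writer's record survives intact: the first page holds B's data
and the second holds A's.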

I'm not sure that such a model would make any sense, but if you
*intend* to write a load that breaks when nobody does write-to-write
exclusion, that's certainly possible.

The fact that we haven't given the atomicity guarantees wrt reads does
imply that nobody can do this kind of crazy thing, but it's an
implication, not a guarantee.

I really don't think such an odd load is sensible (except for the
special case of O_APPEND records, which definitely is sensible), and
it is certainly solvable.

For example, a purely "local lock" model would be to just lock all
pages in order as you write them, and not unlock the previous page
until you've locked the next one.

That is a really simple model that doesn't require any range locking
or anything like that because it simply relies on all writes always
being done strictly in file position order.
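
A minimal userspace sketch of that "lock the next page before unlocking
the previous one" idea, in Python rather than kernel C. Everything here
is an assumption made for illustration: the PagedFile class, the
4096-byte PAGE_SIZE, and threading.Lock standing in for the page/folio
lock. It is not how any filesystem implements writes today.

```python
import threading

PAGE_SIZE = 4096  # illustrative page size, an assumption of this sketch


class PagedFile:
    """Toy in-memory 'file' made of fixed-size pages, one lock per page."""

    def __init__(self, npages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]
        self.locks = [threading.Lock() for _ in range(npages)]

    def write(self, pos, data):
        if not data:
            return
        first = pos // PAGE_SIZE
        last = (pos + len(data) - 1) // PAGE_SIZE
        if last >= len(self.pages):
            raise ValueError("write extends past the end of the toy file")

        # Always lock in ascending page order, starting at the first page.
        self.locks[first].acquire()
        done = 0
        for idx in range(first, last + 1):
            off = (pos + done) % PAGE_SIZE
            n = min(PAGE_SIZE - off, len(data) - done)
            self.pages[idx][off:off + n] = data[done:done + n]
            done += n
            # Hand over hand: take the next page's lock before dropping
            # this one, so a competing writer cannot overtake us.
            if idx < last:
                self.locks[idx + 1].acquire()
            self.locks[idx].release()

    def read(self, pos, length):
        # Unlocked read: no atomicity is offered with respect to writers.
        return bytes(b"".join(self.pages)[pos:pos + length])
```

Because every writer walks the pages in file position order and never
lets go of one page before holding the next, two overlapping writers
cannot pass each other inside the overlap, so whichever one reaches the
first shared page last wins on every shared page.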

But you'd have to be very careful with deadlocks anyway in case there
are other cases of multi-page locks. And even without deadlocks, you
might end up having just a lot more lock contention (nested locks can
have *much* worse contention than sequential ones).

There are other models with multi-level locking, but I think we'd like
to try to keep things simple if we change something core like this.

               Linus