On Sun, 25 Feb 2024 at 13:14, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> Not artificial; this was a real customer with a real workload. I don't
> know how much about it I can discuss publically, but my memory of it was a
> system writing a log with 64 byte entries, millions of entries per second.
> Occasionally the system would have to go back and look at an entry in the
> last few seconds worth of data (so it would still be in the page cache).

Honestly, that should never hit any kind of contention on the page
cache. Unless they did something else odd, that load should be entirely
serialized by the POSIX "atomic write" requirements and the
"inode_lock(inode)" that writes take.

So it would end up literally being just one cache miss - and if you do
things across CPU's and have cachelines moving around, that inode lock
would be the bigger offender in that it is the one that would see any
contention.

Now, *that* is locking that I despise, much more than the page cache
lock. It serializes unrelated writes to different areas, and the
direct-IO people instead said "we don't care about POSIX" and did
concurrent writes without it.

That said, I do wonder if we could take advantage of the fact that we
have the inode lock, and just make page eviction take that lock too
(possibly in shared form). At that point, you really could just say "no
need to increment the reference count, because we can do writes knowing
that the mapping pages are stable".

Not pretty, but we could possibly at least take advantage of the horrid
other ugliness of the inode locking and POSIX rules that nobody really
wants.

                Linus