Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Sun, 25 Feb 2024 20:02:15 -0500

On Sun, Feb 25, 2024 at 03:45:47PM -0800, Linus Torvalds wrote:
> On Sun, 25 Feb 2024 at 13:14, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > Not artificial; this was a real customer with a real workload.  I don't
> > know how much about it I can discuss publicly, but my memory of it was a
> > system writing a log with 64 byte entries, millions of entries per second.
> > Occasionally the system would have to go back and look at an entry in the
> > last few seconds worth of data (so it would still be in the page cache).
> 
> Honestly, that should never hit any kind of contention on the page cache.
> 
> Unless they did something else odd, that load should be entirely
> serialized by the POSIX "atomic write" requirements and the
> "inode_lock(inode)"  that writes take.
> 
> So it would end up literally being just one cache miss - and if you do
> things across CPU's and have cachelines moving around, that inode lock
> would be the bigger offender in that it is the one that would see any
> contention.
> 
> Now, *that* is locking that I despise, much more than the page cache
> lock.  It serializes unrelated writes to different areas, and the
> direct-IO people instead said "we don't care about POSIX" and did
> concurrent writes without it.

We could satisfy the POSIX atomic write rule just by having a properly
vectorized buffered write path, with no need for the inode lock. Really,
only extending writes should have to take the inode lock, same as
O_DIRECT.

(whenever people bring up range locks, I keep trying to tell them - we
already have that in the form of the folio lock, if you'd just use it
properly...)
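
To make that concrete, here is a rough userspace sketch, not kernel code.
It assumes "vectorized" means taking every folio lock the write covers
before copying any data. Everything in it (struct toy_file, toy_write(),
toy_read(), the pthread mutexes standing in for folio locks, the fixed
16-page file) is hypothetical and made up for illustration. Two I/Os that
overlap always share at least one page lock, so they are atomic with
respect to each other; I/Os to disjoint pages never contend; only writes
that grow the file take the inode lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ  4096u
#define NR_PAGES 16u
#define FILE_CAP (PAGE_SZ * NR_PAGES)

/* Hypothetical stand-in for an inode plus its page cache. */
struct toy_file {
	pthread_mutex_t inode_lock;           /* taken only by extending writes */
	pthread_mutex_t page_lock[NR_PAGES];  /* stand-in for per-folio locks */
	_Atomic size_t size;                  /* i_size analogue; only ever grows */
	unsigned char data[FILE_CAP];
};

static void toy_file_init(struct toy_file *f)
{
	pthread_mutex_init(&f->inode_lock, NULL);
	for (size_t i = 0; i < NR_PAGES; i++)
		pthread_mutex_init(&f->page_lock[i], NULL);
	atomic_init(&f->size, 0);
	memset(f->data, 0, sizeof(f->data));
}

static size_t toy_size(struct toy_file *f)
{
	return atomic_load(&f->size);
}

/* Lock every page covering [off, off + len), always in ascending order
 * so that two overlapping callers cannot deadlock. Requires len > 0. */
static void lock_range(struct toy_file *f, size_t off, size_t len)
{
	for (size_t i = off / PAGE_SZ; i <= (off + len - 1) / PAGE_SZ; i++)
		pthread_mutex_lock(&f->page_lock[i]);
}

static void unlock_range(struct toy_file *f, size_t off, size_t len)
{
	for (size_t i = off / PAGE_SZ; i <= (off + len - 1) / PAGE_SZ; i++)
		pthread_mutex_unlock(&f->page_lock[i]);
}

static void toy_write(struct toy_file *f, size_t off, const void *buf, size_t len)
{
	if (len == 0)
		return;
	assert(off < FILE_CAP && len <= FILE_CAP - off);

	size_t end = off + len;
	/* size only grows, so a "not extending" answer can never go stale;
	 * a stale "extending" answer just takes the inode lock needlessly. */
	int extending = end > atomic_load(&f->size);

	if (extending)
		pthread_mutex_lock(&f->inode_lock);
	lock_range(f, off, len);              /* the whole range, up front */

	memcpy(f->data + off, buf, len);
	if (extending && end > atomic_load(&f->size))
		atomic_store(&f->size, end);

	unlock_range(f, off, len);
	if (extending)
		pthread_mutex_unlock(&f->inode_lock);
}

static void toy_read(struct toy_file *f, size_t off, void *buf, size_t len)
{
	if (len == 0)
		return;
	assert(off < FILE_CAP && len <= FILE_CAP - off);

	lock_range(f, off, len);
	memcpy(buf, f->data + off, len);
	unlock_range(f, off, len);
}
```

The lock order is always inode lock first, then page locks in ascending
index order. The sketch ignores everything that makes the real thing
hard (folio allocation and eviction, short copies, faults on the user
buffer, truncate).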