On Sun, Feb 25, 2024 at 03:45:47PM -0800, Linus Torvalds wrote:
> On Sun, 25 Feb 2024 at 13:14, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > Not artificial; this was a real customer with a real workload. I don't
> > know how much about it I can discuss publicly, but my memory of it was a
> > system writing a log with 64 byte entries, millions of entries per second.
> > Occasionally the system would have to go back and look at an entry in the
> > last few seconds' worth of data (so it would still be in the page cache).
>
> Honestly, that should never hit any kind of contention on the page cache.
>
> Unless they did something else odd, that load should be entirely
> serialized by the POSIX "atomic write" requirements and the
> "inode_lock(inode)" that writes take.
>
> So it would end up literally being just one cache miss - and if you do
> things across CPUs and have cachelines moving around, that inode lock
> would be the bigger offender in that it is the one that would see any
> contention.
>
> Now, *that* is locking that I despise, much more than the page cache
> lock. It serializes unrelated writes to different areas, and the
> direct-IO people instead said "we don't care about POSIX" and did
> concurrent writes without it.

We could satisfy the POSIX atomic writes rule by just having a properly
vectorized buffered write path, with no need for the inode lock - it
really should just be extending writes that have to take the inode lock,
same as O_DIRECT.

(Whenever people bring up range locks, I keep trying to tell them - we
already have that in the form of the folio lock, if you'd just use it
properly...)
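To make "use the folio lock as a range lock" concrete, here's a toy
user-space sketch of the idea - not kernel code, and every name in it
(the block array, range_lock(), buffered_write(), ...) is made up for
the illustration. Each fixed-size block has its own lock standing in
for the folio lock; a write locks every block covering its range, in
ascending index order, before copying anything, so readers doing the
same never see a torn write and disjoint writes never contend on a
shared lock:

/*
 * Toy user-space sketch only, NOT kernel code - all names here are made
 * up for the illustration.  Each fixed-size block carries its own lock,
 * standing in for the folio lock.  A write covering several blocks locks
 * them all, in ascending index order, before copying any data, then
 * drops them; readers that do the same never observe a torn write, and
 * writes to disjoint ranges never contend on a shared lock.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE	64
#define NR_BLOCKS	1024

struct block {
	pthread_mutex_t	lock;		/* stand-in for the folio lock */
	char		data[BLOCK_SIZE];
};

static struct block cache[NR_BLOCKS];

/* Lock every block covering [pos, pos + len), lowest index first. */
static void range_lock(size_t pos, size_t len)
{
	for (size_t i = pos / BLOCK_SIZE; i <= (pos + len - 1) / BLOCK_SIZE; i++)
		pthread_mutex_lock(&cache[i].lock);
}

static void range_unlock(size_t pos, size_t len)
{
	for (size_t i = pos / BLOCK_SIZE; i <= (pos + len - 1) / BLOCK_SIZE; i++)
		pthread_mutex_unlock(&cache[i].lock);
}

/*
 * A non-extending buffered write: no global "inode" lock, just the
 * per-block locks over the range being written.  The whole range is
 * held across the copy, so the write is atomic with respect to any
 * reader that also takes the per-block locks.
 */
static void buffered_write(size_t pos, const void *buf, size_t len)
{
	range_lock(pos, len);
	for (size_t off = 0; off < len; ) {
		size_t boff  = (pos + off) % BLOCK_SIZE;
		size_t chunk = BLOCK_SIZE - boff;

		if (chunk > len - off)
			chunk = len - off;
		memcpy(cache[(pos + off) / BLOCK_SIZE].data + boff,
		       (const char *)buf + off, chunk);
		off += chunk;
	}
	range_unlock(pos, len);
}

/* A read takes the same per-block locks, so it sees writes atomically. */
static void buffered_read(size_t pos, void *buf, size_t len)
{
	range_lock(pos, len);
	for (size_t off = 0; off < len; ) {
		size_t boff  = (pos + off) % BLOCK_SIZE;
		size_t chunk = BLOCK_SIZE - boff;

		if (chunk > len - off)
			chunk = len - off;
		memcpy((char *)buf + off,
		       cache[(pos + off) / BLOCK_SIZE].data + boff, chunk);
		off += chunk;
	}
	range_unlock(pos, len);
}

int main(void)
{
	char out[32] = "";

	for (size_t i = 0; i < NR_BLOCKS; i++)
		pthread_mutex_init(&cache[i].lock, NULL);

	buffered_write(100, "hello, range lock", 17);	/* spans block 1 only */
	buffered_read(100, out, 17);
	printf("%s\n", out);
	return 0;
}

Taking the per-block locks in ascending index order is what keeps two
overlapping writers from deadlocking against each other; extending
writes would still take something like the inode lock, but only to
serialize the size update, not the data copy - same as O_DIRECT.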