On Sun, 25 Feb 2024 at 13:14, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> Not artificial; this was a real customer with a real workload. I don't
> know how much about it I can discuss publically, but my memory of it was a
> system writing a log with 64 byte entries, millions of entries per second.
> Occasionally the system would have to go back and look at an entry in the
> last few seconds worth of data (so it would still be in the page cache).

Honestly, that should never hit any kind of contention on the page
cache. Unless they did something else odd, that load should be entirely
serialized by the POSIX "atomic write" requirements and the
"inode_lock(inode)" that writes take.

So it would end up literally being just one cache miss - and if you do
things across CPU's and have cachelines moving around, that inode lock
would be the bigger offender in that it is the one that would see any
contention.

Now, *that* is locking that I despise, much more than the page cache
lock. It serializes unrelated writes to different areas, and the
direct-IO people instead said "we don't care about POSIX" and did
concurrent writes without it.

That said, I do wonder if we could take advantage of the fact that we
have the inode lock, and just make page eviction take that lock too
(possibly in shared form). At that point, you really could just say "no
need to increment the reference count, because we can do writes knowing
that the mapping pages are stable".

Not pretty, but we could possibly at least take advantage of the horrid
other ugliness of the inode locking and POSIX rules that nobody really
wants.

                Linus