Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Sun, 25 Feb 2024 21:14:11 +0000

On Sun, Feb 25, 2024 at 09:03:32AM -0800, Linus Torvalds wrote:
> I think you've been staring at profiles too much. In instruction-level
> profiles, the atomic ops stand out a lot. But that's at least partly
> artificial - they are a serialization point on x86, so things get
> accounted to them. So they tend to be the collection point for
> everything around them in an OoO CPU.
> 
> Yes, atomics are bad. But double buffering is worse, and only looks
> good if you have some artificial benchmark that does some single-byte
> hot-cache read in a loop.

Not artificial; this was a real customer with a real workload.  I don't
know how much of it I can discuss publicly, but as I recall it was a
system writing a log with 64-byte entries, millions of entries per second.
Occasionally the system would have to go back and look at an entry in the
last few seconds' worth of data (so it would still be in the page cache).

This customer was quite savvy, so they actually implemented and tested
the lookup-copy-lookup-again algorithm in their custom kernel, and saw
a speedup from it.
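
For anyone who hasn't followed the earlier discussion, here is a rough
userspace sketch of the control flow that "lookup-copy-lookup-again" names.
It is not the customer's code and not kernel code: toy_cache, toy_page,
toy_lookup() and toy_read_fast() are all made-up names, the cache is a flat
array rather than the real page cache, and the 64-byte bounce buffer is just
sized to match the log entries above.  The idea is to look the page up
without taking a reference, copy into a bounce buffer, look it up again, and
only trust the copy if the same page is still there.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NR_SLOTS  16
#define TOY_BOUNCE    64	/* assumption: matches the 64-byte log entries */

/* Hypothetical stand-ins for a cached page and the cache that indexes it. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
};

struct toy_cache {
	_Atomic(struct toy_page *) slot[TOY_NR_SLOTS];
};

/* Lookup that takes no reference on the page it returns. */
static inline struct toy_page *toy_lookup(struct toy_cache *c, size_t index)
{
	if (index >= TOY_NR_SLOTS)
		return NULL;
	return atomic_load_explicit(&c->slot[index], memory_order_acquire);
}

/*
 * Lookup, copy, lookup again.  Returns true if dst holds a copy we can
 * trust; false means the caller should fall back to the ordinary
 * reference-taking path (not shown here).
 *
 * This only illustrates the control flow.  Run against a concurrent
 * writer in userspace, the unreferenced memcpy() would be a data race and
 * possibly a use-after-free, and a pointer compare alone does not catch a
 * page that was freed and reused at the same address.
 */
static inline bool toy_read_fast(struct toy_cache *c, size_t index,
				 size_t off, void *dst, size_t len)
{
	unsigned char bounce[TOY_BOUNCE];
	struct toy_page *before, *after;

	if (len > sizeof(bounce) || off > TOY_PAGE_SIZE - len)
		return false;

	before = toy_lookup(c, index);		/* first lookup */
	if (!before)
		return false;

	memcpy(bounce, before->data + off, len);	/* copy, no refcount held */
	atomic_thread_fence(memory_order_acquire);

	after = toy_lookup(c, index);		/* second lookup */
	if (after != before)
		return false;			/* slot changed under us */

	memcpy(dst, bounce, len);		/* second copy: the double buffering */
	return true;
}
```

The two memcpy() calls are the double buffering being objected to above; the
trade is that the fast path never touches the page's refcount.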