Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 26 Feb 2024 15:48:35 -0800

On Mon, 26 Feb 2024 at 14:46, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I really haven't tested this AT ALL. I'm much too scared.

"Courage is not the absence of fear, but acting in spite of it"
         - Paddington Bear / Michael Scott

It seems to actually boot here.

That said, from a quick test with lots of threads all hammering on the
same page - I'm still not entirely convinced it makes a difference.
Sure, the kernel profile changes, but filemap_get_read_batch() wasn't
very high up in the profile to begin with.

I didn't do any actual performance testing, I just did a 64-byte pread
at offset 0 in a loop in 64 threads on my 32c/64t machine.
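
For anyone who wants to repeat that kind of quick test, here is a minimal
user-space sketch of what such a load might look like. The actual test
program was not posted, so everything here is an assumption: the names
hammer_pread(), hammer_thread(), struct hammer_arg and READ_SIZE are made
up for illustration, and the target file is whatever path the caller
passes in (it must be at least 64 bytes long for the reads to count).

```c
#define _POSIX_C_SOURCE 200809L	/* for pread() under strict -std= modes */
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define READ_SIZE 64	/* bytes per pread, as in the test described above */

/* Per-thread state. All names in this sketch are hypothetical. */
struct hammer_arg {
	int fd;		/* shared fd: every thread reads the same page-cache page */
	long iters;	/* number of preads to issue */
	long ok;	/* preads that returned exactly READ_SIZE bytes */
};

static void *hammer_thread(void *p)
{
	struct hammer_arg *arg = p;
	char buf[READ_SIZE];
	long ok = 0;	/* local counter: avoid user-space cacheline bouncing */

	for (long i = 0; i < arg->iters; i++) {
		if (pread(arg->fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf))
			ok++;
	}
	arg->ok = ok;	/* publish once; the joiner reads it after pthread_join() */
	return NULL;
}

/*
 * Start nthreads threads that each issue iters 64-byte preads at offset 0
 * of path. Returns the total number of full-length reads, or -1 if the
 * file could not be opened or a thread could not be created.
 */
long hammer_pread(const char *path, int nthreads, long iters)
{
	pthread_t *tids;
	struct hammer_arg *args;
	long total = 0;
	int started = 0, failed = 0;
	int fd;

	if (nthreads <= 0 || iters < 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	tids = calloc(nthreads, sizeof(*tids));
	args = calloc(nthreads, sizeof(*args));
	if (!tids || !args) {
		failed = 1;
		goto out;
	}

	for (; started < nthreads; started++) {
		args[started].fd = fd;
		args[started].iters = iters;
		if (pthread_create(&tids[started], NULL, hammer_thread,
				   &args[started])) {
			failed = 1;
			break;
		}
	}

	/* Join only the threads that were actually started. */
	for (int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += args[i].ok;
	}
out:
	free(tids);
	free(args);
	close(fd);
	return failed ? -1 : total;
}
```

Build it with something like "cc -O2 -pthread" and call, say,
hammer_pread(path, 64, iters) from a small main() with a large iteration
count while looking at the kernel profile. This only generates the load;
it does not time anything, which matches the "no actual performance
testing" caveat above.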

The cache ping-pong would be a lot more noticeable on some
multi-socket machine, of course, but I do get the feeling that this is
all optimizing for such an edge-case of an edge-case that it's all a
bit questionable.

But that patch does largely seem to work. Famous last words. It really
needs a lot more sanity checking, and that comment about probably
needing a memory barrier is still valid.

And even then there's the question about replacing the same folio in
the same spot in the xarray. I'm not convinced it is worth worrying
about in any reality we care about, but it's _technically_ all a bit
wrong.

So I'm throwing that patch over the fence to somebody that cares. I
_do_ now claim it at least kind of works.

                Linus