On Mon, 26 Feb 2024 at 14:46, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I really haven't tested this AT ALL. I'm much too scared.

"Courage is not the absence of fear, but acting in spite of it"
    - Paddington Bear / Michal Scott

It seems to actually boot here.

That said, from a quick test with lots of threads all hammering on the
same page - I'm still not entirely convinced it makes a difference.
Sure, the kernel profile changes, but filemap_get_read_batch() wasn't
very high up in the profile to begin with.

I didn't do any actual performance testing, I just did a 64-byte pread
at offset 0 in a loop in 64 threads on my 32c/64t machine.

The cache ping-pong would be a lot more noticeable on some multi-socket
machine, of course, but I do get the feeling that this is all
optimizing for such an edge-case of an edge-case that it's all a bit
questionable.

But that patch does largely seem to work. Famous last words. It really
needs a lot more sanity checking, and that comment about probably
needing a memory barrier is still valid.

And even then there's the question about replacing the same folio in
the same spot in the xarray. I'm not convinced it is worth worrying
about in any reality we care about, but it's _technically_ all a bit
wrong.

So I'm throwing that patch over the fence to somebody that cares. I
_do_ now claim it at least kind of works.

              Linus
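
For anyone wanting to reproduce the quick test described above (a 64-byte pread at offset 0, looped in 64 threads), a minimal sketch might look like the following. This is not the program that was actually run, which was not posted; the function names, the temp-file setup, and the iteration count are all assumptions made for illustration. Python is used only for brevity, and interpreter overhead means this exercises the same read path without being a meaningful benchmark.

```python
import os
import tempfile
import threading

READ_SIZE = 64      # bytes per pread, as in the message
NUM_THREADS = 64    # matches the 64 hardware threads mentioned in the message


def make_test_file(size=4096):
    """Create a small temp file to read from (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


def hammer_same_page(path, num_threads=NUM_THREADS, iterations=10_000):
    """Run num_threads threads, each doing a READ_SIZE pread at offset 0 in a loop.

    Returns the total number of full-sized reads that completed.
    """
    fd = os.open(path, os.O_RDONLY)
    counts = [0] * num_threads

    def worker(idx):
        done = 0
        for _ in range(iterations):
            # Every thread reads the same bytes at offset 0, so all
            # reads land on the same page-cache page.
            if len(os.pread(fd, READ_SIZE, 0)) == READ_SIZE:
                done += 1
        counts[idx] = done

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)
    return sum(counts)


if __name__ == "__main__":
    path = make_test_file()
    try:
        print(hammer_same_page(path))
    finally:
        os.unlink(path)
```

Running it under a kernel profiler with and without the patch would show whether filemap_get_read_batch() moves in the profile; on a single-socket machine the difference may well be small, as noted above.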