On Mon, Feb 26, 2024 at 03:48:35PM -0800, Linus Torvalds wrote:
> On Mon, 26 Feb 2024 at 14:46, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I really haven't tested this AT ALL. I'm much too scared.
>
> "Courage is not the absence of fear, but acting in spite of it"
>          - Paddington Bear / Michal Scott
>
> It seems to actually boot here.
>
> That said, from a quick test with lots of threads all hammering on the
> same page - I'm still not entirely convinced it makes a difference.
> Sure, the kernel profile changes, but filemap_get_read_batch() wasn't
> very high up in the profile to begin with.
>
> I didn't do any actual performance testing, I just did a 64-byte pread
> at offset 0 in a loop in 64 threads on my 32c/64t machine.

Only rough testing, but this is looking like around a 25% performance
increase doing 4k random reads on a 1G file with fio, 8 jobs, on my
Ryzen 5950x - 16.7M -> 21.4M iops, very roughly.

fio's a pig and we're only spending half our cpu time in the kernel, so
the buffered read path is actually getting 40% or 50% faster.

So I'd say that's substantial.

RCU freeing of pagecache pages would be even better - I think that'd let
us completely get rid of the barrier & xarray recheck, and we wouldn't
have to do it as a silly special case.
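
As a rough sanity check on the "40% or 50% faster" figure, here is a back-of-the-envelope sketch of the arithmetic. It rests on two assumptions that the numbers above do not pin down: that the half-in-kernel split describes the run before the change, and that fio's userspace cost per I/O stays the same afterwards. The function name and its parameters are made up for illustration.

```python
def kernel_path_speedup(iops_before, iops_after, kernel_share_before):
    """Estimate the kernel-side saving implied by an overall iops change.

    Returns (overall_gain, kernel_time_saved), both as fractions.
    """
    # Normalise CPU time per I/O before the change to 1.0; time per I/O
    # scales as 1/iops, so the new per-I/O time is the inverse ratio.
    time_before = 1.0
    time_after = iops_before / iops_after

    # Assumption: userspace (fio) time per I/O is unchanged by the patch.
    user_time = time_before * (1.0 - kernel_share_before)
    kernel_before = time_before * kernel_share_before
    kernel_after = time_after - user_time
    if kernel_after <= 0:
        raise ValueError("gain is larger than the kernel share can explain")

    overall_gain = iops_after / iops_before - 1.0
    kernel_time_saved = 1.0 - kernel_after / kernel_before
    return overall_gain, kernel_time_saved


# Numbers quoted above: 16.7M -> 21.4M iops, about half the time in the kernel.
overall, kernel = kernel_path_speedup(16.7e6, 21.4e6, 0.5)
print(f"overall iops gain: {overall:.0%}")            # 28%
print(f"kernel time saved per read: {kernel:.0%}")    # 44%
```

Under those assumptions the kernel spends roughly 44% less time per read, which lands inside the 40% to 50% range. If the 50/50 split instead describes the run after the change, the same arithmetic gives a somewhat smaller saving, so treat this as an estimate only.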