On Tue, Jun 09, 2020 at 08:10:36PM -0400, Kent Overstreet wrote:
> Convert generic_file_buffered_read() to get pages to read from in
> batches, and then copy data to userspace from many pages at once - in
> particular, we now don't touch any cachelines that might be contended
> while we're in the loop to copy data to userspace.
>
> This is a performance improvement on workloads that do buffered reads
> with large blocksizes, and a very large performance improvement if that
> file is also being accessed concurrently by different threads.

Hey, you're stealing my performance improvements!

Granted, I haven't got to doing performance optimisations (certainly not
in this function), but this is one of the places where THP in the page
cache will have a useful performance improvement.

I'm not opposed to putting this in, but I may back it out as part of
the THP work because the THPs will get the same performance improvements
that you're seeing here with less code.