On 5/23/22 9:49 AM, Matthew Wilcox wrote:
> On Mon, May 23, 2022 at 09:44:12AM -0600, Jens Axboe wrote:
>> On 5/23/22 9:12 AM, Jens Axboe wrote:
>>>> Current branch pushed to #new.iov_iter (at the moment; will rename
>>>> back to work.iov_iter once it gets more or less stable).
>>>
>>> Sounds good, I'll see what I need to rebase.
>>
>> On the previous branch, ran a few quick numbers. dd from /dev/zero to
>> /dev/null, with /dev/zero using ->read() as it does by default:
>>
>> 32	260MB/sec
>> 1k	6.6GB/sec
>> 4k	17.9GB/sec
>> 16k	28.8GB/sec
>>
>> now comment out ->read() so it uses ->read_iter() instead:
>>
>> 32	259MB/sec
>> 1k	6.6GB/sec
>> 4k	18.0GB/sec
>> 16k	28.6GB/sec
>>
>> which are roughly identical, all things considered. Just a sanity check,
>> but it looks good from a performance POV in this basic test.
>>
>> Now let's do ->read_iter() but make iov_iter_zero() copy from the zero
>> page instead:
>>
>> 32	250MB/sec
>> 1k	7.7GB/sec
>> 4k	28.8GB/sec
>> 16k	71.2GB/sec
>>
>> Looks like it's a tad slower for 32 bytes, considerably better for 1k,
>> and massively better at page size and above. This is on an Intel 12900K,
>> so a recent CPU. Let's try cacheline size and above:
>>
>> Size	Method			BW
>> 64	copy_from_zero()	508MB/sec
>> 128	copy_from_zero()	1.0GB/sec
>> 64	clear_user()		513MB/sec
>> 128	clear_user()		1.0GB/sec
>
> See this thread-of-doom:
>
> https://lore.kernel.org/all/Ynq1nVpu1xCpjnXm@xxxxxxx/

Yikes, will check. Also realized that my larger-than-PAGE_SIZE 16k result
is invalid, as it should iterate with min(left, PAGE_SIZE). But that just
changes it from 71.2GB/sec to 68.9GB/sec, still a substantial win. Updated
diff below. Was going to test on aarch64 next...

-- 
Jens Axboe