On Thu, Apr 14, 2022 at 7:13 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
> iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
> instead of copy_page_to_iter().  We would use iov_iter_zero() throughout,
> but the x86 clear_user() is not nearly so well optimized as copy to user
> (dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds).

Ugh.

I've applied this patch, but honestly, the proper course of action
should just be to improve on clear_user().

If it really is important enough that we should care about that
performance, then we should just fix clear_user().

It's a very odd special thing right now (at least on x86-64) using
some strange handcrafted inline asm code.

I assume that 'rep stosb' is the fastest way to clear things on modern
CPUs that have FSRM, and then we have the usual fallbacks (ie ERMS ->
"rep stos" except for small areas, and probably that "store zeros by
hand" for older CPUs).

Adding PeterZ and Borislav (who seem to be the last ones to have
worked on the copy and clear_page stuff respectively) and the x86
maintainers in case somebody gets the urge to just fix this.

Because memory clearing should be faster than copying, and the thing
that makes copying fast is that FSRM and ERMS logic (the whole
"manually unrolled copy" is hopefully mostly a thing of the past and
we can consider it legacy).

             Linus
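
For illustration only, the fallback order described above (FSRM ->
'rep stosb' always, ERMS -> 'rep stos' except for small areas, older
CPUs -> store zeros by hand) could be sketched in plain userspace C
roughly as follows. This is not the kernel's clear_user() and does no
user-access fault handling. The names struct cpu_caps, pick_clear_path(),
sketch_clear() and the 64-byte SMALL_CLEAR_THRESHOLD are all made up for
the sketch; the cutoff in particular is an assumption, not a measured
value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature flags; real code would query the CPU feature bits. */
struct cpu_caps {
	bool fsrm;	/* fast short 'rep' string operations */
	bool erms;	/* enhanced 'rep movsb/stosb' */
};

enum clear_path { CLEAR_REP_STOSB, CLEAR_MANUAL };

/* Assumed cutoff below which ERMS-only CPUs store zeros by hand. */
#define SMALL_CLEAR_THRESHOLD 64

static enum clear_path pick_clear_path(struct cpu_caps caps, size_t len)
{
	if (caps.fsrm)
		return CLEAR_REP_STOSB;		/* any size */
	if (caps.erms && len >= SMALL_CLEAR_THRESHOLD)
		return CLEAR_REP_STOSB;		/* large areas only */
	return CLEAR_MANUAL;			/* small areas, older CPUs */
}

static void clear_rep_stosb(void *dst, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* RDI = destination, RCX = byte count, AL = 0 */
	__asm__ volatile("rep stosb"
			 : "+D"(dst), "+c"(len)
			 : "a"(0)
			 : "memory");
#else
	memset(dst, 0, len);	/* portable stand-in on other targets */
#endif
}

static void clear_manual(void *dst, size_t len)
{
	unsigned char *p = dst;
	const unsigned long long zero = 0;

	/* Eight bytes at a time; memcpy sidesteps alignment concerns. */
	while (len >= sizeof(zero)) {
		memcpy(p, &zero, sizeof(zero));
		p += sizeof(zero);
		len -= sizeof(zero);
	}
	/* Byte-wise tail. */
	while (len > 0) {
		*p++ = 0;
		len--;
	}
}

/* Clears len bytes at dst and reports which path was taken. */
static enum clear_path sketch_clear(struct cpu_caps caps, void *dst, size_t len)
{
	enum clear_path path = pick_clear_path(caps, len);

	assert(dst != NULL || len == 0);
	if (path == CLEAR_REP_STOSB)
		clear_rep_stosb(dst, len);
	else
		clear_manual(dst, len);
	return path;
}
```

A caller would fill in struct cpu_caps once (for example from CPUID)
and then call sketch_clear(caps, buf, len). Keeping the path selection
separate from the clearing loops makes it easy to benchmark each path
on its own before settling on a threshold.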