On Fri, Apr 15, 2022 at 03:10:51PM -0700, Linus Torvalds wrote:
> On Thu, Apr 14, 2022 at 7:13 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
> > iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
> > instead of copy_page_to_iter(). We would use iov_iter_zero() throughout,
> > but the x86 clear_user() is not nearly so well optimized as copy to user
> > (dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds).
>
> Ugh.
>
> I've applied this patch, but honestly, the proper course of action
> should just be to improve on clear_user().
>
> If it really is important enough that we should care about that
> performance, then we just should fix clear_user().
>
> It's a very odd special thing right now (at least on x86-64) using
> some strange handcrafted inline asm code.
>
> I assume that 'rep stosb' is the fastest way to clear things on modern
> CPU's that have FSRM, and then we have the usual fallbacks (ie ERMS ->
> "rep stos" except for small areas, and probably that "store zeros by
> hand" for older CPUs).
>
> Adding PeterZ and Borislav (who seem to be the last ones to have
> worked on the copy and clear_page stuff respectively) and the x86
> maintainers in case somebody gets the urge to just fix this.

Perhaps the x86 maintainers would like to start from
https://lore.kernel.org/lkml/20210523180423.108087-1-sneves@xxxxxxxxx/
instead of pushing that work off on the submitter.
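
For readers following along, the tiering described in the quoted text (FSRM -> 'rep stosb' always, ERMS -> 'rep stosb' except for small areas, older CPUs -> store zeros by hand) could be sketched roughly as below. This is a userspace illustration only, not the kernel's clear_user() and not the code from the linked patch: it does no user-access fault handling, the has_fsrm/has_erms flags are hypothetical parameters standing in for CPUID feature checks, SMALL_CLEAR_THRESHOLD is an assumed cutoff rather than a measured one, and falling back to hand-written stores for small ERMS clears is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cutoff for "small areas"; a real value would come from benchmarks. */
#define SMALL_CLEAR_THRESHOLD 64

enum clear_strategy { CLEAR_REP_STOSB, CLEAR_BY_HAND };

/* Pick a strategy from (hypothetical) feature flags and the length. */
static enum clear_strategy pick_clear_strategy(int has_fsrm, int has_erms,
                                               size_t len)
{
    if (has_fsrm)
        return CLEAR_REP_STOSB;            /* fast even for short lengths */
    if (has_erms && len >= SMALL_CLEAR_THRESHOLD)
        return CLEAR_REP_STOSB;            /* good for larger areas only */
    return CLEAR_BY_HAND;                  /* older CPUs, or small ERMS clears */
}

/* Clear len bytes with 'rep stosb'; plain memset() on other targets. */
static void clear_rep_stosb(void *dst, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(len)   /* RDI = dest, RCX = count */
                     : "a"(0)                 /* AL = byte to store */
                     : "memory");
#else
    memset(dst, 0, len);
#endif
}

/* "Store zeros by hand": 8 bytes at a time, then the remaining tail. */
static void clear_by_hand(void *dst, size_t len)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;

    while (len >= sizeof(zero)) {
        memcpy(p, &zero, sizeof(zero));    /* alignment-safe 8-byte store */
        p += sizeof(zero);
        len -= sizeof(zero);
    }
    while (len > 0) {
        *p++ = 0;
        len--;
    }
}

/* Dispatch to the chosen strategy. */
static void clear_buf(void *dst, size_t len, int has_fsrm, int has_erms)
{
    if (pick_clear_strategy(has_fsrm, has_erms, len) == CLEAR_REP_STOSB)
        clear_rep_stosb(dst, len);
    else
        clear_by_hand(dst, len);
}
```

A caller would pass the feature bits once they are known, e.g. clear_buf(buf, len, 1, 0) on a machine assumed to have FSRM. In the kernel the selection would more plausibly be patched in at boot rather than branched on per call, which this sketch does not attempt to show.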