On Fri, 15 Apr 2022, Linus Torvalds wrote:
> On Thu, Apr 14, 2022 at 7:13 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
> > iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
> > instead of copy_page_to_iter(). We would use iov_iter_zero() throughout,
> > but the x86 clear_user() is not nearly so well optimized as copy to user
> > (dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds).
>
> Ugh.
>
> I've applied this patch,

Phew, thanks.

> but honestly, the proper course of action
> should just be to improve on clear_user().

You'll find no disagreement here: we've all been saying the same.
It's just that that work is yet to be done (or yet to be accepted).

>
> If it really is important enough that we should care about that
> performance, then we just should fix clear_user().
>
> It's a very odd special thing right now (at least on x86-64) using
> some strange handcrafted inline asm code.
>
> I assume that 'rep stosb' is the fastest way to clear things on modern
> CPU's that have FSRM, and then we have the usual fallbacks (ie ERMS ->
> "rep stos" except for small areas, and probably that "store zeros by
> hand" for older CPUs).
>
> Adding PeterZ and Borislav (who seem to be the last ones to have
> worked on the copy and clear_page stuff respectively) and the x86
> maintainers in case somebody gets the urge to just fix this.

Yes, it was exactly Borislav and PeterZ whom I first approached too,
link 3 in the commit message of the patch that this one is fixing,
https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@xxxxxxxxxx/

Borislav wants a thorough good patch, and I don't blame him for that!

Hugh

>
> Because memory clearing should be faster than copying, and the thing
> that makes copying fast is that FSRM and ERMS logic (the whole
> "manually unrolled copy" is hopefully mostly a thing of the past and
> we can consider it legacy)
>
> Linus