Re: [patch 02/14] tmpfs: fix regressions from wider use of ZERO_PAGE

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 15 Apr 2022 15:10:51 -0700

On Thu, Apr 14, 2022 at 7:13 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
> iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
> instead of copy_page_to_iter().  We would use iov_iter_zero() throughout,
> but the x86 clear_user() is not nearly so well optimized as copy to user
> (dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds).

Ugh.

I've applied this patch, but honestly, the proper course of action
should just be to improve on clear_user().

If it really is important enough that we should care about that
performance, then we should just fix clear_user().

It's a very odd special thing right now (at least on x86-64) using
some strange handcrafted inline asm code.

I assume that 'rep stosb' is the fastest way to clear things on modern
CPUs that have FSRM, and then we have the usual fallbacks (i.e. ERMS ->
"rep stos" except for small areas, and probably that "store zeros by
hand" for older CPUs).
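
A rough user-space sketch of that fallback order, for illustration only.
This is not the kernel's clear_user(): it has no fault handling and no
alternatives patching. Every sketch_* name, the enum standing in for CPU
feature detection, and the 64-byte cutoff are assumptions made up for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff for "small areas"; not a tuned or kernel value. */
#define SKETCH_SMALL_CUTOFF 64

/* Stand-in for CPU feature detection (assumed, not a kernel API). */
enum sketch_cpu { SKETCH_CPU_FSRM, SKETCH_CPU_ERMS, SKETCH_CPU_LEGACY };

/* Clear n bytes with 'rep stosb' on x86-64; plain byte loop elsewhere. */
static void sketch_clear_rep_stosb(void *dst, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RDI = destination, RCX = count, AL = byte value to store (0). */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(0)
                     : "memory");
#else
    unsigned char *p = dst;
    while (n--)
        *p++ = 0;
#endif
}

/* "Store zeros by hand": 8-byte stores, then a byte-wise tail. */
static void sketch_clear_by_hand(void *dst, size_t n)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;

    while (n >= sizeof(zero)) {
        memcpy(p, &zero, sizeof(zero)); /* compilers emit a single store */
        p += sizeof(zero);
        n -= sizeof(zero);
    }
    while (n--)
        *p++ = 0;
}

/* Pick a strategy in the order described above. */
static void sketch_clear(void *dst, size_t n, enum sketch_cpu cpu)
{
    assert(dst != NULL || n == 0);

    if (cpu == SKETCH_CPU_FSRM)
        sketch_clear_rep_stosb(dst, n);   /* any size */
    else if (cpu == SKETCH_CPU_ERMS && n >= SKETCH_SMALL_CUTOFF)
        sketch_clear_rep_stosb(dst, n);   /* large areas only */
    else
        sketch_clear_by_hand(dst, n);     /* small areas, older CPUs */
}
```

A real implementation would pick the variant once at boot rather than
branching on every call, and would have to recover from faults on the
user address.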

Adding PeterZ and Borislav (who seem to be the last ones to have
worked on the copy and clear_page stuff respectively) and the x86
maintainers in case somebody gets the urge to just fix this.

Because memory clearing should be faster than copying, and the thing
that makes copying fast is the FSRM and ERMS logic (the whole
"manually unrolled copy" is hopefully mostly a thing of the past and
we can consider it legacy).

             Linus