Re: [PATCH] x86/clear_user: Make it faster

Ingo Molnar <mingo@xxxxxxxxxx> · Fri, 27 May 2022 13:10:47 +0200

* Borislav Petkov <bp@xxxxxxxxx> wrote:

> Ok,
> 
> finally a somewhat final version, lightly tested.
> 
> I still need to run it on production Icelake and that is kinda being
> delayed due to server room cooling issues (don't ask ;-\).

> So Mel gave me the idea to simply measure how fast the function becomes.
> I.e.:
> 
>   start = rdtsc_ordered();
>   ret = __clear_user(to, n);
>   end = rdtsc_ordered();
> 
> Computing the arithmetic mean of all the samples collected during the test
> suite run then shows some improvement:
> 
>   clear_user_original:
>   Amean: 9219.71 (Sum: 6340154910, samples: 687674)
> 
>   fsrm:
>   Amean: 8030.63 (Sum: 5522277720, samples: 687652)
> 
> That's on Zen3.
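
For reference, the Amean values quoted above are simply Sum / samples. A 
minimal C sketch that recomputes them and the relative reduction in mean 
cycles per call; the helper names amean() and improvement_pct() are 
hypothetical and not part of the patch:

```c
#include <stdint.h>

/* Arithmetic mean of the TSC deltas: total cycles divided by sample count. */
double amean(uint64_t sum, uint64_t samples)
{
	return samples ? (double)sum / (double)samples : 0.0;
}

/* Relative reduction in mean cycles from 'before' to 'after', in percent. */
double improvement_pct(double before, double after)
{
	return before > 0.0 ? (before - after) / before * 100.0 : 0.0;
}
```

With the quoted figures, amean(6340154910, 687674) is about 9219.71 and 
amean(5522277720, 687652) is about 8030.63, i.e. roughly a 12.9% reduction 
in mean cycles per call.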

As a side note, there's some rudimentary perf tooling that allows the 
user-space testing of kernel-space x86 memcpy and memset implementations:

 $ perf bench mem memcpy
 # Running 'mem/memcpy' benchmark:
 # function 'default' (Default memcpy() provided by glibc)
 # Copying 1MB bytes ...

       42.459239 GB/sec
 # function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
 # Copying 1MB bytes ...

       23.818598 GB/sec
 # function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
 # Copying 1MB bytes ...

       10.172526 GB/sec
 # function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
 # Copying 1MB bytes ...

       10.614810 GB/sec

Note how the actual implementation in arch/x86/lib/memcpy_64.S was used to 
build a user-space test into 'perf bench'.

For copy_user() & clear_user() some additional wrapping would be needed, I
guess, to stub out stac()/clac()/might_sleep(), etc.
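
A rough sketch of what that could look like in a user-space build, assuming 
the kernel-only helpers are simply stubbed out as no-ops. The wrapper name 
bench_clear_user() is hypothetical, and memset() is only a placeholder for 
the routine from arch/x86/lib/ that a real harness would call:

```c
#include <stddef.h>
#include <string.h>

/* Kernel-only helpers stubbed out for a user-space build (assumption). */
#define stac()		do { } while (0)	/* no SMAP toggling in user space */
#define clac()		do { } while (0)
#define might_sleep()	do { } while (0)	/* no sleep-debugging hook */

/*
 * Hypothetical stand-in with clear_user()-like semantics: returns the
 * number of bytes that could NOT be cleared, i.e. 0 on success.
 * memset() is a placeholder for the routine under test.
 */
unsigned long bench_clear_user(void *to, unsigned long n)
{
	might_sleep();
	stac();
	memset(to, 0, n);
	clac();
	return 0;
}
```

The cost of the real stac()/clac() pair is lost this way, so such a build 
would understate the fixed per-call overhead.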

[ Plus it could all be improved to measure cache hot & cache cold 
  performance, to use different sizes, etc. ]
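
To illustrate the "different sizes" part, here is a minimal user-space 
sketch that times glibc memset() for a given buffer size and reports 
throughput. The function name memset_gib_per_sec() is hypothetical, the 
1024^3 divisor is an assumption about the unit, and only the pre-touched 
(warm) case is covered, not cache-cold behaviour:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L		/* for clock_gettime() */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Throughput in GiB/sec for 'loops' memset() calls of 'size' bytes each.
 * Returns 0.0 on bad arguments, allocation failure or a zero time delta.
 */
double memset_gib_per_sec(size_t size, unsigned int loops)
{
	/* volatile function pointer keeps the compiler from eliding the calls */
	void *(*volatile fn)(void *, int, size_t) = memset;
	struct timespec t0, t1;
	unsigned char *buf = malloc(size);
	unsigned int i;
	double secs;

	if (!buf || !size || !loops) {
		free(buf);
		return 0.0;
	}

	fn(buf, 0xff, size);	/* pre-fault the pages before timing */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < loops; i++)
		fn(buf, 0, size);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	free(buf);

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	if (secs <= 0.0)
		return 0.0;

	return (double)size * loops / secs / (1024.0 * 1024.0 * 1024.0);
}
```

Calling it in a loop over, say, 64 bytes up to a few MB would show where 
the fixed per-call overhead stops dominating.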

Even though it's not 100% equivalent to the kernel-space code, especially
for very short buffers, having the whole perf-side benchmarking, profiling
& statistics machinery available is a plus, I think.

Thanks,

	Ingo