Re: [RFC PATCH 11/20] crypto: BLAKE2s - x86_64 implementation

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 29 Sep 2019 19:51:52 -0700

On Sun, Sep 29, 2019 at 7:42 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> I had previously put quite some effort into the simd_get, simd_put,
> simd_relax mechanism, so that the simd state could be persisted during
> both several calls to the same function and within long loops like
> below, with simd_relax existing to reenable preemption briefly if
> things were getting out of hand. Ard got rid of this and has moved the
> kernel_fpu_begin and kernel_fpu_end calls into the inner loop:

Actually, that should be ok these days.

What has happened fairly recently (it got merged into 5.2 back in May)
is that we no longer do the FPU save/restore on each
kernel_fpu_begin/end.

Instead, we save it on kernel_fpu_begin(), and set a flag that it
needs to be restored when returning to user space.

So the kernel now on its own merges that FPU save/restore overhead
when you do it repeatedly.

The core change happened in

     5f409e20b794 ("x86/fpu: Defer FPU state load until return to userspace")

but there are a few commits around it for cleanups etc. The code was merged in

     8ff468c29e9a ("Merge branch 'x86-fpu-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

if you want to see the whole series.

That said, it would be _lovely_ if you or somebody else actually
double-checked that it works as expected and that the numbers bear out
the improvements.

It should be superior to the old model of manually trying to merge FPU
use regions, both from a performance angle (because it will merge much
more), but also from a code simplicity angle and the whole preemption
latency worry also basically goes away.

             Linus