Re: [RFC PATCH 11/20] crypto: BLAKE2s - x86_64 implementation

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Mon, 30 Sep 2019 04:42:06 +0200

Hi Sebastian, Thomas,

Take a look at the below snippet from this patch.

I had previously put quite some effort into the simd_get, simd_put,
simd_relax mechanism, so that the simd state could be persisted during
both several calls to the same function and within long loops like
below, with simd_relax existing to reenable preemption briefly if
things were getting out of hand. Ard got rid of this and has moved the
kernel_fpu_begin and kernel_fpu_end calls into the inner loop:

On Sun, Sep 29, 2019 at 7:39 PM Ard Biesheuvel
<ard.biesheuvel@xxxxxxxxxx> wrote:
> +       for (;;) {
> +               const size_t blocks = min_t(size_t, nblocks,
> +                                           PAGE_SIZE / BLAKE2S_BLOCK_SIZE);
> +
> +               kernel_fpu_begin();
> +               if (IS_ENABLED(CONFIG_AS_AVX512) && blake2s_use_avx512)
> +                       blake2s_compress_avx512(state, block, blocks, inc);
> +               else
> +                       blake2s_compress_avx(state, block, blocks, inc);
> +               kernel_fpu_end();
> +
> +               nblocks -= blocks;
> +               if (!nblocks)
> +                       break;
> +               block += blocks * BLAKE2S_BLOCK_SIZE;
> +       }
> +       return true;
> +}

I'm wondering if on modern kernels this is actually fine and whether
my simd_get/put/relax thing no longer has a good use case.
Specifically, I recall last year there were a lot of patches and
discussions about doing FPU register restoration lazily -- on context
switch or the like. Did those land? Did the theory of action work out
in the end?

Regards,
Jason