On 2019-09-30 04:42:06 [+0200], Jason A. Donenfeld wrote:
> Hi Sebastian, Thomas,

Hi Jason,

> On Sun, Sep 29, 2019 at 7:39 PM Ard Biesheuvel
> <ard.biesheuvel@xxxxxxxxxx> wrote:
> > +	for (;;) {
> > +		const size_t blocks = min_t(size_t, nblocks,
> > +					    PAGE_SIZE / BLAKE2S_BLOCK_SIZE);
> > +
> > +		kernel_fpu_begin();
> > +		if (IS_ENABLED(CONFIG_AS_AVX512) && blake2s_use_avx512)
> > +			blake2s_compress_avx512(state, block, blocks, inc);
> > +		else
> > +			blake2s_compress_avx(state, block, blocks, inc);
> > +		kernel_fpu_end();
> > +
> > +		nblocks -= blocks;
> > +		if (!nblocks)
> > +			break;
> > +		block += blocks * BLAKE2S_BLOCK_SIZE;
> > +	}
> > +	return true;
> > +}
>
> I'm wondering if on modern kernels this is actually fine and whether
> my simd_get/put/relax thing no longer has a good use case.
> Specifically, I recall last year there were a lot of patches and
> discussions about doing FPU register restoration lazily -- on context
> switch or the like. Did those land? Did the theory of action work out
> in the end?

That optimisation landed in v5.2. With that change (on x86),
kernel_fpu_end() does almost nothing (and so does the following
kernel_fpu_begin()), and the FPU context is restored once the task
returns to userland. (Note that this applies only to user tasks,
because we don't save the FPU state of a kernel thread.)

I haven't looked at the crypto code since that change was merged, but
the snippet from Ard is actually what I was aiming for.

> Regards,
> Jason

Sebastian
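
For illustration only, here is a minimal userspace sketch of the chunking pattern in the quoted loop: the input is processed in pieces of at most one page, and each piece gets its own begin/end section. It is not kernel code. fake_fpu_begin(), fake_fpu_end(), fake_compress(), compress_chunked() and the SKETCH_* constants are hypothetical stand-ins, and the 4096-byte page size is an assumption. Unlike the quoted loop, the sketch treats nblocks == 0 as a no-op.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE  4096u /* assumed page size, stand-in for PAGE_SIZE */
#define SKETCH_BLOCK_SIZE 64u   /* BLAKE2s block size in bytes */

static unsigned int fpu_sections; /* number of begin/end pairs entered */
static size_t blocks_done;        /* total blocks handed to the stub */

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end(). */
static void fake_fpu_begin(void) { fpu_sections++; }
static void fake_fpu_end(void) { }

/* Hypothetical stand-in for the SIMD compression routine. */
static void fake_compress(const unsigned char *block, size_t blocks)
{
	(void)block;
	blocks_done += blocks;
}

/* Process nblocks blocks, at most one page worth per FPU section. */
static void compress_chunked(const unsigned char *block, size_t nblocks)
{
	const size_t max_blocks = SKETCH_PAGE_SIZE / SKETCH_BLOCK_SIZE;

	while (nblocks) {
		const size_t blocks = nblocks < max_blocks ? nblocks : max_blocks;

		assert(block != NULL);
		fake_fpu_begin();
		fake_compress(block, blocks);
		fake_fpu_end();

		nblocks -= blocks;
		block += blocks * SKETCH_BLOCK_SIZE;
	}
}
```

Under these assumptions each section covers at most 64 blocks, so a 130-block input is split into three sections of 64, 64 and 2 blocks.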