Hi Sebastian, Thomas,

Take a look at the snippet below from this patch. I had previously put
quite some effort into the simd_get, simd_put, simd_relax mechanism, so
that the simd state could persist both across several calls to the same
function and across long loops like the one below, with simd_relax
existing to re-enable preemption briefly if things were getting out of
hand. Ard got rid of this and has moved the kernel_fpu_begin and
kernel_fpu_end calls into the inner loop:

On Sun, Sep 29, 2019 at 7:39 PM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> +	for (;;) {
> +		const size_t blocks = min_t(size_t, nblocks,
> +					    PAGE_SIZE / BLAKE2S_BLOCK_SIZE);
> +
> +		kernel_fpu_begin();
> +		if (IS_ENABLED(CONFIG_AS_AVX512) && blake2s_use_avx512)
> +			blake2s_compress_avx512(state, block, blocks, inc);
> +		else
> +			blake2s_compress_avx(state, block, blocks, inc);
> +		kernel_fpu_end();
> +
> +		nblocks -= blocks;
> +		if (!nblocks)
> +			break;
> +		block += blocks * BLAKE2S_BLOCK_SIZE;
> +	}
> +	return true;
> +}

I'm wondering whether on modern kernels this is actually fine, and
whether my simd_get/put/relax thing no longer has a good use case.
Specifically, I recall that last year there were a lot of patches and
discussions about restoring FPU registers lazily (on context switch or
the like). Did those land? Did the theory of action work out in the end?

Regards,
Jason
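To make the trade-off concrete, here is a minimal user-space model (not kernel code) of the two loop shapes. The fake_* functions, the model_simd_* helpers, per_chunk, held_across_loop and the resched_every parameter are all hypothetical; they only count how many FPU sections would be entered. The model says nothing about what a begin/end pair actually costs, which is exactly the open question about lazy restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define BLAKE2S_BLOCK_SIZE 64

/* Hypothetical stand-ins: they only count how often an FPU section would be
 * entered. Like kernel_fpu_begin(), the fake one must not be nested. */
static unsigned fpu_begins;
static bool fpu_held;
static size_t blocks_done;

static void fake_fpu_begin(void)
{
	assert(!fpu_held);
	fpu_held = true;
	fpu_begins++;
}

static void fake_fpu_end(void)
{
	assert(fpu_held);
	fpu_held = false;
}

/* Placeholder for blake2s_compress_avx(); it only records the work done. */
static void fake_compress(size_t blocks)
{
	assert(fpu_held);
	blocks_done += blocks;
}

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Shape of the quoted patch: one begin/end pair per PAGE_SIZE worth of
 * blocks. Returns the number of FPU sections entered. */
unsigned per_chunk(size_t nblocks)
{
	const size_t chunk = PAGE_SIZE / BLAKE2S_BLOCK_SIZE;

	fpu_begins = 0;
	blocks_done = 0;
	while (nblocks) {
		const size_t blocks = min_size(nblocks, chunk);

		fake_fpu_begin();
		fake_compress(blocks);
		fake_fpu_end();
		nblocks -= blocks;
	}
	return fpu_begins;
}

/* Simplified model of the simd_get/simd_put/simd_relax idea (not the real
 * helpers): hold the FPU across the loop, drop it only if a reschedule is due. */
static void model_simd_get(bool *held)
{
	fake_fpu_begin();
	*held = true;
}

static void model_simd_put(bool *held)
{
	if (*held)
		fake_fpu_end();
	*held = false;
}

static void model_simd_relax(bool *held, bool need_resched)
{
	if (*held && need_resched) {
		model_simd_put(held); /* brief preemption point */
		model_simd_get(held);
	}
}

/* `resched_every` pretends need_resched() becomes true every N chunks
 * (0 means never). Returns the number of FPU sections entered. */
unsigned held_across_loop(size_t nblocks, size_t resched_every)
{
	const size_t chunk = PAGE_SIZE / BLAKE2S_BLOCK_SIZE;
	size_t iter = 0;
	bool held = false;

	fpu_begins = 0;
	blocks_done = 0;
	model_simd_get(&held);
	while (nblocks) {
		const size_t blocks = min_size(nblocks, chunk);

		fake_compress(blocks);
		nblocks -= blocks;
		iter++;
		model_simd_relax(&held, nblocks && resched_every &&
					iter % resched_every == 0);
	}
	model_simd_put(&held);
	return fpu_begins;
}
```

With 1000 blocks (64 blocks per PAGE_SIZE chunk, so 16 chunks), per_chunk(1000) enters 16 FPU sections, held_across_loop(1000, 0) enters one, and held_across_loop(1000, 4) enters four: the initial one plus one per simulated reschedule. Whether the extra sections matter depends entirely on how cheap a begin/end pair is once restoration is deferred.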