Hi Ard,

> Since turning the FPU on and off is cheap these days, simplify the
> SIMD routine by dropping the per-page yield, which makes for a
> cleaner switch to the library API as well.

In my measurements the lazy FPU restore works as intended, and I could
not identify any slowdown from this change.

> +++ b/arch/x86/crypto/chacha_glue.c
> @@ -127,32 +127,32 @@ static int chacha_simd_stream_xor
[...]
>
> +	do_simd = (walk->total > CHACHA_BLOCK_SIZE) && crypto_simd_usable();

Given that most users (including chacha20poly1305) likely involve
multiple operations under the same (real) FPU save/restore cycle, those
length checks in both chacha and poly1305 hardly make sense anymore.
Obviously under tcrypt we get better results when engaging SIMD for any
length, but this seems beneficial for real users as well.

But of course we may defer that to a later optimization patch.

Thanks,
Martin