On Tue, 15 Oct 2019 at 12:00, Martin Willi <martin@xxxxxxxxxxxxxx> wrote:
>
> Hi Ard,
>
> > Since turning the FPU on and off is cheap these days, simplify the
> > SIMD routine by dropping the per-page yield, which makes for a
> > cleaner switch to the library API as well.
>
> In my measurements that lazy FPU restore works as intended, and I could
> not identify any slowdown by this change.
>

Thanks for confirming.

> > +++ b/arch/x86/crypto/chacha_glue.c
> > @@ -127,32 +127,32 @@ static int chacha_simd_stream_xor
> [...]
> >
> > +	do_simd = (walk->total > CHACHA_BLOCK_SIZE) && crypto_simd_usable();
>
> Given that most users (including chacha20poly1305) likely involve
> multiple operations under the same (real) FPU save/restore cycle, those
> length checks both in chacha and in poly1305 hardly make sense anymore.
>
> Obviously under tcrypt we get better results when engaging SIMD for any
> length, but also for real users this seems beneficial. But of course we
> may defer that to a later optimization patch.
>

Given that the description already reasons about FPU save/restore being
cheap these days, I think it would be appropriate to just get rid of it
right away. Especially in the chacha20poly1305 case, where the separate
chacha invocation for the poly nonce is guaranteed to fail this check,
we basically end up going back and forth between the scalar and the
SIMD code, which seems rather suboptimal to me.
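
To make the back-and-forth concrete, here is a rough userspace sketch
(not kernel code) of how the quoted length check picks a path. The
helper names and the request sizes are made up for illustration; the
sketch assumes the poly nonce request is 32 bytes (one Poly1305 key)
and that a ChaCha block is 64 bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes: a 64-byte ChaCha block and a 32-byte Poly1305 key. */
#define CHACHA_BLOCK_SIZE 64
#define POLY1305_KEY_SIZE 32

/* Hypothetical stand-in mirroring the check quoted from the patch. */
static bool pick_simd_with_length_check(size_t total, bool simd_usable)
{
	return (total > CHACHA_BLOCK_SIZE) && simd_usable;
}

/* Hypothetical variant with the length check dropped. */
static bool pick_simd_without_length_check(size_t total, bool simd_usable)
{
	(void)total;	/* length no longer influences the decision */
	return simd_usable;
}

/*
 * Count how often consecutive requests land on different code paths
 * (scalar vs. SIMD) for a given selection policy.
 */
static unsigned int count_path_switches(const size_t *lens, size_t n,
					bool simd_usable,
					bool (*pick)(size_t, bool))
{
	unsigned int switches = 0;
	size_t i;

	for (i = 1; i < n; i++) {
		if (pick(lens[i - 1], simd_usable) != pick(lens[i], simd_usable))
			switches++;
	}
	return switches;
}
```

Feeding it a sequence such as { POLY1305_KEY_SIZE, 1024,
POLY1305_KEY_SIZE, 1024 } (two AEAD requests, each a nonce
generation followed by a payload) gives a path switch on every step with
the length check, and none without it.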