Re: [PATCH v3 02/29] crypto: x86/chacha - depend on generic chacha library instead of crypto driver

Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> · Tue, 15 Oct 2019 12:12:43 +0200

On Tue, 15 Oct 2019 at 12:00, Martin Willi <martin@xxxxxxxxxxxxxx> wrote:
>
> Hi Ard,
>
> > Since turning the FPU on and off is cheap these days, simplify the
> > SIMD routine by dropping the per-page yield, which makes for a
> > cleaner switch to the library API as well.
>
> In my measurements that lazy FPU restore works as intended, and I could
> not identify any slowdown by this change.
>

Thanks for confirming.

> > +++ b/arch/x86/crypto/chacha_glue.c
> > @@ -127,32 +127,32 @@ static int chacha_simd_stream_xor [...]
> >
> > +     do_simd = (walk->total > CHACHA_BLOCK_SIZE) && crypto_simd_usable();
>
> Given that most users (including chacha20poly1305) likely involve
> multiple operations under the same (real) FPU save/restore cycle, those
> length checks both in chacha and in poly1305 hardly make sense anymore.
>
> Obviously under tcrypt we get better results when engaging SIMD for any
> length, but also for real users this seems beneficial. But of course we
> may defer that to a later optimization patch.
>

Given that the description already reasons about FPU save/restore
being cheap these days, I think it would be appropriate to just get
rid of the length check right away. Especially in the chacha20poly1305
case, where the separate chacha invocation for the poly nonce is
guaranteed to fail this check, we basically end up going back and
forth between the scalar and the SIMD code, which seems rather
suboptimal to me.
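
To illustrate the back-and-forth, here is a minimal user-space sketch
(not the kernel code). It models only the dispatch decision from the
quoted hunk and counts how often consecutive calls change code path.
simd_usable() is a hypothetical stand-in for crypto_simd_usable(), the
helper names are made up for this example, and the request lengths used
with it are assumed values, not measurements.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHACHA_BLOCK_SIZE 64

/* Hypothetical stand-in for crypto_simd_usable(); assumed always true. */
static bool simd_usable(void)
{
	return true;
}

/* Decision as in the quoted hunk: SIMD only for more than one block. */
static bool use_simd_with_len_check(size_t total)
{
	return (total > CHACHA_BLOCK_SIZE) && simd_usable();
}

/* Same decision with the length check dropped. */
static bool use_simd_no_len_check(size_t total)
{
	(void)total;	/* length no longer influences the choice */
	return simd_usable();
}

/*
 * Count how many times two consecutive requests end up on different
 * code paths (scalar vs. SIMD) under the given dispatch policy.
 */
static int count_path_switches(bool (*pick)(size_t),
			       const size_t *lens, size_t n)
{
	int switches = 0;

	for (size_t i = 1; i < n; i++)
		if (pick(lens[i]) != pick(lens[i - 1]))
			switches++;

	return switches;
}
```

With an assumed sequence of a short key-generation call followed by a
larger payload, repeated per packet (e.g. 32, 1500, 32, 1500 bytes),
the length-checked policy changes path on every call, while the
unconditional policy never does.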