Re: [v3 PATCH] crypto: chacha - Add DEFINE_CHACHA_STATE macro

Martin Willi <martin@xxxxxxxxxxxxxx> · Wed, 08 Jul 2020 08:54:27 +0200

> > Also, I wonder if we shouldn't simply change the chacha code to use
> > unaligned loads for the state array, as it likely makes very little
> > difference in practice (the state is not accessed from inside the
> > round processing loop)
> 
> I am seeing a 0.25% slowdown on 1k blocks in the SSE3 code with the
> change below: [...]
> 
> AVX2 and AVX512 uses vbroadcasti128 with memory operands to load the
> state, so they don't require any changes afaik.

I agree. Moving SSE to use unaligned loads is certainly acceptable
these days. 

Some AVX functions use vpbroadcastd with u32 load granularity anyway.
Some use vbroadcasti128, which theoretically could (?) suffer somewhat
when operating on unaligned data, but I guess that won't justify all
that alignment cruft.
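
To illustrate what "unaligned loads for the state" means on the SSE
side, here is a rough userspace sketch using intrinsics. It is not the
kernel's assembly: load_state_unaligned() and check_unaligned_load() are
hypothetical names made up for this example, and the state words are
arbitrary test values. The only point is that _mm_loadu_si128 (movdqu)
accepts a buffer at any address, where _mm_load_si128 (movdqa) would
require 16-byte alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel function: load the four 128-bit rows
 * of a 64-byte ChaCha state with unaligned loads, so the caller's buffer
 * does not need 16-byte alignment.
 */
static void load_state_unaligned(const unsigned char *state, __m128i rows[4])
{
    for (int i = 0; i < 4; i++)
        rows[i] = _mm_loadu_si128((const __m128i *)(state + 16 * i));
}

/*
 * Copy a 16-word state to a deliberately misaligned address, load it with
 * the helper above, store it back, and compare against the source.
 * Returns 1 if the round trip preserved every word.
 */
static int check_unaligned_load(void)
{
    uint32_t src[16], out[16];
    /* buf is 16-byte aligned, so buf + 1 is guaranteed to be misaligned. */
    _Alignas(16) unsigned char buf[sizeof(src) + 1];
    __m128i rows[4];

    for (uint32_t i = 0; i < 16; i++)
        src[i] = 0x61707865u + i;  /* arbitrary test words */

    memcpy(buf + 1, src, sizeof(src));
    load_state_unaligned(buf + 1, rows);

    for (int i = 0; i < 4; i++)
        _mm_storeu_si128((__m128i *)(out + 4 * i), rows[i]);

    return memcmp(out, src, sizeof(src)) == 0;
}
```

Since the state is loaded once per call and not inside the round loop,
the load variant should matter little, which fits the small slowdown
quoted above.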

Regards,
Martin