Re: [PATCH crypto-next v2 2/3] crypto: x86_64/poly1305 - add faster implementations

Martin Willi <martin@xxxxxxxxxxxxxx> · Thu, 12 Dec 2019 16:34:34 +0100

> These x86_64 vectorized implementations are based on Andy Polyakov's
> implementation, and support AVX, AVX-2, and AVX512F. The AVX-512F
> implementation is disabled on Skylake, due to throttling, but it is
> quite fast on >= Cannonlake.

>  arch/x86/crypto/poly1305-avx2-x86_64.S |  390 ---
>  arch/x86/crypto/poly1305-sse2-x86_64.S |  590 ----
>  arch/x86/crypto/poly1305-x86_64.pl     | 4266 ++++++++++++++++++++++++

As the author of the removed code, I'm certainly biased, so I won't
hinder the adoption of the new code. Nonetheless, some final remarks
from my side:

 * It removes the existing SSE2 code path. Most likely not that much of
   an issue due to the new AVX variant.
 * I would certainly favor gradual improvement, and I think the code
   would allow it. But as said, not my call.
 * Those 4000+ lines of perl/asm are a lot and hard to review; I won't
   find the time and motivation to do it. ;-)

Thanks!
Martin