> These x86_64 vectorized implementations are based on Andy Polyakov's
> implementation, and support AVX, AVX-2, and AVX512F. The AVX-512F
> implementation is disabled on Skylake, due to throttling, but it is
> quite fast on >= Cannonlake.

>  arch/x86/crypto/poly1305-avx2-x86_64.S |  390 ---
>  arch/x86/crypto/poly1305-sse2-x86_64.S |  590 ----
>  arch/x86/crypto/poly1305-x86_64.pl     | 4266 ++++++++++++++++++++++++

As the author of the removed code, I'm certainly biased, so I won't
hinder the adoption of the new code. Nonetheless, some final remarks
from my side:

* It removes the existing SSE2 code path. Most likely not that much of
  an issue due to the new AVX variant.
* I certainly would favor gradual improvement, and I think the code
  would allow it. But as said, not my pick.
* Those 4000+ lines of perl/asm are a lot and a hard review; I won't
  find time and motivation to do it. ;-)

Thanks!

Martin