Re: [PATCH crypto-next v2 2/3] crypto: x86_64/poly1305 - add faster implementations

Andy Polyakov <appro@xxxxxxxxxxx> · Sun, 15 Dec 2019 18:04:08 +0100

>>  * It removes the existing SSE2 code path. Most likely not that much of
>>    an issue due to the new AVX variant.
> 
> It's not clear that that sse2 code is even faster than the x86_64
> scalar code in the new implementation, actually. Either way,
> regardless of that, in spite of the previous sentence, I don't think
> it really matters, based on the chips we care about targeting.

There is a remark in the commentary section. SSE2 was faster on P4 and
early Core processors, but on non-Intel and contemporary
non-AVX-capable processors, most notably from the Atom family, scalar
x86_64 *is* the fastest option. As for scalar performance on legacy Intel
processors, omitting SSE2 meant a ~33% loss for me on the oldest P4, and
less on not-as-old ones. [Just in case, the situation is naturally different
on 32-bit systems. From a coverage vs. performance viewpoint, SSE2+AVX2 is
arguably the more suitable mix in the 32-bit case. AVX makes less sense
there, because its gain over SSE2 is not impressive enough.]

Cheers.