>> * It removes the existing SSE2 code path. Most likely not that much of
>> an issue due to the new AVX variant.
>
> It's not clear that that sse2 code is even faster than the x86_64
> scalar code in the new implementation, actually. Either way,
> regardless of that, in spite of the previous sentence, I don't think
> it really matters, based on the chips we care about targeting.

There is a remark about this in the commentary section. SSE2 was faster
on P4 and early Core processors, but on non-Intel processors and on
contemporary non-AVX-capable ones, most notably the Atom family, scalar
x86_64 *is* the fastest option. As for scalar performance on legacy Intel
processors, for me omitting SSE2 meant a ~33% loss on the oldest P4 and
less on the not-so-old ones.

[Just in case: the situation is naturally different on 32-bit systems.
From a coverage vs. performance viewpoint, SSE2+AVX2 is arguably the more
suitable mix in the 32-bit case. AVX makes less sense there, because its
gain over SSE2 is not impressive enough.]

Cheers.