Hi Jason,

> I'd be inclined to roll with your implementation if it can eventually
> become competitive with Andy Polyakov's, [...]

I think for the SSSE3/AVX2 code paths it is competitive; especially for
small sizes it is faster, which is not that unimportant when
implementing layer 3 VPNs.

> there are still no AVX-512 paths, which means it's considerably
> slower on all newer generation Intel chips. Andy's has the AVX-512VL
> implementation for Skylake (using ymm, so as not to hit throttling)
> and AVX-512F for Cannon Lake and beyond (using zmm).

I don't think that having AVX-512F is that important until it is really
usable on CPUs in the market.

Adding AVX-512VL support is relatively simple. I have a patchset mostly
ready that is more than competitive with the code from Zinc. I'll clean
that up and do more testing before posting it later this week.

Best regards
Martin