Hi Herbert,

On Thu, Nov 3, 2016 at 1:49 AM, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
> FWIW I'd rather live with a 6% slowdown than having two different
> code paths in the generic code. Anyone who cares about 6% would
> be much better off writing an assembly version of the code.

Please think twice before deciding that the generic C "is allowed to
be slow". It turns out to be used far more often than might be
obvious. For example, crypto is commonly done on the netdev layer --
like the case with mac80211-based drivers. At this layer, the FPU on
x86 isn't always available, depending on the path used. Some
combinations of drivers, packet family, and workload can result in the
generic C being used instead of the vectorized assembly for a massive
percentage of time. So, I think we do have a good motivation for
wanting the generic C to be as fast as possible.

In the particular case of poly1305, these are the only spots where
unaligned accesses take place, and they're rather small, and I think
it's pretty obvious what's happening in the two different cases of
code from a quick glance. This isn't the "two different paths case" in
which there's a significant future-facing maintenance burden.

Jason
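
For illustration only, the "two different cases" for an unaligned 32-bit little-endian load can be sketched roughly as below. This is a standalone userspace sketch, not the kernel's poly1305 code: the names `load_le32_bytes` and `load_le32_word` are hypothetical, and the word-wise variant assumes a little-endian host on which unaligned loads are cheap.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: portable path. Assembles the value one byte at a
 * time, so it is correct at any alignment and on any host endianness.
 */
static uint32_t load_le32_bytes(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/*
 * Hypothetical helper: fast path. Copies four bytes into a word; compilers
 * commonly turn this into a single load on targets with cheap unaligned
 * access. The result equals load_le32_bytes() only on a little-endian host
 * (assumption).
 */
static uint32_t load_le32_word(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Both helpers have the same interface, so the choice between them can be confined to a single build-time conditional at the few call sites that read unaligned input.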