On Fri, Mar 29, 2024 at 02:31:30AM -0700, Eric Biggers wrote:
> > I wouldn't mind retiring the existing xts(aesni)
> > code entirely, and using the xts() wrapper around ecb-aes-aesni on
> > 32-bit and on non-AVX uarchs with AES-NI.
>
> Yes, it will need to be benchmarked, but that probably makes sense. If
> Wikipedia is to be trusted, on the Intel side only Westmere (from 2010) has
> AES-NI but not AVX, and on the AMD side all CPUs with AES-NI have AVX...

It looks like I missed some low-power CPUs. Intel's Silvermont (2013),
Goldmont (2016), Goldmont Plus (2017), and Tremont (2020) support AES-NI but
not AVX. Their successor, Gracemont (2021), supports AVX.

I don't have any one of those immediately available to run a test on. But
just doing a quick benchmark on Zen 1, xts-aes-aesni has 62% higher throughput
than xts(ecb-aes-aesni). The significant difference seems expected, since
there's a lot of API overhead in the xts template, and it computes all the
tweaks twice in C code.

So I'm thinking we'll need to keep xts-aes-aesni around for now, alongside
xts-aes-aesni-avx.

(And with all the SIMD instructions taking a different number of arguments and
having different names for AVX vs non-AVX, I don't see a clean way to unify
them in assembly. They could be unified if we used C intrinsics instead of
assembly and compiled a C function with and without the "avx" target. However,
intrinsics bring their own issues and make it hard to control the generated
code. I don't really want to rely on intrinsics for this code.)

- Eric