From: Eric Biggers
> Sent: 26 March 2024 16:48
....
> Consider Intel Ice Lake for example, these are the AES-256-XTS encryption speeds
> on 4096-byte messages in MB/s I'm seeing:
>
>     xts-aes-aesni                  5136
>     xts-aes-aesni-avx              5366
>     xts-aes-vaes-avx2              9337
>     xts-aes-vaes-avx10_256         9876
>     xts-aes-vaes-avx10_512        10215
>
> So yes, on that CPU the biggest boost comes just from VAES, staying on AVX2.
> But taking advantage of AVX512 does help a bit more, first from the parts other
> than 512-bit registers, then a bit more from 512-bit registers.

How much does the kernel_fpu_begin() cost on real workloads?
(i.e. when the registers are live and it forces an extra save/restore)

I've not looked at the code, but I often see what looks like
excessive inlining in crypto code.
This will speed up benchmarks but can have a negative effect on real
code, both because of the time taken to load the code into the
instruction cache and because of the effect of displacing other code.

It might be that this code is a simple loop....

	David