From: Eric Biggers
> Sent: 05 April 2024 20:19
...
> I did some tests on Sapphire Rapids using a system call that I customized to do
> nothing except possibly a kernel_fpu_begin / kernel_fpu_end pair.
>
> On average the bare syscall took 70 ns. The syscall with the kernel_fpu_begin /
> kernel_fpu_end pair took 160 ns if the userspace program used xmm only, 340 ns
> if it used ymm, or 360 ns if it used zmm...
>
> Note that without the kernel_fpu_begin / kernel_fpu_end pair, AES-NI
> instructions cannot be used and the alternative would be xts(ecb(aes-generic)).
> On the same CPU, encrypting a single 512-byte sector with xts(ecb(aes-generic))
> takes about 2235ns. With xts-aes-vaes-avx10_512 it takes 75 ns...

So most of the cost of encrypting a single 512-byte sector is the
kernel_fpu_begin() / kernel_fpu_end() pair.
But every alternative is so much slower that paying that cost is still
faster overall.

	David
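
For reference, a rough sketch of the arithmetic behind that, using only the
figures quoted above. It assumes the 75 ns figure excludes the FPU
save/restore cost and that the two costs simply add, which the measurements
do not confirm. The function and constant names are made up for illustration.

```python
# Figures quoted above (Sapphire Rapids), all in nanoseconds.
BARE_SYSCALL_NS = 70
SYSCALL_WITH_FPU_NS = {"xmm": 160, "ymm": 340, "zmm": 360}
VAES_SECTOR_NS = 75       # xts-aes-vaes-avx10_512, one 512-byte sector
GENERIC_SECTOR_NS = 2235  # xts(ecb(aes-generic)), one 512-byte sector


def fpu_overhead_ns(regs):
    """Cost attributed to the kernel_fpu_begin / kernel_fpu_end pair alone."""
    return SYSCALL_WITH_FPU_NS[regs] - BARE_SYSCALL_NS


def vaes_total_ns(regs):
    """Estimated per-sector cost with VAES, ASSUMING the two costs add."""
    return fpu_overhead_ns(regs) + VAES_SECTOR_NS


for regs in SYSCALL_WITH_FPU_NS:
    total = vaes_total_ns(regs)
    print(f"{regs}: fpu pair {fpu_overhead_ns(regs)} ns, "
          f"total {total} ns, "
          f"{GENERIC_SECTOR_NS / total:.1f}x faster than aes-generic")
```

Under those assumptions the FPU pair costs more than the encryption itself
in every case, yet the total stays well below the 2235 ns of the generic
code.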