On 2/25/25 14:59, Eric Biggers wrote:
> If we had to save/restore a large number of vector registers in every crypto
> function call (not amortized to one save/restore per return to userspace), that
> would be a big performance problem.

I just did a quick trace on my laptop. Looks like I have two main
kernel_fpu_begin() users: LUKS and networking. They both very much seem
to do a bunch of kernel_fpu_begin() operations but very few actual XSAVEs:

     26 : save_fpregs_to_fpstate <-kernel_fpu_begin_mask
    818 : kernel_fpu_begin_mask <-crc32c_pcl_intel_update
   4192 : kernel_fpu_begin_mask <-xts_encrypt_vaes_avx10_256

This is at least _one_ data point very much in favor of Eric's argument
here. It appears that the cost of one XSAVE is amortized across a
bunch of kernel_fpu_begin()s.
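
To make the amortization concrete, here is a toy userspace model of the
behavior the trace suggests. It is a sketch, not the kernel code: every
sim_* name is made up for illustration, and the single "user_regs_live"
flag is only a rough stand-in for the inverse of TIF_NEED_FPU_LOAD as I
understand it. The assumption is that the first kernel_fpu_begin() after
a return to userspace pays for the save and later ones find the state
already saved.

```c
#include <stdbool.h>

/* Toy model only. All sim_* names are hypothetical, not kernel symbols. */
struct sim_cpu {
	bool user_regs_live;	/* assumed: user FPU state still in registers */
	unsigned long begins;	/* calls to sim_kernel_fpu_begin() */
	unsigned long saves;	/* simulated XSAVEs */
};

static void sim_kernel_fpu_begin(struct sim_cpu *cpu)
{
	cpu->begins++;
	if (cpu->user_regs_live) {
		/* Stands in for save_fpregs_to_fpstate(). */
		cpu->saves++;
		cpu->user_regs_live = false;
	}
}

static void sim_kernel_fpu_end(struct sim_cpu *cpu)
{
	/* Nothing is restored here; that is deferred to return to user. */
	(void)cpu;
}

static void sim_return_to_user(struct sim_cpu *cpu)
{
	/* User state is back in the registers, so the next begin must save. */
	cpu->user_regs_live = true;
}

/* Run `returns` trips to userspace, each followed by `per_return` begins. */
struct sim_cpu sim_run(unsigned long returns, unsigned long per_return)
{
	struct sim_cpu cpu = { .user_regs_live = false };

	for (unsigned long r = 0; r < returns; r++) {
		sim_return_to_user(&cpu);
		for (unsigned long i = 0; i < per_return; i++) {
			sim_kernel_fpu_begin(&cpu);
			sim_kernel_fpu_end(&cpu);
		}
	}
	return cpu;
}
```

Calling sim_run(26, 100) in this model counts 26 saves for 2600 begins:
the save count follows the number of returns to userspace, not the
number of kernel_fpu_begin() calls. That is the same shape as the trace,
where 26 saves sit against 5010 begins from the two callers shown.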