On Thu, Jan 12, 2017 at 01:59:57PM +0100, Ondrej Mosnacek wrote:
> This patch implements bulk request handling in the AES-NI crypto drivers.
> The major advantage of this is that with bulk requests, the kernel_fpu_*
> functions (which are usually quite slow) are now called only once for the whole
> request.
>

Hi Ondrej,

To what extent does the performance benefit of this patchset result from
just the reduced numbers of calls to kernel_fpu_begin() and
kernel_fpu_end()? If it's most of the benefit, would it make any sense to
optimize kernel_fpu_begin() and kernel_fpu_end() instead?

And if there are other examples besides kernel_fpu_begin/kernel_fpu_end
where the bulk API would provide a significant performance boost, can you
mention them?

Interestingly, the arm64 equivalent to kernel_fpu_begin()
(kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have
an optimization where the SIMD registers aren't saved if they were already
saved. I wonder why something similar isn't done on x86.

Eric

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel