Hi Eric,

2017-01-13 4:19 GMT+01:00 Eric Biggers <ebiggers3@xxxxxxxxx>:
> To what extent does the performance benefit of this patchset result from just
> the reduced numbers of calls to kernel_fpu_begin() and kernel_fpu_end()?
>
> If it's most of the benefit, would it make any sense to optimize
> kernel_fpu_begin() and kernel_fpu_end() instead?
>
> And if there are other examples besides kernel_fpu_begin/kernel_fpu_end where
> the bulk API would provide a significant performance boost, can you mention
> them?

In the case of AES-NI ciphers, this is the only benefit. However, this change is not intended solely (or primarily) for AES-NI ciphers, but also for other drivers that have a high per-request overhead.

This patchset is in fact a reaction to Binoy Jayan's efforts (see [1]). The problem with small requests to HW crypto drivers comes up, for example, in Qualcomm's Android [2], where they actually hacked together their own version of dm-crypt (called 'dm-req-crypt'). It uses a driver-specific crypto mode that does the IV generation on its own and is thereby able to process several sectors at once. The goal is to extend the crypto API so that vendors don't have to roll their own workarounds to get efficient disk encryption.

> Interestingly, the arm64 equivalent to kernel_fpu_begin()
> (kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have an
> optimization where the SIMD registers aren't saved if they were already saved.
> I wonder why something similar isn't done on x86.

AFAIK, not much can be done about the kernel_fpu_* functions; see e.g. [3].

Regards,
Ondrej

[1] https://lkml.org/lkml/2016/12/20/111
[2] https://nelenkov.blogspot.com/2015/05/hardware-accelerated-disk-encryption-in.html
[3] https://lkml.org/lkml/2016/12/21/354
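To illustrate the per-request overhead argument in this reply, here is a small Python model (a sketch, not kernel code). It assumes the fixed cost is a setup/teardown pair such as kernel_fpu_begin()/kernel_fpu_end() or a HW driver round-trip, and that the IVs use dm-crypt's plain64 format (64-bit little-endian sector number, zero-padded to the IV size). The names OverheadCounter, per_sector_requests and bulk_request are hypothetical and do not correspond to any crypto API function; the actual cipher call is omitted.

```python
IV_SIZE = 16  # assumed IV size (e.g. AES block size)


def plain64_iv(sector: int) -> bytes:
    """plain64-style IV: 64-bit little-endian sector number, zero-padded."""
    return sector.to_bytes(8, "little") + bytes(IV_SIZE - 8)


class OverheadCounter:
    """Hypothetical stand-in for an expensive begin/end pair; just counts calls."""

    def __init__(self):
        self.begin_calls = 0
        self.end_calls = 0

    def begin(self):
        self.begin_calls += 1

    def end(self):
        self.end_calls += 1


def per_sector_requests(start_sector, nsectors, ctx):
    """One request per sector: the fixed cost is paid nsectors times."""
    ivs = []
    for i in range(nsectors):
        ctx.begin()
        ivs.append(plain64_iv(start_sector + i))  # cipher call omitted
        ctx.end()
    return ivs


def bulk_request(start_sector, nsectors, ctx):
    """One request for the whole batch; IVs are generated inside the request."""
    ctx.begin()
    ivs = [plain64_iv(start_sector + i) for i in range(nsectors)]  # cipher call omitted
    ctx.end()
    return ivs


if __name__ == "__main__":
    a, b = OverheadCounter(), OverheadCounter()
    assert per_sector_requests(100, 8, a) == bulk_request(100, 8, b)
    print(a.begin_calls, b.begin_calls)  # prints "8 1"
```

Both paths produce identical IVs; only the number of times the fixed cost is paid differs. This is why a bulk interface helps any driver with a high per-request cost, not only AES-NI.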