On Mon, 30 Nov 2020 at 23:48, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>
> On 11/29/20 10:20 AM, Ard Biesheuvel wrote:
> > From: Steve deRosier <ardb@xxxxxxxxxx>
> >
> > Add ccm(aes) implementation from linux-wireless mailing list (see
> > http://permalink.gmane.org/gmane.linux.kernel.wireless.general/126679).
> >
> > This eliminates FPU context store/restore overhead existing in more
> > general ccm_base(ctr(aes-aesni),aes-aesni) case in MAC calculation.
> >
> > Suggested-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
> > Co-developed-by: Steve deRosier <derosier@xxxxxxxxxxxxxx>
> > Signed-off-by: Steve deRosier <derosier@xxxxxxxxxxxxxx>
> > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > ---
> > Ben,
> >
> > This is almost a rewrite of the original patch, switching to the new
> > skcipher API, using the existing SIMD helper, and drop numerous unrelated
> > changes. The basic approach is almost identical, though, so I expect this
> > to perform on par or perhaps slightly faster than the original.
> >
> > Could you please confirm with some numbers?
>
> I tried this on my apu2 platform, here is perf top during a TCP download using
> rx-sw-crypt (ie, the aesni cpu decrypt path):
>
> 18.77% [kernel] [k] acpi_idle_enter
> 14.68% [kernel] [k] kernel_fpu_begin
> 4.45% [kernel] [k] __crypto_xor
> 3.46% [kernel] [k] _aesni_enc1
>
> Total throughput is 127Mbps or so. This is with your patch applied to 5.8.0+
> kernel (it applied clean with 'git am')
>
> Is there a good way to verify at runtime that I've properly applied your patch?
>
> On my 5.4 kernel with the old version of the patch installed, I see 253Mbps throughput,
> and perf-top shows:
>
> 13.33% [kernel] [k] acpi_idle_do_entry
> 9.21% [kernel] [k] _aesni_enc1
> 4.49% [unknown] [.] 0x00007fbc3f00adb6
> 4.34% [unknown] [.] 0x00007fbc3f00adba
> 3.85% [kernel] [k] memcpy
>
>
> So, new patch is not working that well for me...
>

That is odd.
The net number of invocations of kernel_fpu_begin() should be the same, so I cannot explain why they suddenly take more time. I am starting to think that this is a different issue altogether. One thing you could try is dropping the '.cra_alignmask' line, since we don't actually need it, but I am skeptical that it is the cause.
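On the question of verifying at runtime that the patch is in effect: one possible check, sketched here as an assumption rather than something established in this thread, is to read /proc/crypto. It lists every registered algorithm with its `name`, `driver` and `priority`, and the crypto API normally picks the highest-priority implementation for a given name. The helper names below (`parse_proc_crypto`, `drivers_for`) are hypothetical, the thread does not state the new driver's name, and template instances such as `ccm_base(...)` typically only appear once something has requested them.

```python
from pathlib import Path


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Split /proc/crypto text into one dict per registered algorithm.

    Assumes the usual layout: blank-line-separated entries made of
    "key : value" lines.
    """
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def drivers_for(text: str, name: str) -> list[tuple[str, int]]:
    """Return (driver, priority) pairs registered for `name`, highest priority first."""
    found = [
        (e["driver"], int(e.get("priority", "0")))
        for e in parse_proc_crypto(text)
        if e.get("name") == name and "driver" in e
    ]
    return sorted(found, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    proc = Path("/proc/crypto")
    if proc.exists():
        # The first line printed is the implementation the API would
        # normally select for ccm(aes).
        for driver, prio in drivers_for(proc.read_text(), "ccm(aes)"):
            print(f"{prio:5d}  {driver}")
```

If only the generic `ccm_base(ctr(aes-aesni),aes-aesni)` driver shows up for `ccm(aes)` after traffic has started, that would suggest the dedicated implementation is not the one being used.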