Re: [PATCH] crypto: x86/aesni - implement accelerated CBCMAC, CMAC and XCBC shashes

On Tue, 4 Aug 2020 at 21:45, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>
> On 8/4/20 6:08 AM, Ard Biesheuvel wrote:
> > On Tue, 4 Aug 2020 at 15:01, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 8/4/20 5:55 AM, Ard Biesheuvel wrote:
> >>> On Mon, 3 Aug 2020 at 21:11, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> This helps a bit... now download sw-crypt performance is about 150Mbps,
> >>>> but still not as good as with my patch on the 5.4 kernel, and kernel_fpu_begin()
> >>>> is still high in perf top:
> >>>>
> >>>>       13.89%  libc-2.29.so   [.] __memset_sse2_unaligned_erms
> >>>>         6.62%  [kernel]       [k] kernel_fpu_begin
> >>>>         4.14%  [kernel]       [k] _aesni_enc1
> >>>>         2.06%  [kernel]       [k] __crypto_xor
> >>>>         1.95%  [kernel]       [k] copy_user_generic_string
> >>>>         1.93%  libjvm.so      [.] SpinPause
> >>>>         1.01%  [kernel]       [k] aesni_encrypt
> >>>>         0.98%  [kernel]       [k] crypto_ctr_crypt
> >>>>         0.93%  [kernel]       [k] udp_sendmsg
> >>>>         0.78%  [kernel]       [k] crypto_inc
> >>>>         0.74%  [kernel]       [k] __ip_append_data.isra.53
> >>>>         0.65%  [kernel]       [k] aesni_cbc_enc
> >>>>         0.64%  [kernel]       [k] __dev_queue_xmit
> >>>>         0.62%  [kernel]       [k] ipt_do_table
> >>>>         0.62%  [kernel]       [k] igb_xmit_frame_ring
> >>>>         0.59%  [kernel]       [k] ip_route_output_key_hash_rcu
> >>>>         0.57%  [kernel]       [k] memcpy
> >>>>         0.57%  libjvm.so      [.] InstanceKlass::oop_follow_contents
> >>>>         0.56%  [kernel]       [k] irq_fpu_usable
> >>>>         0.56%  [kernel]       [k] mac_do_update
> >>>>
> >>>> If you'd like help setting up a test rig and have an ath10k pcie NIC or ath9k pcie NIC,
> >>>> then I can help.  Possibly hwsim would also be a good test case, but I have not tried
> >>>> that.
> >>>>
> >>>
> >>> I don't think this is likely to be reproducible on other
> >>> micro-architectures, so setting up a test rig is unlikely to help.
> >>>
> >>> I'll send out a v2 which implements an ahash instead of a shash (and
> >>> implements some other tweaks) so that kernel_fpu_begin() is only
> >>> called twice for each packet on the cbcmac path.
> >>>
> >>> Do you have any numbers for the old kernel without your patch? This
> >>> pathological FPU preserve/restore behavior could be caused by the
> >>> optimizations, or by other changes that landed in the meantime, so I
> >>> would like to know if kernel_fpu_begin() is as prominent in those
> >>> traces as well.
> >>>
> >>
> >> This same patch makes i7 mobile processors able to handle 1Gbps+ software
> >> decrypt rates, where without the patch, the rate was badly constrained and CPU
> >> load was much higher, so it is definitely noticeable on other processors too.
> >
> > OK
> >
> >> The weak processor on the current test rig is convenient because the problem
> >> is so noticeable even at slower wifi speeds.
> >>
> >> We can do some tests on 5.4 with our patch reverted.
> >>
> >
> > The issue with your CCM patch is that it keeps the FPU enabled for the
> > entire input, which also means that preemption is disabled, which
> > makes the -rt people grumpy. (Of course, it also uses APIs that no
> > longer exist, but that should be easy to fix)
> >
> > Do you happen to have any ballpark figures for the packet sizes and
> > the time spent doing encryption?
> >
>
> My tester reports this last patch appears to break wpa-2 entirely, so we
> cannot test it as is.
>

Ah, that is unfortunate. It passed the internal selftests we have in
the kernel, but apparently, the coverage is not 100%.

I will take another look into this next week.
