Re: Help getting aesni crypto patch upstream

On Fri, 31 Jul 2020 at 01:57, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>
> On 7/29/20 1:06 PM, Ard Biesheuvel wrote:
> > On Wed, 29 Jul 2020 at 22:29, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 7/29/20 12:09 PM, Ard Biesheuvel wrote:
> >>> On Wed, 29 Jul 2020 at 15:27, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 7/28/20 11:06 PM, Ard Biesheuvel wrote:
> >>>>> On Wed, 29 Jul 2020 at 01:03, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> As part of my wifi test tool, I need to decrypt AES on the CPU, and the only way this
> >>>>>> performs well is to use aesni.  I've been using a patch for years that does this, but
> >>>>>> recently somewhere between 5.4 and 5.7, the API I've been using has been removed.
> >>>>>>
> >>>>>> Would anyone be interested in getting this support upstream?  I'd be happy to pay for
> >>>>>> the effort.
> >>>>>>
> >>>>>> Here is the patch in question:
> >>>>>>
> >>>>>> https://github.com/greearb/linux-ct-5.7/blob/master/wip/0001-crypto-aesni-add-ccm-aes-algorithm-implementation.patch
> >>>>>>
> >>>>>> Please keep me in CC, I'm not subscribed to this list.
> >>>>>>
> >>>>>
> >>>>> Hi Ben,
> >>>>>
> >>>>> Recently, the x86 FPU handling was improved to remove the overhead of
> >>>>> preserving/restoring of the register state, so the issue that this
> >>>>> patch fixes may no longer exist. Did you try?
> >>>>>
> >>>>> In any case, according to the commit log on that patch, the problem is
> >>>>> in the MAC generation, so it might be better to add a cbcmac(aes)
> >>>>> implementation only, and not duplicate all the CCM boilerplate.
> >>>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>> I don't know all of the details, and do not understand the crypto subsystem,
> >>>> but I am pretty sure that I need at least some of this patch.
> >>>>
> >>>
> >>> Whether this is true is what I am trying to get clarified.
> >>>
> >>> Your patch works around a performance bottleneck related to the use of
> >>> AES-NI instructions in the kernel, which has been addressed recently.
> >>> If the issue still exists, we can attempt to devise a fix for it,
> >>> which may or may not be based on this patch.
> >>
> >> Ok, I can do the testing.  Do you expect 5.7-stable has all the needed
> >> performance improvements?
> >>
> >
> > Yes.
>
> It does not, as far as we can tell.
>
> We did a download test on an apu2 (small embedded AMD CPU, but with
> aesni support).  A WiFi station is in software-decrypt mode (ath10k-ct driver/firmware,
> but ath9k would be valid to reproduce the issue as well.)
>
> On our 5.4 kernel with the aesni patch applied, we get
> about 220Mbps wpa2 download throughput.  With open, we get about 260Mbps
> download throughput.
>
> On 5.7, without any aesni patch, we see about 116Mbps download wpa2 throughput,
> and about 265Mbps open download throughput.
>

Thanks for the excellent data. Apparently, FPU preserve/restore is
still prohibitively expensive on these cores.

I'll have a stab at implementing cbcmac(aesni) early next week. As I
pointed out before, we don't need all the CCM boilerplate if the CTR
and MAC processing are still done in separate passes anyway.
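
To illustrate why a dedicated cbcmac helps, here is a rough userspace
sketch (not kernel code) of the two call patterns. It assumes the 5.7
profile quoted further down reflects one FPU enable/disable pair per
16-byte block, via crypto_cipher_encrypt_one() and aesni_encrypt().
Everything in it is hypothetical: toy_encrypt_block() is a placeholder
permutation and not AES, and toy_fpu_begin()/toy_fpu_end() only count
calls in place of kernel_fpu_begin()/kernel_fpu_end().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Stand-ins for kernel_fpu_begin()/kernel_fpu_end(): they only count sections. */
static unsigned long fpu_sections;
static void toy_fpu_begin(void) { fpu_sections++; }
static void toy_fpu_end(void) { }

/* Placeholder 16-byte "cipher". NOT AES; it only gives the chaining something to call. */
static void toy_encrypt_block(uint8_t b[BLK], const uint8_t key[BLK])
{
    for (int i = 0; i < BLK; i++)
        b[i] = (uint8_t)((b[i] ^ key[i]) + 0x3b * (i + 1));
}

/* XOR one input block into the running MAC state (the CBC chaining step). */
static void xor_block(uint8_t mac[BLK], const uint8_t *in)
{
    for (int i = 0; i < BLK; i++)
        mac[i] ^= in[i];
}

/* Generic pattern: every single-block cipher call opens its own FPU section. */
static void cbcmac_per_block(uint8_t mac[BLK], const uint8_t key[BLK],
                             const uint8_t *data, size_t len)
{
    assert(len % BLK == 0);
    memset(mac, 0, BLK);
    for (size_t off = 0; off < len; off += BLK) {
        xor_block(mac, data + off);
        toy_fpu_begin();
        toy_encrypt_block(mac, key);
        toy_fpu_end();
    }
}

/* Dedicated pattern: one FPU section around the whole MAC pass. */
static void cbcmac_batched(uint8_t mac[BLK], const uint8_t key[BLK],
                           const uint8_t *data, size_t len)
{
    assert(len % BLK == 0);
    memset(mac, 0, BLK);
    toy_fpu_begin();
    for (size_t off = 0; off < len; off += BLK) {
        xor_block(mac, data + off);
        toy_encrypt_block(mac, key);
    }
    toy_fpu_end();
}
```

For a 1504-byte input, the per-block variant enters 94 FPU sections and
the batched variant enters one, while both produce the same MAC. The
sketch ignores padding and the CCM length/nonce formatting entirely.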


>
> perf-top on 5.4 during download test with our aesni patch looks like this:
>
>     11.73%  libc-2.29.so   [.] __memset_sse2_unaligned_erms
>       4.79%  [kernel]       [k] _aesni_enc1
>       1.71%  [kernel]       [k] ___bpf_prog_run
>       1.66%  [kernel]       [k] memcpy
>       1.25%  [kernel]       [k] copy_user_generic_string
>       1.18%  libjvm.so      [.] InstanceKlass::oop_follow_contents
>       1.07%  [kernel]       [k] _aesni_enc4
>       0.98%  [kernel]       [k] csum_partial_copy_generic
>       0.96%  libjvm.so      [.] SpinPause
>       0.84%  [kernel]       [k] get_data_to_compute
>       0.81%  libjvm.so      [.] ParMarkBitMap::mark_obj
>       0.64%  [kernel]       [k] udp_sendmsg
>       0.62%  [kernel]       [k] __ip_append_data.isra.53
>       0.58%  [kernel]       [k] ipt_do_table
>       0.56%  [kernel]       [k] _aesni_inc
>       0.56%  [kernel]       [k] fib_table_lookup
>       0.55%  [kernel]       [k] __rcu_read_unlock
>       0.52%  libc-2.29.so   [.] __GI___strcmp_ssse3
>       0.50%  [kernel]       [k] igb_xmit_frame_ring
>
>
> on 5.7, we see this:
>
>     11.36%  libc-2.29.so   [.] __memset_sse2_unaligned_erms
>       9.03%  [kernel]       [k] kernel_fpu_begin
>       4.75%  libjvm.so      [.] SpinPause
>       2.89%  [kernel]       [k] __crypto_xor
>       2.35%  [kernel]       [k] _aesni_enc1
>       1.94%  [kernel]       [k] copy_user_generic_string
>       1.29%  [kernel]       [k] aesni_encrypt
>       0.85%  [kernel]       [k] udp_sendmsg
>       0.85%  [kernel]       [k] crypto_cipher_encrypt_one
>       0.71%  [kernel]       [k] crypto_cbcmac_digest_update
>       0.69%  [kernel]       [k] __ip_append_data.isra.53
>       0.69%  [kernel]       [k] memcpy
>       0.68%  [kernel]       [k] crypto_ctr_crypt
>       0.61%  [kernel]       [k] irq_fpu_usable
>       0.58%  [kernel]       [k] ipt_do_table
>       0.55%  [kernel]       [k] __dev_queue_xmit
>       0.54%  [kernel]       [k] crypto_inc
>       0.49%  libc-2.29.so   [.] __GI___strcmp_ssse3
>       0.45%  libjvm.so      [.] InstanceKlass::oop_follow_contents
>       0.45%  [kernel]       [k] ip_route_output_key_hash_rcu
>
>
>
> So, I think there is still some good improvement possible, likely with something like
> the aesni patch I showed, but re-worked to function in 5.7+ kernels.
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@xxxxxxxxxxxxxxx>
> Candela Technologies Inc  http://www.candelatech.com


