
Re: Looking for non-NIC hardware-offload for wpa2 decrypt.

On Monday, July 28, 2014 01:50:22 PM Ben Greear wrote:
> On 03/31/2014 11:09 AM, Christian Lamparter wrote:
> > Hello,
> > 
> > On Sunday, March 30, 2014 09:40:24 PM Ben Greear wrote:
> >> Due to hardware/firmware limitations, it does not appear possible to
> >> have a wifi NIC do hardware decrypt when using multiple stations on a single
> >> NIC (and have both stations connected to the same AP).
> >>
> >> This just happens to be one of my favourite things to do, and it kills
> >> performance compared to normal 'Open' throughput.
> >>
> >> I am curious if anyone knows of any way to accelerate rx-decrypt, perhaps by
> >> using a specialized hardware board or maybe a feature of certain CPUs?
> > 
> > You could check if your CPU (bios and kernel) have support for AES-NI [0].
> > AFAICT mac80211 utilizes the cryptoapi. Therefore anything that supports
> > the proper crypto bindings can be used to accelerate the encryption and
> > decryption process to some degree. And it just happens that thanks to
> > AES-NI parts of math can be efficiently calculated by the CPU. 
> 
> I recently took a look at this again, and the Intel E5 I'm using
> does use the aesni instructions/driver as far as I can tell.
Which E5 exactly? There are many different E5 models.

> Throughput is still around 500Mbps where open is around 800Mbps.
I can't test ath10k or your multiple-stations-on-a-single-NIC setup. But
can you run a test with a "simple" single-station, single-AP WPA2 setup?
I want to know how close to 800Mbps it actually gets.

> perf top shows this:
> 
> Samples: 37K of event 'cycles', Event count (approx.): 19360716192
>  12.01%  [kernel]                                      [k] math_state_restore
>  11.64%  [kernel]                                      [k] _aesni_enc1
>   8.25%  [kernel]                                      [k] __save_init_fpu
>   2.44%  [kernel]                                      [k] crypto_xor
>   1.87%  [kernel]                                      [k] irq_fpu_usable
>   1.30%  [kernel]                                      [k] aes_encrypt
>   0.76%  [kernel]                                      [k] __kernel_fpu_end
> ....
Yes, aesni is doing some of the heavy lifting! But in your original post,
you said you are interested in accelerating rx-decrypt... Now it's about
encryption offload?! [please make up your mind :-D]

That being said, 12.01% (math_state_restore, called by kernel_fpu_end)
and 8.25% (__save_init_fpu, called by kernel_fpu_begin) of the cycles
are wasted on FPU save and restore overhead. [You have noticed that
before, haven't you? ;-)]

I think part of the poor performance is due to the design of
aes_encrypt in arch/x86/crypto/aesni-intel_glue.c:

> static void aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
> {
>        struct crypto_aes_ctx *ctx = aes_ctx(crypto_tfm_ctx(tfm));
>        [...]
>                kernel_fpu_begin();
>                aesni_enc(ctx, dst, src);
>                kernel_fpu_end();
>        [...]
> }

Ideally you would want something like:

>                kernel_fpu_begin();
>                aesni_enc(ctx, dst_frame1, src_frame1);
>                aesni_enc(ctx, dst_frame2, src_frame2);
>                ...
>                aesni_enc(ctx, dst_frameN, src_frameN);
>                kernel_fpu_end();

But getting there may not be easy and will likely involve more than a bit
of "real programming".

In theory, it should be enough to test whether this approach has potential
by "enhancing" the tx-path in the following way:

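1. The kernel_fpu_begin() and kernel_fpu_end() calls should be added to
ieee80211_crypto_ccmp_encrypt() in net/mac80211/wpa.c. Note that the
TX_DROP early return has to call kernel_fpu_end() as well:

>+       kernel_fpu_begin();
>        skb_queue_walk(&tx->skbs, skb) {
>-               if (ccmp_encrypt_skb(tx, skb) < 0)
>+               if (ccmp_encrypt_skb(tx, skb) < 0) {
>+                       kernel_fpu_end();
>                        return TX_DROP;
>+               }
>        }
>+       kernel_fpu_end();
>
>        return TX_CONTINUE;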

2. The crypto_aead_encrypt() call in ieee80211_aes_ccm_encrypt() in
net/mac80211/aes_ccm.c has to end up in __aes_encrypt instead of
aes_encrypt, because the FPU state is now saved and restored by the caller.
[I can't think of a sane way to make this work. Of course, it's possible to
make a copy of the ccm(aes) crypto_alg* and replace aes_encrypt with
__aes_encrypt. That's not very nice... (It should work, though.)]

> Any other magic add-in cards that would somehow just make this all faster w/out
> having to do any real programming work? :)
I doubt there is a magic add-in card for such a use case. I think most of
them directly target applications/libraries, not the kernel crypto
interface that mac80211 uses.

[It would be really nice to know what E5 you actually have]

Regards
Christian