On 08/14/2014 05:39 AM, Christian Lamparter wrote:
> On Tuesday, August 12, 2014 11:34:59 AM Ben Greear wrote:
>> On 08/10/2014 06:44 AM, Christian Lamparter wrote:
>>> On Thursday, August 07, 2014 10:45:01 AM Ben Greear wrote:
>>>> On 08/07/2014 07:05 AM, Christian Lamparter wrote:
>>>>> Or: for every 16 Bytes of payload there is one fpu context save and
>>>>> restore... ouch!
>>>>
>>>> Any idea if it would work to put the fpu_begin/end a bit higher
>>>> and do all those 16 byte chunks in a batch without messing with
>>>> the FPU for each chunk?
>>>
>>> It sort of works - see sample feature patch for aesni-intel-glue
>>> (taken from 3.16-wl). Older kernels (like 3.15, 3.14) need:
>>> "crypto: allow blkcipher walks over AEAD data" [0] (and maybe more).
>>>
>>> The FPU save/restore overhead should be gone. Also, if the aesni
>>> instructions can't be used, the implementation will fall back
>>> to the original ccm(aes) code. Calculating the MAC is still much
>>> more expensive than the payload encryption or decryption. However,
>>> I can't see a way of making this more efficient without rewriting
>>> and combining the parts I took from crypto/ccm.c into several
>>> dedicated assembler functions.
>>
>> Without encryption, I see download rate of around 400 - 420Mbps.
>>
>> So, your patch looks like a good improvement to me, and I'll be
>> happy to test further patches if you happen to do those assembler
>> optimizations you talk about above.
>
> Maybe, that will depend on what the results for: "wpa2, *HW*-crypt,
> download, udp" are. I'll do that test sometime soon and post the results.
>> Let me know if you would like more/different performance
>> stats.
>
> There's a test bench tool (tcrypt) to measure the performance
> of any cipher. It would be interesting to know what
> performance/throughput it can produce without the overhead
> of any application. [Yep, I'm making a small patch to test that,
> but not before Saturday next week].
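
To make the batching question quoted above concrete, here is a rough
user-space toy. It is not the aesni-intel-glue patch and not kernel code:
fpu_begin_sim()/fpu_end_sim() are hypothetical stand-ins for
kernel_fpu_begin()/kernel_fpu_end() that only count how often a context
save/restore would happen, and process_block() is a placeholder XOR, not
AES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16  /* AES block size in bytes */

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end():
 * they only count how many save/restore sections were opened. */
static unsigned long fpu_sections;

static void fpu_begin_sim(void) { fpu_sections++; }
static void fpu_end_sim(void) { }

/* Placeholder for the per-block cipher work; XOR is NOT real crypto. */
static void process_block(uint8_t *blk)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        blk[i] ^= 0x5a;
}

/* Behaviour complained about: one save/restore per 16-byte block.
 * Returns the number of FPU sections that were opened. */
static unsigned long crypt_per_block(uint8_t *buf, size_t len)
{
    assert(len % BLOCK_SIZE == 0);
    fpu_sections = 0;
    for (size_t off = 0; off < len; off += BLOCK_SIZE) {
        fpu_begin_sim();
        process_block(buf + off);
        fpu_end_sim();
    }
    return fpu_sections;
}

/* Batched variant: one save/restore around the whole payload.
 * Returns the number of FPU sections that were opened. */
static unsigned long crypt_batched(uint8_t *buf, size_t len)
{
    assert(len % BLOCK_SIZE == 0);
    fpu_sections = 0;
    fpu_begin_sim();
    for (size_t off = 0; off < len; off += BLOCK_SIZE)
        process_block(buf + off);
    fpu_end_sim();
    return fpu_sections;
}
```

For a 1504-byte payload (an assumed size, chosen as a multiple of 16) the
first variant opens 94 sections and the second opens one, with identical
output. In the real kernel the trade-off is less clean, since
kernel_fpu_begin() disables preemption for as long as the section is held.
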
>
>> Here is perf top of open authentication, download, UDP:
>>
>> Using WPA2, sw-crypt, download, UDP:
>>
>> Samples: 52K of event 'cycles', Event count (approx.): 13162827574
>> 24.78% btserver [.] 0x00000000000c598c
> Is btserver your "udp download" test application? What does it do, as
> it is accounting for nearly 25%?

btserver is our traffic generator. In this case it is mostly just
receiving UDP frames with non-blocking IO (via recvmmsg), but it also
does a fair bit of stats gathering and such. Its throughput typically
compares well with iperf, but I'm sure it uses at least a bit more CPU
than iperf does.

Thanks,
Ben

-- 
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com