On Thu, 20 Aug 2020 at 09:44, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 20, 2020 at 09:33:21AM +0200, Ard Biesheuvel wrote:
> >
> > > On my machine the performance difference on a 1472-byte request
> > > between SIMD and generic is 2161 vs. 7558 (cycles).
> >
> > Sure. But your machine does not have the pathological FPU
> > preserve/restore performance.
>
> Why does that matter? These are numbers for cbc-aesni which means
> just a single preserve/restore for the whole request.
>

No, that is the whole problem. The CCM template has a CBCMAC
implementation that wraps the bare cipher, which means it invokes
crypto_cipher_encrypt_one() for each 16 bytes of input, and each of
those calls involves a FPU preserve/restore.

> Or are you saying on Ben's machine cbc-aesni would have worse
> performance vs. aes-generic?
>

Yes, given the pathological overhead of FPU preserve/restore for every
block of 16 bytes processed by the cbcmac wrapper.

> > The mac80211 CCMP code uses a synchronous ccm aead, which gets backed
> > by a skcipher+ahash combo by the ccm template. So a synchronous ahash
> > is fine for this particular case.
>
> OK I was just grepping for cmac so didn't see this.
>
> For this case, I think it's even more important that it be converted
> over to async because its sending path is also in user context just
> like IPsec.
>

Indeed. cmac() is not really relevant for performance, afaict. Only
cbcmac() is used for bulk data.

> So simply by sending wireless packets you can hog the CPU while
> doing SIMD in kernel context which would then kill the receive
> path if you're using the generic fallback.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
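
To put a number on the per-block overhead, here is a rough user-space
sketch (not kernel code) that only counts how many FPU preserve/restore
sections each approach would enter for a request of a given length. The
function and struct names are made up for illustration, and the model
assumes one preserve/restore per bare cipher call versus one per request.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical counter standing in for FPU preserve/restore pairs. */
struct fpu_stats {
	size_t sections;
};

/* Models entering (and later leaving) one FPU preserve/restore section. */
static void fpu_section_enter(struct fpu_stats *s)
{
	s->sections++;
}

/*
 * cbcmac-wrapper style: the bare cipher is called once per 16-byte
 * block, and every call enters its own FPU section.
 */
size_t fpu_sections_per_block(size_t len)
{
	struct fpu_stats s = { 0 };
	size_t nblocks = (len + AES_BLOCK_SIZE - 1) / AES_BLOCK_SIZE;

	for (size_t i = 0; i < nblocks; i++) {
		fpu_section_enter(&s);
		/* ... one block would be encrypted here ... */
	}
	return s.sections;
}

/*
 * cbc-aesni style (as assumed in this thread): one FPU section
 * covers the whole request.
 */
size_t fpu_sections_per_request(size_t len)
{
	struct fpu_stats s = { 0 };

	if (len > 0)
		fpu_section_enter(&s);
	/* ... all blocks would be processed here ... */
	return s.sections;
}
```

For the 1472-byte request discussed above, fpu_sections_per_block(1472)
returns 92 while fpu_sections_per_request(1472) returns 1, so any
per-section cost is paid 92 times in the wrapper case.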