On 08/07/2014 07:05 AM, Christian Lamparter wrote:
> The high overhead (math_state_restore and fpu_save_init) are caused by
> the way ccm.c interacts with the aesni implementation when calculating
> the MAC [1] (in compute_mac).
>
>> [ ... ]
>>	/* now encrypt rest of data */
>>	while (datalen >= 16) {
>>		crypto_xor(odata, data, bs);
>>		crypto_cipher_encrypt_one(tfm, odata, odata);
>>
>>		datalen -= 16;
>>		data += 16;
>>	}
>> [...]
>
> crypto_cipher_encrypt_one is a wrapper which in your case calls
> aesni's aes_encrypt [2].
>
> And aes_encrypt looks like this:
>
>> [...]
>>	kernel_fpu_begin();
>>	aesni_enc(ctx, dst, src); <-- this is where it goes to _aesni_enc1
>>	kernel_fpu_end();
>> [...]
>
> Or: for every 16 Bytes of payload there is one fpu context save and
> restore... ouch!

I have never messed with this kind of stuff... Any idea if it would work
to put the fpu_begin/end a bit higher and do all those 16 byte chunks
in a batch without messing with the FPU for each chunk?

Thanks,
Ben

-- 
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com
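
To make the batching question concrete, here is a rough userspace sketch, not kernel code and not a patch against ccm.c. It assumes stand-ins for everything: mock_fpu_begin()/mock_fpu_end() are hypothetical replacements for kernel_fpu_begin()/kernel_fpu_end() that only count calls, and toy_encrypt_block() is a placeholder byte mix, not AES. It only shows that hoisting the begin/end pair out of the loop leaves the result unchanged while reducing the number of FPU sections to one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Counts how often the (mock) FPU state would be saved and restored. */
static unsigned long fpu_sections;

/* Hypothetical userspace stand-ins for kernel_fpu_begin()/kernel_fpu_end(). */
static void mock_fpu_begin(void) { fpu_sections++; }
static void mock_fpu_end(void) { }

/* Placeholder "cipher": NOT AES, just a per-byte mix for the demo. */
static void toy_encrypt_block(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < BLOCK; i++)
		dst[i] = (uint8_t)(src[i] * 5u + 0x3c + i);
}

static void xor_block(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < BLOCK; i++)
		dst[i] ^= src[i];
}

/* Mirrors the quoted loop: one begin/end pair per 16-byte block. */
static void mac_per_block(uint8_t *odata, const uint8_t *data, size_t datalen)
{
	while (datalen >= BLOCK) {
		xor_block(odata, data);
		mock_fpu_begin();
		toy_encrypt_block(odata, odata);
		mock_fpu_end();
		datalen -= BLOCK;
		data += BLOCK;
	}
}

/* Batched variant: one begin/end pair around the whole loop. */
static void mac_batched(uint8_t *odata, const uint8_t *data, size_t datalen)
{
	mock_fpu_begin();
	while (datalen >= BLOCK) {
		xor_block(odata, data);
		toy_encrypt_block(odata, odata);
		datalen -= BLOCK;
		data += BLOCK;
	}
	mock_fpu_end();
}

/* Runs both variants over the same buffer; returns 1 if the results match. */
static int compare_variants(size_t len, unsigned long *per_block,
			    unsigned long *batched)
{
	static uint8_t data[4096];
	uint8_t mac_a[BLOCK] = { 0 }, mac_b[BLOCK] = { 0 };

	if (len > sizeof(data))
		len = sizeof(data);
	for (size_t i = 0; i < len; i++)
		data[i] = (uint8_t)i;

	fpu_sections = 0;
	mac_per_block(mac_a, data, len);
	*per_block = fpu_sections;

	fpu_sections = 0;
	mac_batched(mac_b, data, len);
	*batched = fpu_sections;

	return memcmp(mac_a, mac_b, BLOCK) == 0;
}
```

Calling compare_variants(1500, &a, &b) on a 1500-byte buffer gives 93 begin/end pairs for the per-block loop against one for the batched loop. In the real kernel the trade-off is less simple: kernel_fpu_begin() disables preemption, so a long batch holds that for the whole loop, and the generic crypto_cipher_encrypt_one() path would still do its own begin/end unless the MAC code called into the AES-NI routine directly.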