On Thu, Aug 20, 2020 at 09:19:16AM +0200, Ard Biesheuvel wrote:
>
> Actually, I'm not so sure that they will be so much worse. The
> expensive FPU preserve/restore occurs for every 16 bytes of data
> processed by the AES cipher, which I'd estimate to take ~10 cycles per
> byte for an unaccelerated implementation. But table based AES should
> be avoided, especially for MAC algorithms where the plaintext may be
> known to an attacker who is after the key.

On my machine the performance difference on a 1472-byte request
between SIMD and generic is 2161 vs. 7558 (cycles).

>
> However, the CCMP handling is invoked from softirq context or from
> task context, and so SIMD is generally available unless the softirq
> happens to be taken over the back of a hardirq that interrupted a task
> running in the kernel that was using the SIMD already. IOW, this
> happens so rarely in practice that I would not expect it to be
> noticeable in the performance stats.

What if the same machine was doing TLS/IPsec sends at full throttle?
That would be exactly the wrong time to slow down softirqs four-fold,
no?

> My v2 attempt at cbcmac(aesni) implements an ahash, but a synchronous
> one. This means we can amortize the FPU preserve/restore over the
> entire scatterlist, instead of relying on the ahash walk to present
> the data in virtually mapped chunks.
>
> I'd still like to explore this approach, but I simply haven't had the
> spare cycles to spend on this.

I don't have an issue with your patch per se. But please make it so
that it has the async path like everything else. Also wireless uses
shash so it can't use an ahash anyway even if it is sync.

Cheers,
--
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt