On Thu, 20 Aug 2020 at 09:06, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 20, 2020 at 09:04:26AM +0200, Ard Biesheuvel wrote:
> >
> > I don't disagree with that, especially given all the effort that went
> > into optimizing FPU preserve/restore on both arm64 and x86. But the
> > bottom line is that this is what is causing the degradation in Ben's
> > case, so we cannot disregard it.
>
> If he's having problems with the performance when SIMD is in use
> due to preserve/restore, I'd hate to see his numbers when SIMD is
> not available.
>

Actually, I'm not so sure that they will be so much worse. The
expensive FPU preserve/restore occurs for every 16 bytes of data
processed by the AES cipher, which I'd estimate to take ~10 cycles per
byte for an unaccelerated implementation. But table-based AES should
be avoided, especially for MAC algorithms where the plaintext may be
known to an attacker who is after the key.

However, the CCMP handling is invoked from softirq context or from
task context, and so SIMD is generally available unless the softirq
happens to be taken on the back of a hardirq that interrupted a task
running in the kernel that was already using SIMD. IOW, this happens
so rarely in practice that I would not expect it to be noticeable in
the performance stats.

> IOW if this really matters to him, then wireless code needs to switch
> over to ahash.
>
> Solving half of the problem simply makes no sense.
>

My v2 attempt at cbcmac(aesni) implements an ahash, but a synchronous
one. This means we can amortize the FPU preserve/restore over the
entire scatterlist, instead of relying on the ahash walk to present
the data in virtually mapped chunks.

I'd still like to explore this approach, but I simply haven't had the
spare cycles to spend on this.
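To make the amortization argument concrete, here is a toy userspace model in C. It is a sketch under stated assumptions, not the v2 cbcmac(aesni) patch: `sim_fpu_begin()`/`sim_fpu_end()` are hypothetical stand-ins for `kernel_fpu_begin()`/`kernel_fpu_end()` that only count preserve/restore sections, `toy_encrypt_block()` is a placeholder XOR transform (NOT AES), and `struct chunk` is a hypothetical stand-in for the virtually mapped pieces a scatterlist walk would hand over. It computes the same CBC-MAC either with one FPU section per 16-byte block or with one section covering the whole request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end():
 * they only count how many preserve/restore sections occur. */
static unsigned long fpu_sections;
static int in_fpu_section;

static void sim_fpu_begin(void)
{
	assert(!in_fpu_section);	/* sections must not nest */
	in_fpu_section = 1;
	fpu_sections++;
}

static void sim_fpu_end(void)
{
	assert(in_fpu_section);
	in_fpu_section = 0;
}

/* Placeholder block transform, NOT AES: only the section count matters here. */
static void toy_encrypt_block(uint8_t blk[BLOCK_SIZE], const uint8_t key[BLOCK_SIZE])
{
	for (int i = 0; i < BLOCK_SIZE; i++)
		blk[i] ^= key[i];
}

/* Hypothetical: one virtually mapped piece of the request. */
struct chunk {
	const uint8_t *buf;
	size_t len;
};

/* Encrypt the running MAC state; wrap it in its own FPU section if per_block. */
static void mac_block(uint8_t mac[BLOCK_SIZE], const uint8_t key[BLOCK_SIZE], int per_block)
{
	if (per_block)
		sim_fpu_begin();
	toy_encrypt_block(mac, key);
	if (per_block)
		sim_fpu_end();
}

/* CBC-MAC over all chunks. per_block != 0 models a single-block SIMD cipher
 * called once per block; per_block == 0 models one section per request. */
static void cbcmac(uint8_t mac[BLOCK_SIZE], const uint8_t key[BLOCK_SIZE],
		   const struct chunk *chunks, size_t nchunks, int per_block)
{
	size_t pos = 0;	/* bytes absorbed into the current block */

	memset(mac, 0, BLOCK_SIZE);
	if (!per_block)
		sim_fpu_begin();
	for (size_t c = 0; c < nchunks; c++) {
		for (size_t i = 0; i < chunks[c].len; i++) {
			mac[pos++] ^= chunks[c].buf[i];
			if (pos == BLOCK_SIZE) {
				mac_block(mac, key, per_block);
				pos = 0;
			}
		}
	}
	/* Zero-padded final block: XORing zeros leaves mac unchanged. */
	if (pos)
		mac_block(mac, key, per_block);
	if (!per_block)
		sim_fpu_end();
}
```

For 64 bytes split across chunks of 20, 30 and 14 bytes, the per-block variant enters four FPU sections while the per-request variant enters one, and both produce the same MAC; chunk boundaries that do not line up with 16-byte blocks make no difference to the result.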