Re: [PATCH 0/5] crypto: Implement cmac based on cbc skcipher

Ard Biesheuvel <ardb@xxxxxxxxxx> · Thu, 20 Aug 2020 09:19:16 +0200

On Thu, 20 Aug 2020 at 09:06, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 20, 2020 at 09:04:26AM +0200, Ard Biesheuvel wrote:
> >
> > I don't disagree with that, especially given all the effort that went
> > into optimizing FPU preserve/restore on both arm64 and x86. But the
> > bottom line is that this is what is causing the degradation in Ben's
> > case, so we cannot disregard it.
>
> If he's having problems with the performance when SIMD is in use
> due to preserve/restore, I'd hate to see his numbers when SIMD is
> not available.
>

Actually, I'm not so sure that they would be that much worse. The
expensive FPU preserve/restore occurs for every 16 bytes of data
processed by the AES cipher, whereas I'd estimate an unaccelerated
implementation to take ~10 cycles per byte, i.e., ~160 cycles per
16-byte block. But table based AES should be avoided, especially for
MAC algorithms where the plaintext may be known to an attacker who is
after the key.

However, the CCMP handling is invoked from softirq context or from
task context, and so SIMD is generally available unless the softirq
happens to be taken over the back of a hardirq that interrupted a task
that was already using SIMD in the kernel. IOW, this
happens so rarely in practice that I would not expect it to be
noticeable in the performance stats.
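
To illustrate the availability argument, here is a minimal userspace
sketch of the usual "use SIMD if usable, else fall back to scalar code"
decision. It is a toy model, not kernel code: struct toy_ctx,
toy_simd_usable() and toy_pick_path() are hypothetical names, and the
check only loosely mirrors what a helper such as crypto_simd_usable()
would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy execution context; both fields are hypothetical stand-ins. */
struct toy_ctx {
	bool in_hardirq;          /* running in a hard interrupt handler */
	bool kernel_simd_in_use;  /* interrupted code was inside a kernel SIMD section */
};

enum toy_path { TOY_PATH_SIMD, TOY_PATH_SCALAR };

/* Stand-in for a check such as crypto_simd_usable(). */
static bool toy_simd_usable(const struct toy_ctx *ctx)
{
	assert(ctx != NULL);
	return !ctx->in_hardirq && !ctx->kernel_simd_in_use;
}

/* Pick the accelerated path when SIMD is usable, else the scalar fallback. */
static enum toy_path toy_pick_path(const struct toy_ctx *ctx)
{
	return toy_simd_usable(ctx) ? TOY_PATH_SIMD : TOY_PATH_SCALAR;
}
```

In this model, plain task or softirq context takes the SIMD path, and
only the nested case (a softirq on the back of a hardirq that
interrupted a kernel SIMD user) ends up on the scalar fallback.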

> IOW if this really matters to him, then wireless code needs to switch
> over to ahash.
>
> Solving half of the problem simply makes no sense.
>

My v2 attempt at cbcmac(aesni) implements an ahash, but a synchronous
one. This means we can amortize the FPU preserve/restore over the
entire scatterlist, instead of relying on the ahash walk to present
the data in virtually mapped chunks.
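
As a rough illustration of the amortization, the userspace sketch below
counts how many FPU preserve/restore sections each strategy would need.
It is a toy model under stated assumptions: toy_fpu_begin() and
toy_fpu_end() are hypothetical stand-ins for a pair such as
kernel_fpu_begin()/kernel_fpu_end(), the segment lengths stand in for
scatterlist entries, and no actual AES is performed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_AES_BLOCK_SIZE 16

struct toy_stats {
	unsigned long fpu_sections;  /* number of begin/end pairs entered */
	unsigned long blocks;        /* number of 16-byte blocks processed */
};

/* Hypothetical stand-ins for the FPU preserve/restore pair. */
static void toy_fpu_begin(struct toy_stats *st) { st->fpu_sections++; }
static void toy_fpu_end(struct toy_stats *st) { (void)st; }

/* Placeholder for the MAC update over nblocks blocks; only counts them. */
static void toy_mac_blocks(struct toy_stats *st, size_t nblocks)
{
	st->blocks += nblocks;
}

/* One FPU section per 16-byte block, as with a per-block cipher call. */
static struct toy_stats toy_mac_per_block(const size_t *seg_len, size_t nsegs)
{
	struct toy_stats st = { 0, 0 };

	for (size_t i = 0; i < nsegs; i++) {
		assert(seg_len[i] % TOY_AES_BLOCK_SIZE == 0);
		for (size_t off = 0; off < seg_len[i]; off += TOY_AES_BLOCK_SIZE) {
			toy_fpu_begin(&st);
			toy_mac_blocks(&st, 1);
			toy_fpu_end(&st);
		}
	}
	return st;
}

/* One FPU section spanning every segment of the request. */
static struct toy_stats toy_mac_per_request(const size_t *seg_len, size_t nsegs)
{
	struct toy_stats st = { 0, 0 };

	toy_fpu_begin(&st);
	for (size_t i = 0; i < nsegs; i++) {
		assert(seg_len[i] % TOY_AES_BLOCK_SIZE == 0);
		toy_mac_blocks(&st, seg_len[i] / TOY_AES_BLOCK_SIZE);
	}
	toy_fpu_end(&st);
	return st;
}
```

For a request made of a 32-byte and a 1472-byte segment (sizes picked
purely for illustration), both variants process 94 blocks, but the
per-block variant enters 94 FPU sections while the per-request variant
enters one.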

I'd still like to explore this approach, but I simply haven't had the
spare cycles to spend on this.