Re: [PATCH net-next] macsec: introduce default_async_crypto sysctl

Scott Dial <scott@xxxxxxxxxxxxx> · Wed, 23 Aug 2023 16:22:31 -0400

2023-08-18, 18:46:48 -0700, Jakub Kicinski wrote:
> Can we not fix the ordering problem?
> Queue the packets locally if they get out of order?

AES-NI's implementation of gcm(aes) requires the FPU, so if it's busy 
the decrypt gets stuck on the cryptd queue, but that queue is not 
order-preserving. If the macsec driver maintained a queue for the netdev 
that was order-preserving, then you could resolve the issue, but it adds 
more complexity to the macsec driver, so I assume that's why the 
maintainers have always desired to revert my patch instead of ensuring 
packet order.
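
A minimal user-space sketch of the order-preserving queue idea described above, not the macsec driver's code. It assumes each packet is given a sequence number at submission and that crypto completions may arrive in any order. The names `reorder_ring`, `ring_submit`, `ring_complete` and `ring_drain` are hypothetical, and an `int` stands in for the skb:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical depth; must be a power of two */

struct reorder_ring {
	uint32_t next_seq;      /* next sequence number to hand out */
	uint32_t next_deliver;  /* next sequence number to release */
	bool done[RING_SIZE];   /* completion flag per in-flight slot */
	int payload[RING_SIZE]; /* stands in for an skb pointer */
};

/* Reserve a slot when a packet is submitted for decryption.
 * Returns false when the ring is full (caller must drop or wait). */
static bool ring_submit(struct reorder_ring *r, uint32_t *seq)
{
	if (r->next_seq - r->next_deliver >= RING_SIZE)
		return false;
	*seq = r->next_seq++;
	r->done[*seq % RING_SIZE] = false;
	return true;
}

/* Record a completion; completions may arrive in any order. */
static void ring_complete(struct reorder_ring *r, uint32_t seq, int payload)
{
	r->payload[seq % RING_SIZE] = payload;
	r->done[seq % RING_SIZE] = true;
}

/* Release completed packets strictly in submission order, stopping at
 * the first packet that is still in flight. Returns the number of
 * payloads written to out. */
static size_t ring_drain(struct reorder_ring *r, int *out, size_t max)
{
	size_t n = 0;

	while (n < max && r->next_deliver != r->next_seq &&
	       r->done[r->next_deliver % RING_SIZE]) {
		out[n++] = r->payload[r->next_deliver % RING_SIZE];
		r->next_deliver++;
	}
	return n;
}
```

The cost is visible even in this sketch: one packet stuck on a slow path holds back every later packet until it completes, and a real driver would also need locking and a policy for a full ring.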

With respect to AES-NI's implementation of gcm(aes), it's unfortunate 
that there is not a synchronous version that uses the FPU when available 
and falls back to gcm_base(ctr(aes-aesni),ghash-generic) when it's not.
In that case, you would get the benefit of the FPU for the majority of
the time when it's available. When I suggested this to linux-crypto, I was
told that relying on synchronous crypto in the macsec driver was wrong:

On 12 Aug 2020 10:45:00 +0000, Pascal Van Leeuwen wrote:
> Forcing the use of sync algorithms only would be detrimental to platforms
> that do not have CPU accelerated crypto, but do have HW acceleration
> for crypto external to the CPU. I understand it's much easier to implement,
> but that is just being lazy IMHO. For bulk crypto of relatively independent
> blocks (networking packets, disk sectors), ASYNC should always be preferred.

So, I abandoned my suggestion to add a fallback. The complexity of
queueing in the macsec driver was beyond the time I had available, and the
regression in performance was not significant for my use case, but I 
understand that others may have different requirements. I would 
emphasize that benchmarking of network performance should be done by 
looking at more than just the interface frame rate. For instance, 
out-of-order delivery of packets can trigger TCP backoff. I was never
interested in how many packets the macsec driver could stuff onto the 
wire, because the impact was my TCP socket stalling and my UDP streams 
being garbled.

On 8/22/2023 11:39 AM, Sabrina Dubroca wrote:
> Actually, looking into the crypto API side, I don't see how they can
> get out of order since commit 81760ea6a95a ("crypto: cryptd - Add
> helpers to check whether a tfm is queued"):
>
>      [...] ensure that no reordering is introduced because of requests
>      queued in cryptd with respect to requests being processed in
>      softirq context.
>
> And cryptd_aead_queued() is used by AESNI (via simd_aead_decrypt()) to
> decide whether to process the request synchronously or not.

I have not been following linux-crypto changes, but I would be surprised
if the request is not flagged with CRYPTO_TFM_REQ_MAY_BACKLOG, so it would
be queued. If that's not the case, then the attempt to decrypt would
return -EBUSY, which would translate to a packet error, since 
macsec_decrypt MUST handle the skb during the softirq.

> So I really don't get what commit ab046a5d4be4 was trying to fix. I've
> never been able to reproduce that issue, I guess commit 81760ea6a95a
> explains why.
>
> I'd suggest to revert commit ab046a5d4be4, but it feels wrong to
> revert it without really understanding what problem Scott hit and why
> 81760ea6a95a didn't solve it.

I don't think that commit has any relevance to the issue. For instance 
with AES-NI, you need to have competing load on the FPU such that 
crypto_simd_usable() fails to be true. In the past, I replicated this 
failure mode using two SuperMicro 5018D-FN4T servers directly connected 
to each other, which is a Xeon-D 1541 w/ Intel 10GbE NIC (ixgbe driver). 
From there, I would send /dev/urandom as UDP to the other host. I would 
get about 1 out of 10k packets queued on cryptd with that setup. My real 
world case was transporting MPEG TS video streams, each about 1k pps, so 
that is a decode error in the video stream every 10 seconds.
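
The arithmetic behind that figure, as a small sketch. It assumes every packet that lands on the cryptd queue produces one visible error in the stream; the function name is hypothetical:

```c
/* Mean seconds between errors for one stream, given its packet rate
 * and the fraction of packets diverted to the cryptd queue.
 * Assumption: each diverted packet causes exactly one decode error. */
static double seconds_between_errors(double pps, double queued_fraction)
{
	return 1.0 / (pps * queued_fraction);
}
```

With 1000 pps and a queued fraction of 1/10000, this gives roughly one error every 10 seconds per stream.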

--
Scott Dial
scott@xxxxxxxxxxxxx