On Thu, 10 Dec 2020 at 04:01, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>
> On 12/9/20 6:43 PM, Herbert Xu wrote:
> > On Thu, Dec 10, 2020 at 01:18:12AM +0100, Ard Biesheuvel wrote:
> >>
> >> One thing I realized just now is that in the current situation, all
> >> the synchronous skciphers already degrade like this.
> >>
> >> I.e., in Ben's case, without the special ccm implementation, ccm(aes)
> >> will resolve to ccm(ctr(aesni),cbcmac(aesni)), which is instantiated
> >> as a sync skcipher using the ctr and ccm/cbcmac templates built on top
> >> of the AES-NI cipher (not skcipher). This cipher will also fall back
> >> to suboptimal scalar code if the SIMD is in use in process context.
> >
> > Sure, your patch is not making it any worse. But I don't think
> > the extra code is worth it considering that you're still going to
> > be running into that slow fallback path all the time.
>
> How can we test this assumption? I see 3x performance gain, so it is not hitting
> the fallback path much in my case. What traffic pattern and protocol do you think
> will cause the slow fallback path to happen often enough to make this patch not
> helpful?
>

Is there a way to verify Herbert's assertion that TX and RX tend to be
handled by the same core? I am not a networking guy, but that seems
dubious to me.

You could add a pr_warn_ratelimited() inside the fallback path and see
if it ever gets called at all under various loads.

> > Much better to fix the wireless code to actually go async.
>
> This will not happen any time soon, so better to make incremental
> improvement in the crypt code.
>

I would argue that these are orthogonal. My patch improves both the
accelerated and the fallback path, given that the latter does not have
to walk the input data twice anymore, and go through 3 layers of
templates and the associated indirect calls for each 16 bytes of
input.

Of course, it would be better to avoid using the fallback path altogether, but I don't think one should hold up the other.
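
To make the pr_warn_ratelimited() suggestion above concrete, here is a
rough sketch of the kind of instrumentation I mean. It is a userspace
mock, not the actual patch: crypto_simd_usable() and
pr_warn_ratelimited() are real kernel helpers, but here they are
replaced by trivial stand-ins so the snippet compiles on its own. The
counters, the simd_busy flag, ccm_handle_request() and simulate_load()
are made-up names for illustration only.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-ins so this compiles outside the kernel. In a real
 * patch, crypto_simd_usable() comes from <crypto/internal/simd.h> and
 * pr_warn_ratelimited() from <linux/printk.h>; this shim is NOT
 * rate limited.
 */
static bool simd_busy;	/* hypothetical: pretend the FPU is already in use */
static bool crypto_simd_usable(void) { return !simd_busy; }
#define pr_warn_ratelimited(...) fprintf(stderr, __VA_ARGS__)

static unsigned long fallback_hits;	/* hypothetical debug counters */
static unsigned long simd_hits;

/* Hypothetical request handler; the real function names will differ. */
static void ccm_handle_request(void)
{
	if (!crypto_simd_usable()) {
		fallback_hits++;
		pr_warn_ratelimited("ccm: scalar fallback taken (%lu so far)\n",
				    fallback_hits);
		return;		/* the scalar code would run here */
	}
	simd_hits++;		/* the SIMD code would run here */
}

/* Simulated load: every busy_every-th request finds the SIMD unit busy. */
static void simulate_load(unsigned int requests, unsigned int busy_every)
{
	for (unsigned int i = 1; i <= requests; i++) {
		simd_busy = busy_every && (i % busy_every == 0);
		ccm_handle_request();
	}
}
```

In the kernel, the ratio of the two counters (or simply whether the
warning ever shows up in dmesg) under a given traffic pattern would tell
us how often the fallback path is really taken.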