On Wed, May 26, 2021 at 12:07:28PM +0200, Ard Biesheuvel wrote:
> AES-CCM (as used in WPA2 CCMP, for instance) typically involves
> authenticate-only data, and operates on a single network packet, and so
> the common case is for the authenticate, en/decrypt and finalize SIMD
> helpers to all be called exactly once in sequence. Since
> kernel_neon_end() now involves manipulation of the preemption state as
> well as the softirq mask state, let's reduce the number of times we are
> forced to call it to only once if we are handling this common case.
>
> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> ---
>  arch/arm64/crypto/aes-ce-ccm-core.S |  1 +
>  arch/arm64/crypto/aes-ce-ccm-glue.c | 74 +++++++++++---------
>  2 files changed, 43 insertions(+), 32 deletions(-)
>
> diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
> index 99a028e298ed..8adff299fcd3 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-core.S
> +++ b/arch/arm64/crypto/aes-ce-ccm-core.S
> @@ -124,6 +124,7 @@ SYM_FUNC_START(ce_aes_ccm_final)
>  SYM_FUNC_END(ce_aes_ccm_final)
>
>  	.macro	aes_ccm_do_crypt,enc
> +	cbz	x2, 5f
>  	ldr	x8, [x6, #8]			/* load lower ctr */
>  	ld1	{v0.16b}, [x5]			/* load mac */
>  CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
> diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
> index 54bd2494a000..98159f2c49ae 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-glue.c
> +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
> @@ -97,10 +97,8 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
>  static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
>  			   u32 abytes, u32 *macp)
>  {
> -	kernel_neon_begin();
>  	ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
>  			     num_rounds(key));
> -	kernel_neon_end();
>  }
[...]
> +	if (req->assoclen)
> +		ccm_calculate_auth_mac(req, mac);
> +

This still processes all of the associated data under a single
kernel_neon_begin() / kernel_neon_end() pair, even when there is a large
amount of it. Shouldn't it be limited to a reasonable amount at a time,
like 4K?

This sort of thing has been treated as a bug before; see e.g. commit
706024a52c6 ("crypto: arch/lib - limit simd usage to 4k chunks").

You could still do the entire CCM operation under a single pair as long as
there isn't more than 4K of associated data.

- Eric
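For illustration only, here is a minimal userspace sketch of the control flow suggested above, not the actual glue code. neon_begin()/neon_end(), auth_data(), do_crypt(), do_final() and ccm_op() are hypothetical stand-ins that only count SIMD sections and bytes; the real code walks the associated data through a scatterlist rather than a flat buffer. It assumes that splitting the MAC input at arbitrary boundaries is safe, which the u32 *macp argument in the quoted ccm_update_mac() signature suggests (it appears to carry partial-block state between calls).

```c
#include <assert.h>
#include <stddef.h>

/* 4K SIMD budget, as in commit 706024a52c6. */
#define SIMD_CHUNK 4096

/* Hypothetical stand-ins for kernel_neon_begin()/kernel_neon_end(): they only
 * record how many SIMD sections are opened and how much data each one MACs. */
static int neon_active;
static int neon_sections;
static size_t section_bytes, max_section_bytes, total_bytes;

static void neon_begin(void)
{
	assert(!neon_active);		/* sections must not nest */
	neon_active = 1;
	neon_sections++;
	section_bytes = 0;
}

static void neon_end(void)
{
	assert(neon_active);
	neon_active = 0;
	if (section_bytes > max_section_bytes)
		max_section_bytes = section_bytes;
}

/* Hypothetical stand-in for ce_aes_ccm_auth_data(): only counts bytes. */
static void auth_data(const unsigned char *in, size_t len)
{
	(void)in;
	assert(neon_active);		/* must run inside a SIMD section */
	section_bytes += len;
	total_bytes += len;
}

/* Hypothetical stand-ins for the en/decrypt and finalize helpers. */
static void do_crypt(void) { assert(neon_active); }
static void do_final(void) { assert(neon_active); }

/* One SIMD section for the whole operation when the associated data fits in
 * 4K; otherwise MAC the associated data in 4K chunks first. */
static void ccm_op(const unsigned char *assoc, size_t assoclen)
{
	if (assoclen > SIMD_CHUNK) {
		while (assoclen > 0) {
			size_t n = assoclen < SIMD_CHUNK ? assoclen : SIMD_CHUNK;

			neon_begin();
			auth_data(assoc, n);
			neon_end();
			assoc += n;
			assoclen -= n;
		}
	}

	neon_begin();
	if (assoclen)			/* nonzero only in the <= 4K case */
		auth_data(assoc, assoclen);
	do_crypt();
	do_final();
	neon_end();
}

static void reset_counters(void)
{
	neon_active = 0;
	neon_sections = 0;
	section_bytes = max_section_bytes = total_bytes = 0;
}
```

With 100 bytes of associated data, ccm_op() opens exactly one SIMD section. With 10000 bytes it opens four (4096 + 4096 + 1808 bytes of MAC input, then one for en/decrypt and finalize), and no section MACs more than 4K.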