On Wed, 26 May 2021 at 19:14, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Wed, May 26, 2021 at 12:07:28PM +0200, Ard Biesheuvel wrote:
> > AES-CCM (as used in WPA2 CCMP, for instance) typically involves
> > authenticate-only data, and operates on a single network packet, and so
> > the common case is for the authenticate, en/decrypt and finalize SIMD
> > helpers to all be called exactly once in sequence. Since
> > kernel_neon_end() now involves manipulation of the preemption state as
> > well as the softirq mask state, let's reduce the number of times we are
> > forced to call it to only once if we are handling this common case.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > ---
> >  arch/arm64/crypto/aes-ce-ccm-core.S |  1 +
> >  arch/arm64/crypto/aes-ce-ccm-glue.c | 74 +++++++++++---------
> >  2 files changed, 43 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
> > index 99a028e298ed..8adff299fcd3 100644
> > --- a/arch/arm64/crypto/aes-ce-ccm-core.S
> > +++ b/arch/arm64/crypto/aes-ce-ccm-core.S
> > @@ -124,6 +124,7 @@ SYM_FUNC_START(ce_aes_ccm_final)
> >  SYM_FUNC_END(ce_aes_ccm_final)
> >
> >  	.macro	aes_ccm_do_crypt,enc
> > +	cbz	x2, 5f
> >  	ldr	x8, [x6, #8]			/* load lower ctr */
> >  	ld1	{v0.16b}, [x5]			/* load mac */
> >  CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
> > diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
> > index 54bd2494a000..98159f2c49ae 100644
> > --- a/arch/arm64/crypto/aes-ce-ccm-glue.c
> > +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
> > @@ -97,10 +97,8 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
> >  static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
> >  			   u32 abytes, u32 *macp)
> >  {
> > -	kernel_neon_begin();
> >  	ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
> >  			     num_rounds(key));
> > -	kernel_neon_end();
> >  }
> [...]
> > +	if (req->assoclen)
> > +		ccm_calculate_auth_mac(req, mac);
> > +
>
> This still makes all the associated data be processed under a single
> kernel_neon_begin() / kernel_neon_end() pair, even if there is a large amount of
> it. Shouldn't it be limited to a reasonable amount at a time, like 4K?
> This sort of thing has been considered a bug before, e.g. see
> commit 706024a52c6 ("crypto: arch/lib - limit simd usage to 4k chunks").
>
> You could do the entire CCM operation under a single pair as long as there isn't
> more than 4K of associated data.
>

Good point. I'll add a separate patch for that.
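
For illustration only, here is a minimal userspace sketch of the chunking
idea discussed above: process the associated data in pieces of at most 4K,
with one begin/end pair per piece. This is not the follow-up patch. The
names SIMD_CHUNK, fake_neon_begin(), fake_neon_end(), fake_auth_data() and
auth_chunked() are hypothetical stand-ins, and the "MAC" is a toy XOR fold,
not AES-CCM.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed per-section limit, following the 4K figure suggested above. */
#define SIMD_CHUNK 4096u

/*
 * Userspace stand-ins for kernel_neon_begin()/kernel_neon_end().
 * They only count sections and track nesting; no SIMD state is touched.
 */
static unsigned int neon_sections;
static int neon_depth;

static void fake_neon_begin(void)
{
	neon_depth++;
	neon_sections++;
}

static void fake_neon_end(void)
{
	neon_depth--;
}

/*
 * Hypothetical stand-in for ce_aes_ccm_auth_data(): XOR-folds the input
 * into a 16-byte buffer, carrying the byte position in *macp across calls.
 */
static void fake_auth_data(uint8_t mac[16], const uint8_t *in, size_t len,
			   uint32_t *macp)
{
	for (size_t i = 0; i < len; i++) {
		mac[*macp] ^= in[i];
		*macp = (*macp + 1) % 16;
	}
}

/* Process the data in pieces of at most SIMD_CHUNK bytes per section. */
static void auth_chunked(uint8_t mac[16], const uint8_t *in, size_t len,
			 uint32_t *macp)
{
	while (len) {
		size_t n = len < SIMD_CHUNK ? len : SIMD_CHUNK;

		fake_neon_begin();
		fake_auth_data(mac, in, n, macp);
		fake_neon_end();

		in += n;
		len -= n;
	}
}
```

With this shape, input of 4K or less stays within a single begin/end
section, while larger input is split so that no one section covers more
than 4K.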