On Thu, 11 Jan 2024 at 13:33, Ard Biesheuvel <ardb+git@xxxxxxxxxx> wrote:
>
> From: Ard Biesheuvel <ardb@xxxxxxxxxx>
>
> Implement the CCM tail handling using a single sequence that uses
> permute vectors and overlapping loads and stores, rather than going over
> the tail byte by byte in a loop, and using scalar operations. This is
> more efficient, even though the measured speedup is only around 1-2% on
> the CPUs I have tried.
>
> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> ---
>  arch/arm64/crypto/aes-ce-ccm-core.S | 59 +++++++++++++-------
>  arch/arm64/crypto/aes-ce-ccm-glue.c | 20 +++----
>  2 files changed, 48 insertions(+), 31 deletions(-)
>
...

The hunks below don't belong here: they were supposed to be squashed into
the previous patch. I will fix that up for the next revision.

> diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
> index 2f4e6a318fcd..4710e59075f5 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-glue.c
> +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
> @@ -181,16 +181,16 @@ static int ccm_encrypt(struct aead_request *req)
>  		if (walk.nbytes == walk.total)
>  			tail = 0;
>
> -		if (unlikely(walk.total < AES_BLOCK_SIZE))
> -			src = dst = memcpy(buf + sizeof(buf) - walk.total,
> -					   src, walk.total);
> +		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
> +			src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
> +					   src, walk.nbytes);
>
>  		ce_aes_ccm_encrypt(dst, src, walk.nbytes - tail,
>  				   ctx->key_enc, num_rounds(ctx),
>  				   mac, walk.iv);
>
> -		if (unlikely(walk.total < AES_BLOCK_SIZE))
> -			memcpy(walk.dst.virt.addr, dst, walk.total);
> +		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
> +			memcpy(walk.dst.virt.addr, dst, walk.nbytes);
>
>  		if (walk.nbytes == walk.total)
>  			ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx));
> @@ -248,16 +248,16 @@ static int ccm_decrypt(struct aead_request *req)
>  		if (walk.nbytes == walk.total)
>  			tail = 0;
>
> -		if (unlikely(walk.total < AES_BLOCK_SIZE))
> -			src = dst = memcpy(buf + sizeof(buf) - walk.total,
> -					   src, walk.total);
> +		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
> +			src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
> +					   src, walk.nbytes);
>
>  		ce_aes_ccm_decrypt(dst, src, walk.nbytes - tail,
>  				   ctx->key_enc, num_rounds(ctx),
>  				   mac, walk.iv);
>
> -		if (unlikely(walk.total < AES_BLOCK_SIZE))
> -			memcpy(walk.dst.virt.addr, dst, walk.total);
> +		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
> +			memcpy(walk.dst.virt.addr, dst, walk.nbytes);
>
>  		if (walk.nbytes == walk.total)
>  			ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx));
> --
> 2.43.0.275.g3460e3d667-goog
>