On Sat, Feb 17, 2024 at 05:11:52PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@xxxxxxxxxx>
>
> The bit-sliced implementation of AES-CTR operates on blocks of 128
> bytes, and will fall back to the plain NEON version for tail blocks or
> inputs that are shorter than 128 bytes to begin with.
>
> It will call straight into the plain NEON asm helper, which performs all
> memory accesses in granules of 16 bytes (the size of a NEON register).
> For this reason, the associated plain NEON glue code will copy inputs
> shorter than 16 bytes into a temporary buffer, given that this is a rare
> occurrence and it is not worth the effort to work around this in the asm
> code.
>
> The fallback from the bit-sliced NEON version fails to take this into
> account, potentially resulting in out-of-bounds accesses. So clone the
> same workaround, and use a temp buffer for short in/outputs.
>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Reported-by: syzbot+f1ceaa1a09ab891e1934@xxxxxxxxxxxxxxxxxxxxxxxxx
> Tested-by: syzbot+f1ceaa1a09ab891e1934@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>

Looks like this could use:

Fixes: fc074e130051 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk")

> +		if (unlikely(nbytes < AES_BLOCK_SIZE))
> +			src = dst = memcpy(buf + sizeof(buf) - nbytes,
> +					   src, nbytes);
> +
>  		neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds,
>  				     nbytes, walk.iv);
> +
> +		if (unlikely(nbytes < AES_BLOCK_SIZE))
> +			memcpy(d, buf + sizeof(buf) - nbytes, nbytes);

The second one could use 'dst' instead of 'buf + sizeof(buf) - nbytes', right?

Otherwise this looks good.

Reviewed-by: Eric Biggers <ebiggers@xxxxxxxxxx>

- Eric
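
For illustration only, here is a user-space sketch of the workaround discussed above, including the 'dst' simplification for the copy-back. It is not the kernel code: helper_xor_granule() is a hypothetical stand-in for the asm helper that XORs with a fixed byte instead of doing AES-CTR, and it assumes the helper touches the whole 16-byte granule ending at the end of the input, which is why short data is placed at the tail of the temp buffer. BLOCK_SIZE and xor_short_safe() are likewise made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16	/* stands in for AES_BLOCK_SIZE */

/*
 * Hypothetical stand-in for the asm helper. It always loads and stores a
 * full 16-byte granule ending at src + nbytes / dst + nbytes, so for
 * nbytes < 16 it touches memory in front of the caller's pointers.
 */
static void helper_xor_granule(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
	uint8_t granule[BLOCK_SIZE];
	size_t skip = BLOCK_SIZE - nbytes;

	assert(nbytes >= 1 && nbytes <= BLOCK_SIZE);

	memcpy(granule, src + nbytes - BLOCK_SIZE, BLOCK_SIZE);
	for (size_t i = skip; i < BLOCK_SIZE; i++)
		granule[i] ^= 0x5a;	/* toy "keystream", not AES */
	memcpy(dst + nbytes - BLOCK_SIZE, granule, BLOCK_SIZE);
}

/* Wrapper that bounces short inputs through a temp buffer. */
static void xor_short_safe(uint8_t *out, const uint8_t *in, size_t nbytes)
{
	uint8_t buf[BLOCK_SIZE] = { 0 };
	const uint8_t *src = in;
	uint8_t *dst = out;

	/* Place short data at the end of buf so the granule is exactly buf. */
	if (nbytes < BLOCK_SIZE)
		src = dst = memcpy(buf + sizeof(buf) - nbytes, in, nbytes);

	helper_xor_granule(dst, src, nbytes);

	/* dst already points at buf + sizeof(buf) - nbytes here. */
	if (nbytes < BLOCK_SIZE)
		memcpy(out, dst, nbytes);
}
```

With this layout, the helper's 16-byte load and store for a short input stay entirely inside buf, and the caller's 'out' and 'in' are only ever accessed for exactly nbytes.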