On Sat, Dec 12, 2020 at 09:32:43AM +0100, Ard Biesheuvel wrote:
> Commit 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block
> size multiples") refactored the chacha block handling in the glue code in
> a way that may result in the counter increment to be omitted when calling
> chacha_block_xor_neon() to process a full block. This violates the API,
> which requires that the output IV is suitable for handling more input as
> long as the preceding input has been presented in round multiples of the
> block size.

It appears that the library API actually requires that the counter be
incremented on partial blocks too.  See __chacha20poly1305_encrypt().

I guess the missing increment in chacha_doneon() just wasn't noticed before
because chacha20poly1305 only needs this behavior on 32-byte inputs, and
chacha_doneon() is only executed when the length is over 64 bytes.

>
> So increment the counter after calling chacha_block_xor_neon().
>
> Fixes: 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples")
> Reported-by: Eric Biggers <ebiggers@xxxxxxxxxx>
> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> ---
>  arch/arm/crypto/chacha-glue.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm/crypto/chacha-glue.c b/arch/arm/crypto/chacha-glue.c
> index 7b5cf8430c6d..f19e6da8cdd0 100644
> --- a/arch/arm/crypto/chacha-glue.c
> +++ b/arch/arm/crypto/chacha-glue.c
> @@ -60,6 +60,7 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
>  		chacha_block_xor_neon(state, d, s, nrounds);
>  		if (d != dst)
>  			memcpy(dst, buf, bytes);
> +		state[12] += 1;
>  	}

Maybe write this as:

	state[12]++;
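
To make the partial-block point concrete, here is a rough user-space sketch
of the counter handling I have in mind. It is not the kernel code:
toy_block_xor() and toy_crypt() are hypothetical names, the "keystream" is a
toy function of the counter rather than real ChaCha, and the bounce buffer
is only loosely modelled on the one in chacha_doneon(). The only thing it is
meant to show is that state[12] advances once per block touched, including
a trailing partial block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHACHA_BLOCK_SIZE 64

/*
 * Hypothetical stand-in for chacha_block_xor_neon(): XORs one full block
 * with a toy keystream derived from the block counter. This is NOT real
 * ChaCha. Like the routine in the quoted hunk, it leaves the counter alone.
 */
static void toy_block_xor(const uint32_t *state, uint8_t *dst,
			  const uint8_t *src)
{
	for (size_t i = 0; i < CHACHA_BLOCK_SIZE; i++)
		dst[i] = src[i] ^ (uint8_t)(state[12] * 31u + i);
}

/*
 * Process 'bytes' of input one block at a time. The counter is bumped after
 * every block, whether it was full or partial, so that a later call never
 * reuses the keystream of an earlier one.
 */
static void toy_crypt(uint32_t *state, uint8_t *dst, const uint8_t *src,
		      size_t bytes)
{
	uint8_t buf[CHACHA_BLOCK_SIZE];

	assert(state && dst && src);

	while (bytes > 0) {
		size_t n = bytes < CHACHA_BLOCK_SIZE ? bytes : CHACHA_BLOCK_SIZE;

		/* Bounce through a full-size buffer so short inputs are safe. */
		memset(buf, 0, sizeof(buf));
		memcpy(buf, src, n);
		toy_block_xor(state, buf, buf);
		memcpy(dst, buf, n);

		state[12]++;	/* the increment under discussion */

		src += n;
		dst += n;
		bytes -= n;
	}
}
```

With this shape, a 32-byte call followed by a second call leaves the second
call starting at counter 1 rather than 0, which is the behavior that
__chacha20poly1305_encrypt() appears to rely on.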