On Sat, 12 Dec 2020 at 07:43, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> Hi Ard,
>
> On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> > @@ -42,24 +42,24 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
> >  {
> >  	u8 buf[CHACHA_BLOCK_SIZE];
> >
> > -	while (bytes >= CHACHA_BLOCK_SIZE * 4) {
> > -		chacha_4block_xor_neon(state, dst, src, nrounds);
> > -		bytes -= CHACHA_BLOCK_SIZE * 4;
> > -		src += CHACHA_BLOCK_SIZE * 4;
> > -		dst += CHACHA_BLOCK_SIZE * 4;
> > -		state[12] += 4;
> > -	}
> > -	while (bytes >= CHACHA_BLOCK_SIZE) {
> > -		chacha_block_xor_neon(state, dst, src, nrounds);
> > -		bytes -= CHACHA_BLOCK_SIZE;
> > -		src += CHACHA_BLOCK_SIZE;
> > -		dst += CHACHA_BLOCK_SIZE;
> > -		state[12]++;
> > +	while (bytes > CHACHA_BLOCK_SIZE) {
> > +		unsigned int l = min(bytes, CHACHA_BLOCK_SIZE * 4U);
> > +
> > +		chacha_4block_xor_neon(state, dst, src, nrounds, l);
> > +		bytes -= l;
> > +		src += l;
> > +		dst += l;
> > +		state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
> >  	}
> >  	if (bytes) {
> > -		memcpy(buf, src, bytes);
> > -		chacha_block_xor_neon(state, buf, buf, nrounds);
> > -		memcpy(dst, buf, bytes);
> > +		const u8 *s = src;
> > +		u8 *d = dst;
> > +
> > +		if (bytes != CHACHA_BLOCK_SIZE)
> > +			s = d = memcpy(buf, src, bytes);
> > +		chacha_block_xor_neon(state, d, s, nrounds);
> > +		if (d != dst)
> > +			memcpy(dst, buf, bytes);
> >  	}
> >  }
> >
>
> Shouldn't this be incrementing the block counter after chacha_block_xor_neon()?
> It might be needed by the library API.
>

Yeah, good point. 'bytes' could be exactly CHACHA_BLOCK_SIZE now, which
wasn't the case before. I'll send a fix.

> Also, even with that fixed, this patch is causing the self-tests (both the
> chacha20poly1305_selftest(), and the crypto API tests for chacha20-neon,
> xchacha20-neon, and xchacha12-neon) to fail when I boot a kernel in QEMU.  This
> doesn't happen on real hardware (Raspberry Pi 2), and I don't see any other bugs
> in this patch, so I'm not sure what the problem is.  Did you run the self-tests
> on every platform you tested this on?
>

Does your QEMU lack this patch? I found that bug working on this code.

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=604cef3e57eaeeef77074d78f6cf2eca1be11c62
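To illustrate the counter issue discussed above, here is a minimal userspace sketch (not kernel code) that models only the control flow of the patched chacha_doneon() and tracks how far state[12] would advance. The function names counter_advance_patched() and counter_advance_with_tail_increment() are hypothetical, the NEON calls are replaced by comments, and the second function shows just one plausible fix (bumping the counter after the tail block), which is an assumption and not necessarily the fix that was sent.

```c
#define CHACHA_BLOCK_SIZE 64U
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: smaller of two unsigned values (stands in for min()). */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Models the loop structure of the patched chacha_doneon() and returns
 * how much the block counter (state[12]) advances for 'bytes' of input.
 */
static unsigned int counter_advance_patched(unsigned int bytes)
{
	unsigned int counter = 0;

	while (bytes > CHACHA_BLOCK_SIZE) {
		unsigned int l = min_uint(bytes, CHACHA_BLOCK_SIZE * 4U);

		/* chacha_4block_xor_neon() would run here */
		bytes -= l;
		counter += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
	}
	if (bytes) {
		/* chacha_block_xor_neon() would run here; counter is not touched */
	}
	return counter;
}

/*
 * Same model, with the counter incremented after the tail block.
 * This is an assumed fix for illustration only.
 */
static unsigned int counter_advance_with_tail_increment(unsigned int bytes)
{
	unsigned int counter = 0;

	while (bytes > CHACHA_BLOCK_SIZE) {
		unsigned int l = min_uint(bytes, CHACHA_BLOCK_SIZE * 4U);

		bytes -= l;
		counter += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
	}
	if (bytes)
		counter++;	/* tail block consumed one block of keystream */
	return counter;
}
```

With exactly one block of input, counter_advance_patched(64) returns 0 while counter_advance_with_tail_increment(64) returns 1. With 320 bytes the main loop handles 256 bytes and leaves exactly CHACHA_BLOCK_SIZE for the tail, so the two models return 4 and 5 respectively. Inputs that the main loop consumes completely, such as 100 bytes, give 2 in both.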