On Fri, May 6, 2022 at 12:41 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Wed, May 04, 2022 at 12:18:20AM +0000, Nathan Huckleberry wrote:
> > Added some clarifying comments, changed the register allocations to make
> > the code clearer, and added register aliases.
> >
> > Signed-off-by: Nathan Huckleberry <nhuck@xxxxxxxxxx>
>
> I was a bit surprised to see this after the xctr support patch rather than
> before. Doing the cleanup first would make adding and reviewing the xctr
> support easier. But it's not a big deal; if you already tested it this way
> you can just leave it as-is if you want.
>
> A few minor comments below.
>
> > +	/*
> > +	 * Set up the counter values in v0-v4.
> > +	 *
> > +	 * If we are encrypting less than MAX_STRIDE blocks, the tail block
> > +	 * handling code expects the last keystream block to be in v4. For
> > +	 * example: if encrypting two blocks with MAX_STRIDE=5, then v3 and v4
> > +	 * should have the next two counter blocks.
> > +	 */
>
> The first two mentions of v4 should actually be v{MAX_STRIDE-1}, as it is
> actually v4 for MAX_STRIDE==5 and v3 for MAX_STRIDE==4.
>
> > @@ -355,16 +383,16 @@ AES_FUNC_END(aes_cbc_cts_decrypt)
> >  	mov		v3.16b, vctr.16b
> > ST5(	mov		v4.16b, vctr.16b	)
> >  	.if \xctr
> > -		sub		x6, x11, #MAX_STRIDE - 1
> > -		sub		x7, x11, #MAX_STRIDE - 2
> > -		sub		x8, x11, #MAX_STRIDE - 3
> > -		sub		x9, x11, #MAX_STRIDE - 4
> > -ST5(		sub		x10, x11, #MAX_STRIDE - 5	)
> > -		eor		x6, x6, x12
> > -		eor		x7, x7, x12
> > -		eor		x8, x8, x12
> > -		eor		x9, x9, x12
> > -		eor		x10, x10, x12
> > +		sub		x6, CTR, #MAX_STRIDE - 1
> > +		sub		x7, CTR, #MAX_STRIDE - 2
> > +		sub		x8, CTR, #MAX_STRIDE - 3
> > +		sub		x9, CTR, #MAX_STRIDE - 4
> > +ST5(		sub		x10, CTR, #MAX_STRIDE - 5	)
> > +		eor		x6, x6, IV_PART
> > +		eor		x7, x7, IV_PART
> > +		eor		x8, x8, IV_PART
> > +		eor		x9, x9, IV_PART
> > +		eor		x10, x10, IV_PART
>
> The eor into x10 should be enclosed by ST5(), since it's dead code otherwise.
>
> > +	/*
> > +	 * If there are at least MAX_STRIDE blocks left, XOR the plaintext with
> > +	 * keystream and store. Otherwise jump to tail handling.
> > +	 */
>
> Technically this could be XOR-ing with either the plaintext or the
> ciphertext. Maybe write "data" instead.
>
> >  .Lctrtail1x\xctr:
> > -	sub		x7, x6, #16
> > -	csel		x6, x6, x7, eq
> > -	add		x1, x1, x6
> > -	add		x0, x0, x6
> > -	ld1		{v5.16b}, [x1]
> > -	ld1		{v6.16b}, [x0]
> > +	/*
> > +	 * Handle <= 16 bytes of plaintext
> > +	 */
> > +	sub		x8, x7, #16
> > +	csel		x7, x7, x8, eq
> > +	add		IN, IN, x7
> > +	add		OUT, OUT, x7
> > +	ld1		{v5.16b}, [IN]
> > +	ld1		{v6.16b}, [OUT]
> > ST5(	mov		v3.16b, v4.16b		)
> >  	encrypt_block	v3, w3, x2, x8, w7
>
> w3 and x2 should be ROUNDS_W and KEY, respectively.
>
> This code also has the very unusual property that it reads and writes before
> the buffers given. Specifically, for bytes < 16, it accesses the 16 bytes
> beginning at &in[bytes - 16] and &dst[bytes - 16]. Mentioning this explicitly
> would be very helpful, particularly in the function comments for
> aes_ctr_encrypt() and aes_xctr_encrypt(), and maybe in the C code, so that
> anyone calling these functions has this in mind.

If bytes < 16, then the C code uses a buffer of 16 bytes to avoid this (rough
sketch at the end of this mail). I'll add some comments explaining that,
because it's not entirely clear what is happening in the C unless you've taken
a deep dive into the asm.

>
> Anyway, with the above addressed feel free to add:
>
> Reviewed-by: Eric Biggers <ebiggers@xxxxxxxxxx>
>
> - Eric
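
P.S. For reference, the C-side handling I mean looks roughly like the fragment
below. This is a simplified sketch of the skcipher walk loop rather than the
exact aes-glue.c code: walk, ctx, and rounds come from the surrounding
function, error handling and the non-final-chunk rounding are elided, and the
xctr path is handled analogously.

	u8 buf[AES_BLOCK_SIZE];
	const u8 *src = walk.src.virt.addr;
	u8 *dst = walk.dst.virt.addr;
	unsigned int nbytes = walk.nbytes;

	/*
	 * For a partial block, bounce through the *end* of a 16-byte stack
	 * buffer, so that the asm's 16-byte accesses ending at &src[nbytes]
	 * and &dst[nbytes] stay inside memory we own.
	 */
	if (unlikely(nbytes < AES_BLOCK_SIZE))
		src = dst = memcpy(buf + sizeof(buf) - nbytes, src, nbytes);

	kernel_neon_begin();
	aes_ctr_encrypt(dst, src, ctx->key_enc, rounds, nbytes, walk.iv);
	kernel_neon_end();

	/* Copy the result back out of the bounce buffer. */
	if (unlikely(nbytes < AES_BLOCK_SIZE))
		memcpy(walk.dst.virt.addr, buf + sizeof(buf) - nbytes, nbytes);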