On Wed, May 04, 2022 at 12:18:22AM +0000, Nathan Huckleberry wrote:
> + * X = [X_1 : X_0]
> + * Y = [Y_1 : Y_0]
> + *
> + * The multiplication produces four parts:
> + *   LOW: The polynomial given by performing carryless multiplication of X_0 and
> + *   Y_0
> + *   MID: The polynomial given by performing carryless multiplication of (X_0 +
> + *   X_1) and (Y_0 + Y_1)
> + *   HIGH: The polynomial given by performing carryless multiplication of X_1
> + *   and Y_1
> + *
> + * We compute:
> + *  LO += LOW
> + *  MI += MID
> + *  HI += HIGH

Three parts, not four.  But why not write this as the much more concise:

 * Given:
 *   X = [X_1 : X_0]
 *   Y = [Y_1 : Y_0]
 *
 * We compute:
 *   LO += X_0 * Y_0
 *   MI += (X_0 + X_1) * (Y_0 + Y_1)
 *   HI += X_1 * Y_1

> + * So our final computation is: T = T_1 : T_0 = g*(x) * P_0 V = V_1 : V_0 =
> + * g*(x) * (P_1 + T_0) p(x) / x^{128} mod g(x) = P_3 + P_1 + T_0 + V_1 : P_2 +
> + * P_0 + T_1 + V_0

As on the x86 version, this part is now unreadable.  It was fine in v5.

> + *   [HI_1 : HI_0 + HI_1 + MI_1 + LO_1 : LO_1 + HI_0 + MI_0 + LO_0 : LO_0]
[...]
> + *   [HI_1 : HI_1 + HI_0 + MI_1 + LO_1 : HI_0 + MI_0 + LO_1 + LO_0 : LO_0]
[...]
> +	// TMP_V = T_1 : T_0 = P_0 * g*(x)
> +	pmull	TMP_V.1q, PL.1d, GSTAR.1d
[...]
> +	// TMP_V = V_1 : V_0 = (P_1 + T_0) * g*(x)
> +	pmull2	TMP_V.1q, GSTAR.2d, TMP_V.2d
> +	eor	DEST.16b, PH.16b, TMP_V.16b
[...]
> +	pmull	TMP_V.1q, GSTAR.1d, PL.1d
[...]
> +	pmull2	TMP_V.1q, GSTAR.2d, TMP_V.2d
[...]
> +	eor	SUM.16b, TMP_V.16b, PH.16b

It looks like you didn't fully address my comments on v5 about putting operands
in a consistent order.  Not a big deal, but assembly code is always hard to
read, and anything to make it easier would be greatly appreciated.

> +/*
> + * Handle any extra blocks afer full_stride loop.
> + */

Typo above.

> diff --git a/arch/arm64/crypto/polyval-ce-glue.c b/arch/arm64/crypto/polyval-ce-glue.c
[...]
> +struct polyval_tfm_ctx {
> +	u8 key_powers[NUM_KEY_POWERS][POLYVAL_BLOCK_SIZE];
> +};

This is missing the comment about the order of the key powers that I had
suggested for readability.  It made it into the x86 version but not here.

This file is very similar to arch/x86/crypto/polyval-clmulni_glue.c, so if you
could diff them and eliminate any unintended differences, that would be helpful.

Other than the above readability suggestions, this patch looks good, nice job.

- Eric
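
For anyone checking the recombination line quoted above, here is a minimal C
sketch of the three-product split.  It is not code from the patch: the names
poly128, poly256, clmul64, mul_karatsuba, mul_schoolbook and karatsuba_matches
are hypothetical, and clmul64 is a slow bit-by-bit stand-in for the 64x64
carryless multiply that pmull performs.  It only illustrates that LO, MI and HI
recombine as [HI_1 : HI_0 + HI_1 + MI_1 + LO_1 : LO_1 + HI_0 + MI_0 + LO_0 : LO_0];
the reduction mod g(x) is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* 128-bit polynomial over GF(2): [hi : lo].  Hypothetical helper type. */
struct poly128 { uint64_t lo, hi; };

/* 256-bit polynomial over GF(2); w[0] is the least significant word. */
struct poly256 { uint64_t w[4]; };

/* Carryless 64x64 -> 128 multiply, bit by bit (illustration only). */
struct poly128 clmul64(uint64_t a, uint64_t b)
{
	struct poly128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i)	/* avoid an undefined shift by 64 */
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

/* Three multiplications: LO, MI, HI as in the proposed comment. */
struct poly256 mul_karatsuba(struct poly128 x, struct poly128 y)
{
	struct poly128 lo = clmul64(x.lo, y.lo);
	struct poly128 mi = clmul64(x.lo ^ x.hi, y.lo ^ y.hi);
	struct poly128 hi = clmul64(x.hi, y.hi);
	struct poly256 r;

	r.w[0] = lo.lo;
	r.w[1] = lo.hi ^ hi.lo ^ mi.lo ^ lo.lo;
	r.w[2] = hi.lo ^ hi.hi ^ mi.hi ^ lo.hi;
	r.w[3] = hi.hi;
	return r;
}

/* Reference: four multiplications, no Karatsuba trick. */
struct poly256 mul_schoolbook(struct poly128 x, struct poly128 y)
{
	struct poly128 ll = clmul64(x.lo, y.lo);
	struct poly128 lh = clmul64(x.lo, y.hi);
	struct poly128 hl = clmul64(x.hi, y.lo);
	struct poly128 hh = clmul64(x.hi, y.hi);
	struct poly256 r;

	r.w[0] = ll.lo;
	r.w[1] = ll.hi ^ lh.lo ^ hl.lo;
	r.w[2] = hh.lo ^ lh.hi ^ hl.hi;
	r.w[3] = hh.hi;
	return r;
}

/* True if both methods agree on all four 64-bit words. */
bool karatsuba_matches(struct poly128 x, struct poly128 y)
{
	struct poly256 a = mul_karatsuba(x, y);
	struct poly256 b = mul_schoolbook(x, y);

	for (int i = 0; i < 4; i++)
		if (a.w[i] != b.w[i])
			return false;
	return true;
}
```

Calling karatsuba_matches() on a few arbitrary inputs from a small main() is
enough to sanity-check the word ordering in the comment.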