Many of the comments I made on the x86 version apply to this too, but a few unique comments below: On Tue, Apr 12, 2022 at 05:28:15PM +0000, Nathan Huckleberry wrote: > Add hardware accelerated version of POLYVAL for ARM64 CPUs with > Crypto Extension support. As I've mentioned, it seems to be "Crypto Extensions", not "Crypto Extension". That's what everything else in the kernel says, at least. It's just a nit, but it's unclear whether you intentionally decided to not change this, or missed it. > diff --git a/arch/arm64/crypto/polyval-ce-core.S b/arch/arm64/crypto/polyval-ce-core.S [..] > + .text > + .align 4 > + > + .arch armv8-a+crypto > + .align 4 Having '.align' twice is redundant, right? > + > +.Lgstar: > + .quad 0xc200000000000000, 0xc200000000000000 > + > +/* > + * Computes the product of two 128-bit polynomials in X and Y and XORs the > + * components of the 256-bit product into LO, MI, HI. > + * > + * X = [X_1 : X_0] > + * Y = [Y_1 : Y_0] > + * > + * The multiplication produces four parts: > + * LOW: The polynomial given by performing carryless multiplication of X_0 and > + * Y_0 > + * MID: The polynomial given by performing carryless multiplication of (X_0 + > + * X_1) and (Y_0 + Y_1) > + * HIGH: The polynomial given by performing carryless multiplication of X_1 > + * and Y_1 > + * > + * We compute: > + * LO += LOW > + * MI += MID > + * HI += HIGH > + * > + * Later, the 256-bit result can be extracted as: > + * [HI_1 : HI_0 + HI_1 + MI_1 + LO_1 :: LO_1 + HI_0 + MI_0 + LO_0 : LO_0] What does the double colon mean? > + * This step is done when computing the polynomial reduction for efficiency > + * reasons. > + */ > +.macro karatsuba1 X Y A super brief explanation of why Karatsuba multiplication is used instead of schoolbook multiplication would be helpful. > + X .req \X > + Y .req \Y > + ext v25.16b, X.16b, X.16b, #8 > + ext v26.16b, Y.16b, Y.16b, #8 > + eor v25.16b, v25.16b, X.16b > + eor v26.16b, v26.16b, Y.16b > + pmull v27.1q, v25.1d, v26.1d > + pmull2 v28.1q, X.2d, Y.2d > + pmull v29.1q, X.1d, Y.1d > + eor MI.16b, MI.16b, v27.16b > + eor HI.16b, HI.16b, v28.16b > + eor LO.16b, LO.16b, v29.16b > + .unreq X > + .unreq Y > +.endm Maybe move the pmull into v27 and the eor into MI down a bit, since they have more dependencies? I.e. ext v25.16b, X.16b, X.16b, #8 ext v26.16b, Y.16b, Y.16b, #8 eor v25.16b, v25.16b, X.16b eor v26.16b, v26.16b, Y.16b pmull2 v28.1q, X.2d, Y.2d pmull v29.1q, X.1d, Y.1d pmull v27.1q, v25.1d, v26.1d eor HI.16b, HI.16b, v28.16b eor LO.16b, LO.16b, v29.16b eor MI.16b, MI.16b, v27.16b I don't know whether it will actually matter on arm64, but on lower-powered arm32 CPUs this sort of thing definitely can matter. > +.macro karatsuba1_store X Y > + X .req \X > + Y .req \Y > + ext v25.16b, X.16b, X.16b, #8 > + ext v26.16b, Y.16b, Y.16b, #8 > + eor v25.16b, v25.16b, X.16b > + eor v26.16b, v26.16b, Y.16b > + pmull MI.1q, v25.1d, v26.1d > + pmull2 HI.1q, X.2d, Y.2d > + pmull LO.1q, X.1d, Y.1d > + .unreq X > + .unreq Y > +.endm Likewise above. > diff --git a/arch/arm64/crypto/polyval-ce-glue.c b/arch/arm64/crypto/polyval-ce-glue.c [...] > +struct polyval_async_ctx { > + struct cryptd_ahash *cryptd_tfm; > +}; struct polyval_async_ctx is no longer used. > +static int polyval_setkey(struct crypto_shash *tfm, > + const u8 *key, unsigned int keylen) > +{ > + struct polyval_ctx *ctx = crypto_shash_ctx(tfm); > + int i; > + > + if (keylen != POLYVAL_BLOCK_SIZE) > + return -EINVAL; > + > + BUILD_BUG_ON(sizeof(u128) != POLYVAL_BLOCK_SIZE); Where does 'sizeof(u128)' come from? This file doesn't use u128 at all. - Eric