On Thu, 3 Nov 2022 at 22:16, Elliott, Robert (Servers) <elliott@xxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Sent: Thursday, November 3, 2022 2:23 PM
> > Subject: [PATCH v5 3/3] crypto: aesgcm - Provide minimal library implementation
> >
>
> Given include/crypto/aes.h:
> struct crypto_aes_ctx {
>         u32 key_enc[AES_MAX_KEYLENGTH_U32];
>         u32 key_dec[AES_MAX_KEYLENGTH_U32];
>         u32 key_length;
> };
>
> plus:
> ...
> +struct aesgcm_ctx {
> +       be128                   ghash_key;
> +       struct crypto_aes_ctx   aes_ctx;
> +       unsigned int            authsize;
> +};
> ...
> > +static void aesgcm_encrypt_block(const struct crypto_aes_ctx *ctx, void *dst,
> ...
> > +       local_irq_save(flags);
> > +       aes_encrypt(ctx, dst, src);
> > +       local_irq_restore(flags);
> > +}
> ...
> > +int aesgcm_expandkey(struct aesgcm_ctx *ctx, const u8 *key,
> > +                    unsigned int keysize, unsigned int authsize)
> > +{
> > +       u8 kin[AES_BLOCK_SIZE] = {};
> > +       int ret;
> > +
> > +       ret = crypto_gcm_check_authsize(authsize) ?:
> > +             aes_expandkey(&ctx->aes_ctx, key, keysize);
>
> Since GCM uses the underlying cipher's encrypt algorithm for both
> encryption and decryption, is there any need for the 240-byte
> aesctx->key_dec decryption key schedule that aes_expandkey
> also prepares?
>

No. But this applies to all uses of AES in CTR, XCTR, CMAC and CCM
modes, not just to the AES library.

> For modes like this, it might be worth creating a specialized
> struct that only holds the encryption key schedule (key_enc),
> with a derivative of aes_expandkey() that only updates it.
>

I'm not sure what problem we would be solving here, tbh. AES key
expansion is unlikely to occur on a hot path, and the 240-byte
overhead doesn't seem that big of a deal either.

Note that only full table-based C implementations of AES need the
decryption key schedule; the AES library version could be tweaked to
use the encryption key schedule for decryption as well (see below).
The instruction-based versions, however, are constructed in a way
that also requires the modified schedule for decryption.

So I agree that there appears to be /some/ room for improvement here,
but I'm not sure it's worth anyone's time, tbh. We could explore
splitting off the expandkey routine that is exposed to other AES
implementations, and use a reduced schedule inside the library
itself. Beyond that, I don't see the need to clutter up the API and
force all AES code in the tree to choose between an encryption-only
and a full key schedule.
-------------8<-----------------

--- a/lib/crypto/aes.c
+++ b/lib/crypto/aes.c
@@ -310,3 +310,3 @@
 {
-	const u32 *rkp = ctx->key_dec + 4;
+	const u32 *rkp = ctx->key_enc + ctx->key_length + 16;
 	int rounds = 6 + ctx->key_length / 4;
@@ -315,6 +315,6 @@
-	st0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in);
-	st0[1] = ctx->key_dec[1] ^ get_unaligned_le32(in + 4);
-	st0[2] = ctx->key_dec[2] ^ get_unaligned_le32(in + 8);
-	st0[3] = ctx->key_dec[3] ^ get_unaligned_le32(in + 12);
+	st0[0] = rkp[ 8] ^ get_unaligned_le32(in);
+	st0[1] = rkp[ 9] ^ get_unaligned_le32(in + 4);
+	st0[2] = rkp[10] ^ get_unaligned_le32(in + 8);
+	st0[3] = rkp[11] ^ get_unaligned_le32(in + 12);
@@ -331,7 +331,7 @@
-	for (round = 0;; round += 2, rkp += 8) {
-		st1[0] = inv_mix_columns(inv_subshift(st0, 0)) ^ rkp[0];
-		st1[1] = inv_mix_columns(inv_subshift(st0, 1)) ^ rkp[1];
-		st1[2] = inv_mix_columns(inv_subshift(st0, 2)) ^ rkp[2];
-		st1[3] = inv_mix_columns(inv_subshift(st0, 3)) ^ rkp[3];
+	for (round = 0;; round += 2, rkp -= 8) {
+		st1[0] = inv_mix_columns(inv_subshift(st0, 0) ^ rkp[4]);
+		st1[1] = inv_mix_columns(inv_subshift(st0, 1) ^ rkp[5]);
+		st1[2] = inv_mix_columns(inv_subshift(st0, 2) ^ rkp[6]);
+		st1[3] = inv_mix_columns(inv_subshift(st0, 3) ^ rkp[7]);
@@ -340,12 +340,12 @@
-		st0[0] = inv_mix_columns(inv_subshift(st1, 0)) ^ rkp[4];
-		st0[1] = inv_mix_columns(inv_subshift(st1, 1)) ^ rkp[5];
-		st0[2] = inv_mix_columns(inv_subshift(st1, 2)) ^ rkp[6];
-		st0[3] = inv_mix_columns(inv_subshift(st1, 3)) ^ rkp[7];
+		st0[0] = inv_mix_columns(inv_subshift(st1, 0) ^ rkp[0]);
+		st0[1] = inv_mix_columns(inv_subshift(st1, 1) ^ rkp[1]);
+		st0[2] = inv_mix_columns(inv_subshift(st1, 2) ^ rkp[2]);
+		st0[3] = inv_mix_columns(inv_subshift(st1, 3) ^ rkp[3]);
 	}

-	put_unaligned_le32(inv_subshift(st1, 0) ^ rkp[4], out);
-	put_unaligned_le32(inv_subshift(st1, 1) ^ rkp[5], out + 4);
-	put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[6], out + 8);
-	put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[7], out + 12);
+	put_unaligned_le32(inv_subshift(st1, 0) ^ rkp[0], out);
+	put_unaligned_le32(inv_subshift(st1, 1) ^ rkp[1], out + 4);
+	put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[2], out + 8);
+	put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[3], out + 12);
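
As a side note on the "specialized struct" idea discussed above, here
is a minimal sketch of what an encryption-only context and expandkey
derivative could look like. The names aes_enc_ctx and
aes_expandkey_enc() are hypothetical (they do not exist in the
kernel), and the sketch simply reuses aes_expandkey() on a temporary
full context and discards the decryption schedule rather than
reimplementing the key expansion.

#include <crypto/aes.h>
#include <linux/string.h>

/* Hypothetical context holding only the forward (encryption) key schedule. */
struct aes_enc_ctx {
	u32 key_enc[AES_MAX_KEYLENGTH_U32];
	u32 key_length;
};

static int aes_expandkey_enc(struct aes_enc_ctx *ctx, const u8 *in_key,
			     unsigned int key_len)
{
	struct crypto_aes_ctx full;
	int err;

	/* Run the existing expandkey routine, then keep only key_enc. */
	err = aes_expandkey(&full, in_key, key_len);
	if (err)
		return err;

	memcpy(ctx->key_enc, full.key_enc, sizeof(ctx->key_enc));
	ctx->key_length = full.key_length;

	/* Wipe the temporary full schedule, including key_dec. */
	memzero_explicit(&full, sizeof(full));
	return 0;
}

This would save the 240-byte key_dec array for users such as GCM and
CTR that only ever run the cipher in the forward direction, at the
cost of a second struct type; whether that trade-off is worthwhile is
exactly the question raised above.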