On 10/16/23 23:43, Ard Biesheuvel wrote:
On Tue, 17 Oct 2023 at 05:16, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
On Mon, Oct 16, 2023 at 01:50:05PM -0700, Ben Greear wrote:
On 11/12/22 06:59, Ard Biesheuvel wrote:
On Fri, 11 Nov 2022 at 23:29, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
On 11/9/22 2:05 AM, Ard Biesheuvel wrote:
On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote:
While rebasing my patches onto 6.1-rc4, I noticed my aesni ccm(aes) patch didn't apply cleanly,
and I found that the patch described below has now been applied. Does that upstream patch mean that
aesni ccm(aes) is already supported upstream? Or is it specific to whatever xctr is? If so,
is there any chance my patch is wanted upstream now?
AFAICS the xctr patch has nothing to do with what you were trying
to achieve with wireless. My objection still stands with regard
to wireless: we should patch wireless to use the async crypto
interface and not hack around it in the Crypto API.
Indeed. Those are just add/add conflicts because both patches
introduce new code into the same set of files. The resolution is
generally to keep both sides.
As for Herbert's objection: I will note here that in the meantime,
arm64 has now gotten rid of the scalar fallbacks entirely in AEAD and
skcipher implementations, because those are only callable in task or
softirq context, and the arm64 SIMD wrappers now disable softirq
processing. This means that the condition that results in the fallback
being needed can no longer occur, making the SIMD helper dead code on
arm64.
I suppose we might do the same thing on x86, but since the kernel mode
SIMD handling is highly arch specific, you'd really need to raise this
with the x86 maintainers.
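
For context, here is a minimal sketch of the fallback pattern under discussion, using an
illustrative helper (the wrapper and its name are not from the driver; crypto_simd_usable(),
kernel_fpu_begin()/kernel_fpu_end(), aesni_enc() and aes_encrypt() are the real interfaces):
if kernel-mode SIMD is guaranteed usable in every context the algorithm can run in, the else
branch is dead code and the simd helper wrapper serves no purpose.

static void aes_enc_one_block(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src)
{
        if (crypto_simd_usable()) {
                kernel_fpu_begin();
                aesni_enc(ctx, dst, src);       /* SIMD (AES-NI) path */
                kernel_fpu_end();
        } else {
                aes_encrypt(ctx, dst, src);     /* scalar fallback */
        }
}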
Hello Ard,
Could you please review the attached patch to make sure I merged it properly? My main concerns
are the cleanup section and any problems I might have introduced around the similarly
named code that was added upstream.
I don't think the logic is quite right, although it rarely matters.
I've pushed my version here - it invokes the static call for CTR so it
will use the faster AVX version if the CPU supports it.
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=aesni-ccm-v6.1
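
Roughly what the aesni glue code of that era does at module init, simplified here into a
standalone sketch (the function name below is made up; static_call_update(), boot_cpu_has()
and aesni_ctr_enc_avx_tfm are from the upstream glue code): the CTR static call is pointed at
the AVX "by8" routine when the CPU supports it, so every caller of the static call gets the
faster path transparently.

static int __init aesni_ctr_init_sketch(void)
{
        /* route aesni_ctr_enc_tfm calls to the AVX by8 implementation */
        if (boot_cpu_has(X86_FEATURE_AVX))
                static_call_update(aesni_ctr_enc_tfm, aesni_ctr_enc_avx_tfm);
        return 0;
}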
Hello Ard,
It looks like something changed again in the intel-aesni logic for the 6.6 kernel. I was able to make a small
change to the patch to get it to compile, but the kernel crashes when I bring up a wlan port in
6.6. When I remove the aesni patch, the station comes up without crashing. The aesni patch worked
fine in 6.5 as far as I can tell.
I'm attaching my slightly modified version of the patch you sent previously. If you have time to
investigate this, it would be much appreciated.
Thanks,
Ben
If this patch is useful, shouldn't it be upstreamed?
It was rejected by Herbert on the basis that the wireless stack should
be converted to use the async API instead.
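
For context on that objection: a wireless caller such as mac80211 allocates its ccm(aes)
transform with CRYPTO_ALG_ASYNC set in the mask, which excludes async implementations, so only
synchronous providers (like the aesni one in the attached patch) are eligible. A rough sketch of
such an allocation, not mac80211's exact code:

static struct crypto_aead *ccm_alloc_sync_tfm(const u8 *key, size_t key_len,
                                              size_t mic_len)
{
        struct crypto_aead *tfm;
        int err;

        /* type 0, mask CRYPTO_ALG_ASYNC: synchronous implementations only */
        tfm = crypto_alloc_aead("ccm(aes)", 0, CRYPTO_ALG_ASYNC);
        if (IS_ERR(tfm))
                return tfm;

        err = crypto_aead_setkey(tfm, key, key_len);
        if (!err)
                err = crypto_aead_setauthsize(tfm, mic_len);
        if (err) {
                crypto_free_aead(tfm);
                return ERR_PTR(err);
        }
        return tfm;
}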
Hello,
I'm still dragging this patch along (see attached). I noticed that there was a big rewrite of
the aesni logic in 6.11. Does anyone know whether this patch would need to be (or should be)
modified to work well with the new upstream code?
I managed to get it compiling with modest merge work, but have not done any
actual testing yet...
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com
commit db4e0e03707492f4747914e080ee9ec47ed044b8
Author: Ard Biesheuvel <ardb@xxxxxxxxxx>
Date: Tue Dec 1 20:45:56 2020 +0100
crypto: aesni - add ccm(aes) algorithm implementation
Add a ccm(aes) implementation from the linux-wireless mailing list (see
http://permalink.gmane.org/gmane.linux.kernel.wireless.general/126679).
This eliminates the FPU context save/restore overhead that the more
general ccm_base(ctr(aes-aesni),aes-aesni) case incurs in the MAC calculation.
Suggested-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
Co-developed-by: Steve deRosier <derosier@xxxxxxxxxxxxxx>
Signed-off-by: Steve deRosier <derosier@xxxxxxxxxxxxxx>
Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
(cherry picked from commit 53197d3fea9ed0959ffead7b04c5d90e7d3fe126)
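
Sketch only, not from the patch: the saving the commit message refers to comes from bracketing
the whole request with a single FPU context switch, instead of the generic
ccm_base(ctr(aes-aesni),aes-aesni) template paying the kernel_fpu_begin()/kernel_fpu_end() cost
repeatedly while computing the CBC-MAC. The helper name below is made up.

static void ccm_fpu_batching_sketch(struct crypto_aes_ctx *ctx,
                                    u8 mac[AES_BLOCK_SIZE])
{
        kernel_fpu_begin();
        aesni_enc(ctx, mac, mac);       /* B_0 */
        /* ... CBC-MAC over the AAD and payload, CTR over the payload ... */
        kernel_fpu_end();
}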
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index cd37de5ec404..cdd158adf27a 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -81,12 +81,12 @@ asmlinkage void aesni_xts_enc(const struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void aesni_xts_dec(const struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);
-#ifdef CONFIG_X86_64
-
asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);
DEFINE_STATIC_CALL(aesni_ctr_enc_tfm, aesni_ctr_enc);
+#ifdef CONFIG_X86_64
+
asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
void *keys, u8 *out, unsigned int num_bytes);
asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
@@ -263,6 +263,379 @@ static int cbc_decrypt(struct skcipher_request *req)
return err;
}
+static int aesni_ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct crypto_aes_ctx *ctx = aes_ctx(crypto_aead_ctx(tfm));
+
+ return aes_set_key_common(ctx, in_key, key_len);
+}
+
+static int aesni_ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
+{
+ if ((authsize & 1) || authsize < 4)
+ return -EINVAL;
+ return 0;
+}
+
+static int ccm_set_msg_len(u8 *block, unsigned int msglen, int csize)
+{
+ __be32 data;
+
+ memset(block, 0, csize);
+ block += csize;
+
+ if (csize >= 4)
+ csize = 4;
+ else if (msglen > (1 << (8 * csize)))
+ return -EOVERFLOW;
+
+ data = cpu_to_be32(msglen);
+ memcpy(block - csize, (u8 *)&data + 4 - csize, csize);
+
+ return 0;
+}
+
+static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ __be32 *n = (__be32 *)&maciv[AES_BLOCK_SIZE - 8];
+ u32 l = req->iv[0] + 1;
+
+ /* verify that CCM dimension 'L' is set correctly in the IV */
+ if (l < 2 || l > 8)
+ return -EINVAL;
+
+ /* verify that msglen can in fact be represented in L bytes */
+ if (l < 4 && msglen >> (8 * l))
+ return -EOVERFLOW;
+
+ /*
+ * Even if the CCM spec allows L values of up to 8, the Linux cryptoapi
+ * uses a u32 type to represent msglen so the top 4 bytes are always 0.
+ */
+ n[0] = 0;
+ n[1] = cpu_to_be32(msglen);
+
+ memcpy(maciv, req->iv, AES_BLOCK_SIZE - l);
+
+ /*
+ * Meaning of byte 0 according to CCM spec (RFC 3610/NIST 800-38C)
+ * - bits 0..2 : max # of bytes required to represent msglen, minus 1
+ * (already set by caller)
+ * - bits 3..5 : size of auth tag (1 => 4 bytes, 2 => 6 bytes, etc)
+ * - bit 6 : indicates presence of authenticate-only data
+ */
+ maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+ if (req->assoclen)
+ maciv[0] |= 0x40;
+
+ memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+ return ccm_set_msg_len(maciv + AES_BLOCK_SIZE - l, msglen, l);
+}
+
+static int compute_mac(struct crypto_aes_ctx *ctx, u8 mac[], u8 *data, int n,
+ unsigned int ilen, u8 *idata, bool do_simd)
+{
+ unsigned int bs = AES_BLOCK_SIZE;
+ u8 *odata = mac;
+ int datalen, getlen;
+
+ datalen = n;
+
+ /* first time in here, block may be partially filled. */
+ getlen = bs - ilen;
+ if (datalen >= getlen) {
+ memcpy(idata + ilen, data, getlen);
+
+ if (likely(do_simd)) {
+ aesni_cbc_enc(ctx, odata, idata, AES_BLOCK_SIZE, odata);
+ } else {
+ crypto_xor(odata, idata, AES_BLOCK_SIZE);
+ aes_encrypt(ctx, odata, odata);
+ }
+
+ datalen -= getlen;
+ data += getlen;
+ ilen = 0;
+ }
+
+ /* now encrypt rest of data */
+ while (datalen >= bs) {
+ if (likely(do_simd)) {
+ aesni_cbc_enc(ctx, odata, data, AES_BLOCK_SIZE, odata);
+ } else {
+ crypto_xor(odata, data, AES_BLOCK_SIZE);
+ aes_encrypt(ctx, odata, odata);
+ }
+
+ datalen -= bs;
+ data += bs;
+ }
+
+ /* check and see if there's leftover data that wasn't
+ * enough to fill a block.
+ */
+ if (datalen) {
+ memcpy(idata + ilen, data, datalen);
+ ilen += datalen;
+ }
+ return ilen;
+}
+
+static void ccm_calculate_auth_mac(struct aead_request *req,
+ struct crypto_aes_ctx *ctx, u8 mac[],
+ struct scatterlist *src,
+ bool do_simd)
+{
+ unsigned int len = req->assoclen;
+ struct scatter_walk walk;
+ u8 idata[AES_BLOCK_SIZE];
+ unsigned int ilen;
+ struct {
+ __be16 l;
+ __be32 h;
+ } __packed *ltag = (void *)idata;
+
+ /* prepend the AAD with a length tag */
+ if (len < 0xff00) {
+ ltag->l = cpu_to_be16(len);
+ ilen = 2;
+ } else {
+ ltag->l = cpu_to_be16(0xfffe);
+ ltag->h = cpu_to_be32(len);
+ ilen = 6;
+ }
+
+ scatterwalk_start(&walk, src);
+
+ while (len) {
+ u8 *src;
+ int n;
+
+ n = scatterwalk_clamp(&walk, len);
+ if (!n) {
+ scatterwalk_start(&walk, sg_next(walk.sg));
+ n = scatterwalk_clamp(&walk, len);
+ }
+ src = scatterwalk_map(&walk);
+
+ ilen = compute_mac(ctx, mac, src, n, ilen, idata, do_simd);
+ len -= n;
+
+ scatterwalk_unmap(src);
+ scatterwalk_advance(&walk, n);
+ scatterwalk_done(&walk, 0, len);
+ }
+
+ /* any leftover needs padding and then encrypted */
+ if (ilen) {
+ crypto_xor(mac, idata, ilen);
+ if (likely(do_simd))
+ aesni_enc(ctx, mac, mac);
+ else
+ aes_encrypt(ctx, mac, mac);
+ }
+}
+
+static int aesni_ccm_encrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = aes_ctx(crypto_aead_ctx(aead));
+ bool const do_simd = crypto_simd_usable();
+ u8 __aligned(8) mac[AES_BLOCK_SIZE];
+ u8 __aligned(8) buf[AES_BLOCK_SIZE];
+ struct skcipher_walk walk;
+ u32 l = req->iv[0] + 1;
+ int err;
+
+ err = ccm_init_mac(req, mac, req->cryptlen);
+ if (err)
+ return err;
+
+ if (likely(do_simd)) {
+ kernel_fpu_begin();
+ aesni_enc(ctx, mac, mac);
+ } else {
+ aes_encrypt(ctx, mac, mac);
+ }
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, ctx, mac, req->src, do_simd);
+
+ req->iv[AES_BLOCK_SIZE - 1] = 0x1;
+ err = skcipher_walk_aead_encrypt(&walk, req, true);
+
+ while (walk.nbytes >= AES_BLOCK_SIZE) {
+ int len = walk.nbytes & AES_BLOCK_MASK;
+ int n;
+
+ for (n = 0; n < len; n += AES_BLOCK_SIZE) {
+ if (likely(do_simd)) {
+ aesni_cbc_enc(ctx, mac, walk.src.virt.addr + n,
+ AES_BLOCK_SIZE, mac);
+ } else {
+ crypto_xor(mac, walk.src.virt.addr + n,
+ AES_BLOCK_SIZE);
+ aes_encrypt(ctx, mac, mac);
+
+ aes_encrypt(ctx, buf, walk.iv);
+ crypto_inc(walk.iv, AES_BLOCK_SIZE);
+ crypto_xor_cpy(walk.dst.virt.addr + n,
+ walk.src.virt.addr + n,
+ buf, AES_BLOCK_SIZE);
+ }
+ }
+ if (likely(do_simd))
+ static_call(aesni_ctr_enc_tfm)(ctx, walk.dst.virt.addr,
+ walk.src.virt.addr, len,
+ walk.iv);
+
+ err = skcipher_walk_done(&walk, walk.nbytes & ~AES_BLOCK_MASK);
+ }
+ if (walk.nbytes) {
+ if (likely(do_simd))
+ aesni_enc(ctx, buf, walk.iv);
+ else
+ aes_encrypt(ctx, buf, walk.iv);
+
+ crypto_xor(mac, walk.src.virt.addr, walk.nbytes);
+ crypto_xor_cpy(walk.dst.virt.addr, walk.src.virt.addr,
+ buf, walk.nbytes);
+
+ if (likely(do_simd))
+ aesni_enc(ctx, mac, mac);
+ else
+ aes_encrypt(ctx, mac, mac);
+
+ err = skcipher_walk_done(&walk, 0);
+ }
+
+ if (err)
+ goto fail;
+
+ memset(walk.iv + AES_BLOCK_SIZE - l, 0, l);
+
+ if (likely(do_simd)) {
+ static_call(aesni_ctr_enc_tfm)(ctx, mac, mac, AES_BLOCK_SIZE, walk.iv);
+ } else {
+ aes_encrypt(ctx, buf, walk.iv);
+ crypto_xor(mac, buf, AES_BLOCK_SIZE);
+ }
+
+ /* copy authtag to end of dst */
+ scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen,
+ crypto_aead_authsize(aead), 1);
+
+fail:
+ if (likely(do_simd))
+ kernel_fpu_end();
+ return err;
+}
+
+static int aesni_ccm_decrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = aes_ctx(crypto_aead_ctx(aead));
+ unsigned int authsize = crypto_aead_authsize(aead);
+ bool const do_simd = crypto_simd_usable();
+ u8 __aligned(8) mac[AES_BLOCK_SIZE];
+ u8 __aligned(8) tag[AES_BLOCK_SIZE];
+ u8 __aligned(8) buf[AES_BLOCK_SIZE];
+ struct skcipher_walk walk;
+ u32 l = req->iv[0] + 1;
+ int err;
+
+ err = ccm_init_mac(req, mac, req->cryptlen - authsize);
+ if (err)
+ return err;
+
+ /* copy authtag from end of src */
+ scatterwalk_map_and_copy(tag, req->src,
+ req->assoclen + req->cryptlen - authsize,
+ authsize, 0);
+
+ if (likely(do_simd)) {
+ kernel_fpu_begin();
+ aesni_enc(ctx, mac, mac);
+ } else {
+ aes_encrypt(ctx, mac, mac);
+ }
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, ctx, mac, req->src, do_simd);
+
+ req->iv[AES_BLOCK_SIZE - 1] = 0x1;
+ err = skcipher_walk_aead_decrypt(&walk, req, true);
+
+ while (walk.nbytes >= AES_BLOCK_SIZE) {
+ int len = walk.nbytes & AES_BLOCK_MASK;
+ int n;
+
+ if (likely(do_simd))
+ static_call(aesni_ctr_enc_tfm)(ctx, walk.dst.virt.addr,
+ walk.src.virt.addr, len,
+ walk.iv);
+
+ for (n = 0; n < len; n += AES_BLOCK_SIZE) {
+ if (likely(do_simd)) {
+ aesni_cbc_enc(ctx, mac, walk.dst.virt.addr + n,
+ AES_BLOCK_SIZE, mac);
+ } else {
+ aes_encrypt(ctx, buf, walk.iv);
+ crypto_inc(walk.iv, AES_BLOCK_SIZE);
+ crypto_xor_cpy(walk.dst.virt.addr + n,
+ walk.src.virt.addr + n,
+ buf, AES_BLOCK_SIZE);
+
+ crypto_xor(mac, walk.dst.virt.addr + n,
+ AES_BLOCK_SIZE);
+ aes_encrypt(ctx, mac, mac);
+ }
+ }
+
+ err = skcipher_walk_done(&walk, walk.nbytes & ~AES_BLOCK_MASK);
+ }
+ if (walk.nbytes) {
+ if (likely(do_simd))
+ aesni_enc(ctx, buf, walk.iv);
+ else
+ aes_encrypt(ctx, buf, walk.iv);
+
+ crypto_xor_cpy(walk.dst.virt.addr, walk.src.virt.addr,
+ buf, walk.nbytes);
+ crypto_xor(mac, walk.dst.virt.addr, walk.nbytes);
+
+ if (likely(do_simd))
+ aesni_enc(ctx, mac, mac);
+ else
+ aes_encrypt(ctx, mac, mac);
+
+ err = skcipher_walk_done(&walk, 0);
+ }
+
+ if (err)
+ goto fail;
+
+ memset(walk.iv + AES_BLOCK_SIZE - l, 0, l);
+
+ if (likely(do_simd)) {
+ static_call(aesni_ctr_enc_tfm)(ctx, mac, mac, AES_BLOCK_SIZE, walk.iv);
+ } else {
+ aes_encrypt(ctx, buf, walk.iv);
+ crypto_xor(mac, buf, AES_BLOCK_SIZE);
+ }
+
+ /* compare calculated auth tag with the stored one */
+ if (crypto_memneq(mac, tag, authsize))
+ err = -EBADMSG;
+
+fail:
+ if (likely(do_simd))
+ kernel_fpu_end();
+ return err;
+}
+
static int cts_cbc_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
@@ -1331,6 +1704,22 @@ static void gcm_process_assoc(const struct aes_gcm_key *key, u8 ghash_acc[16],
aes_gcm_aad_update(key, ghash_acc, buf, pos, flags);
}
+static struct aead_alg aesni_ccm_aead = {
+ .base.cra_name = "ccm(aes)",
+ .base.cra_driver_name = "ccm-aesni",
+ .base.cra_priority = 400,
+ .base.cra_blocksize = 1,
+ .base.cra_ctxsize = CRYPTO_AES_CTX_SIZE,
+ .base.cra_module = THIS_MODULE,
+
+ .setkey = aesni_ccm_setkey,
+ .setauthsize = aesni_ccm_setauthsize,
+ .encrypt = aesni_ccm_encrypt,
+ .decrypt = aesni_ccm_decrypt,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .maxauthsize = AES_BLOCK_SIZE,
+};
/* __always_inline to optimize out the branches based on @flags */
static __always_inline int
@@ -1715,6 +2104,12 @@ static int __init aesni_init(void)
if (err)
goto unregister_avx;
+ if (IS_ENABLED(CONFIG_X86_64)) {
+ err = crypto_register_aead(&aesni_ccm_aead);
+ if (err)
+ goto unregister_avx;
+ }
+
return 0;
unregister_avx:
@@ -1737,6 +2132,9 @@ static int __init aesni_init(void)
static void __exit aesni_exit(void)
{
+ if (IS_ENABLED(CONFIG_X86_64))
+ crypto_unregister_aead(&aesni_ccm_aead);
+
simd_unregister_aeads(aes_gcm_algs_aesni,
ARRAY_SIZE(aes_gcm_algs_aesni),
aes_gcm_simdalgs_aesni);
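
For completeness, a hypothetical usage sketch (not part of the patch) showing how the registered
AEAD would be driven through the generic API once "ccm-aesni" is selected for "ccm(aes)". The
helper name and buffer layout are made up: 'buf' is assumed to hold assoclen bytes of AAD
followed by the plaintext, with room for the tag at the end, and 'iv' is assumed to be formatted
per RFC 3610 (iv[0] = L - 1, followed by the nonce). Error handling is trimmed for brevity.

static int ccm_encrypt_one(struct crypto_aead *tfm, u8 *buf,
                           unsigned int assoclen, unsigned int ptext_len,
                           u8 iv[AES_BLOCK_SIZE])
{
        unsigned int authsize = crypto_aead_authsize(tfm);
        struct aead_request *req;
        struct scatterlist sg;
        int err;

        req = aead_request_alloc(tfm, GFP_KERNEL);
        if (!req)
                return -ENOMEM;

        /* one buffer holding AAD | plaintext | space for the auth tag */
        sg_init_one(&sg, buf, assoclen + ptext_len + authsize);

        aead_request_set_callback(req, 0, NULL, NULL);
        aead_request_set_ad(req, assoclen);
        aead_request_set_crypt(req, &sg, &sg, ptext_len, iv);

        err = crypto_aead_encrypt(req);

        aead_request_free(req);
        return err;
}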