On Wed, Sep 25, 2019 at 9:14 AM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
>
> Replace the chacha20poly1305() library calls with invocations of the
> RFC7539 AEAD, as implemented by the generic chacha20poly1305 template.

Honestly, the other patches look fine to me from what I've seen (with
the small note I had in a separate email for 11/18), but this one I
consider just nasty, and a prime example of why people hate those
crypto lookup routines.

Some of it is just the fundamental and pointless silly indirection,
that just makes things harder to read, less efficient, and less
straightforward.

That's exemplified by this part of the patch:

>  struct noise_symmetric_key {
> -       u8 key[NOISE_SYMMETRIC_KEY_LEN];
> +       struct crypto_aead *tfm;

which is just one of those "we know what we want and we just want to
use it directly" things, and then the crypto indirection comes along
and makes that simple inline allocation of a small constant size
(afaik it is CHACHA20POLY1305_KEY_SIZE, which is 32) be another
allocation entirely.

And it's some random odd non-typed thing too, so then you have that
silly and stupid dynamic allocation using a name lookup:

   crypto_alloc_aead("rfc7539(chacha20,poly1305)", 0, CRYPTO_ALG_ASYNC);

to create what used to be (and should be) a simple allocation that
had a static type and was just part of the code.

It also ends up doing other bad things, ie that packet-time

+       if (unlikely(crypto_aead_reqsize(key->tfm) > 0)) {
+               req = aead_request_alloc(key->tfm, GFP_ATOMIC);
+               if (!req)
+                       return false;

thing that hopefully _is_ unlikely, but that's just more potential
breakage from that whole async crypto interface.

This is what people do *not* want to do, and why people often don't
like the crypto interfaces.

It's exactly why we then have those "bare" routines as a library where
people just want to access the low-level hashing or whatever directly.

So please don't do things like this for an initial import.
Leave the thing alone, and just use the direct and synchronous static
crypto library interface that you imported in patch 14/18 (but see
below about the incomplete import).

Now, later on, if you can *show* that some async implementation really
really helps, and you have numbers for it, and you can convince people
that the above kind of indirection is _worth_ it, then that's a second
phase. But don't make code uglier without those actual numbers.

Because it's not just uglier and has that silly extra indirection and
potential allocation problems, this part just looks very fragile
indeed:

> The nonce related changes are there to address the mismatch between the
> 96-bit nonce (aka IV) that the rfc7539() template expects, and the 64-bit
> nonce that WireGuard uses.
...
>  struct packet_cb {
> -       u64 nonce;
> -       struct noise_keypair *keypair;
>         atomic_t state;
> +       __le32 ivpad;                   /* pad 64-bit nonce to 96 bits */
> +       __le64 nonce;
> +       struct noise_keypair *keypair;
>         u32 mtu;
>         u8 ds;
>  };

The above is subtle and silently depends on the struct layout. It
really really shouldn't.

Can it be acceptable doing something like that? Yeah, but you really
should be making it very explicit, perhaps using

  struct {
        __le32 ivpad;
        __le64 nonce;
  } __packed;

or something like that.

Because right now you're depending on the particular layout of those
fields:

> +       aead_request_set_crypt(req, sg, sg, skb->len,
> +                              (u8 *)&PACKET_CB(skb)->ivpad);

but honestly, that's not ok at all. Somebody makes a slight change to
that struct, and it might continue to work fine on x86-32 (where
64-bit values are only 32-bit aligned) but subtly break on other
architectures.

Also, you changed how the nonce works from being in CPU byte order to
be explicitly LE. That may be ok, and looks like it might be a
cleanup, but honestly I think it should have been done as a separate
patch.
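
To illustrate the explicit-layout point above, here is a minimal
userspace sketch, not the kernel code. It assumes GCC or Clang (the
kernel's __packed expands to the same packed attribute) and uses
uint32_t/uint64_t as stand-ins for __le32/__le64. The names packet_iv,
packet_cb_sketch and packet_iv_set are hypothetical and exist only for
this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 96-bit IV: 32 bits of zero padding, then the 64-bit nonce.
 * uint32_t/uint64_t stand in for the kernel's __le32/__le64 here. */
struct packet_iv {
	uint32_t ivpad;	/* always zero; pads the nonce to 96 bits */
	uint64_t nonce;	/* little-endian on the wire, see packet_iv_set() */
} __attribute__((packed));

/* Break the build, not the packet path, if the layout ever drifts. */
_Static_assert(sizeof(struct packet_iv) == 12, "IV must be exactly 96 bits");
_Static_assert(offsetof(struct packet_iv, ivpad) == 0, "ivpad must come first");
_Static_assert(offsetof(struct packet_iv, nonce) == 4, "nonce must directly follow ivpad");

/* Hypothetical control block that embeds the IV as one explicit unit,
 * so reordering the other fields cannot separate ivpad from nonce. */
struct packet_cb_sketch {
	int state;
	struct packet_iv iv;
	uint32_t mtu;
	uint8_t ds;
};

/* Fill the 12-byte IV from a CPU-order counter. Writing byte by byte
 * gives a little-endian nonce regardless of host endianness. */
static void packet_iv_set(struct packet_iv *iv, uint64_t counter)
{
	uint8_t *out = (uint8_t *)iv;
	const size_t off = offsetof(struct packet_iv, nonce);

	memset(out, 0, off);
	for (size_t i = 0; i < sizeof(iv->nonce); i++)
		out[off + i] = (uint8_t)(counter >> (8 * i));
}
```

With a layout like this, a caller would hand the address of the whole
iv member to the AEAD as the 12-byte IV, rather than the address of a
field that merely happens to sit next to the nonce.
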

So could you please update that patch 14/18 to also have that
synchronous chacha20poly1305_decrypt_sg() interface, and then just
drop this 18/18 for now?

That would mean that

 (a) you wouldn't need this patch, and you can then do that as a
separate second phase once you have numbers and it can stand on its
own.

 (b) you'd actually have something that *builds* when you import the
main wireguard patch in 15/18

because right now it looks like you're not only forcing this async
interface with the unnecessary indirection, you're also basically
having a tree that doesn't even build or work for a couple of commits.

And I'm still not convinced (a) ever makes sense - the overhead of any
accelerator is just high enough that I doubt you'll have numbers -
performance _or_ power.

But even if you're right that it might be a power advantage on some
platform, that wouldn't make it an advantage on other platforms. Maybe
it could be done as a config option where you can opt in to the async
interface when that makes sense - but not force the indirection and
extra allocations when it doesn't. As a separate patch, something like
that doesn't sound horrendous (and I think that's also an argument for
doing that CPU->LE change as an independent change).

Yes, yes, there's also that 17/18 that switches over to a different
header file location and Kconfig names but that could easily be folded
into 15/18 and then it would all be bisectable.

Alternatively, maybe 15/18 could be done with wireguard disabled in
the Kconfig (just to make the patch identical), and then 17/18 enables
it when it compiles with a big note about how you wanted to keep 15/18
pristine to make the changes obvious.

Hmm?

I don't really have a dog in this fight, but on the whole I really
liked the series. But this 18/18 raised my hackles, and I think I
understand why it might raise the hackles of the wireguard people.

Please?

     Linus