Re: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 25 Sep 2019 15:15:34 -0700

On Wed, Sep 25, 2019 at 9:14 AM Ard Biesheuvel
<ard.biesheuvel@xxxxxxxxxx> wrote:
>
> Replace the chacha20poly1305() library calls with invocations of the
> RFC7539 AEAD, as implemented by the generic chacha20poly1305 template.

Honestly, the other patches look fine to me from what I've seen (with
the small note I had in a separate email for 11/18), but this one I
consider just nasty, and a prime example of why people hate those
crypto lookup routines.

Some of it is just the fundamental and pointless silly indirection,
that just makes things harder to read, less efficient, and less
straightforward.

That's exemplified by this part of the patch:

>  struct noise_symmetric_key {
> -       u8 key[NOISE_SYMMETRIC_KEY_LEN];
> +       struct crypto_aead *tfm;

which is just one of those "we know what we want and we just want to
use it directly" things, and then the crypto indirection comes along
and makes that simple inline allocation of a small constant size
(afaik it is CHACHA20POLY1305_KEY_SIZE, which is 32) be another
allocation entirely.

And it's some random odd non-typed thing too, so then you have that
silly and stupid dynamic allocation using a name lookup:

   crypto_alloc_aead("rfc7539(chacha20,poly1305)", 0, CRYPTO_ALG_ASYNC);

to create what used to be (and should be) a simple allocation that was
has a static type and was just part of the code.

It also ends up doing other bad things, ie that packet-time

+       if (unlikely(crypto_aead_reqsize(key->tfm) > 0)) {
+               req = aead_request_alloc(key->tfm, GFP_ATOMIC);
+               if (!req)
+                       return false;

thing that hopefully _is_ unlikely, but that's just more potential
breakage from that whole async crypto interface.

This is what people do *not* want to do, and why people often don't
like the crypto interfaces.

It's exactly why we then have those "bare" routines as a library where
people just want to access the low-level hashing or whatever directly.

So please don't do things like this for an initial import.

Leave the thing alone, and just use the direct and synchronous static
crypto library interface tjhat you imported in patch 14/18 (but see
below about the incomplete import).

Now, later on, if you can *show* that some async implementation really
really helps, and you have numbers for it, and you can convince people
that the above kind of indirection is _worth_ it, then that's a second
phase. But don't make code uglier without those actual numbers.

Because it's not just uglier and has that silly extra indirection and
potential allocation problems, this part just looks very fragile
indeed:

> The nonce related changes are there to address the mismatch between the
> 96-bit nonce (aka IV) that the rfc7539() template expects, and the 64-bit
> nonce that WireGuard uses.
...
>  struct packet_cb {
> -       u64 nonce;
> -       struct noise_keypair *keypair;
>         atomic_t state;
> +       __le32 ivpad;                   /* pad 64-bit nonce to 96 bits */
> +       __le64 nonce;
> +       struct noise_keypair *keypair;
>         u32 mtu;
>         u8 ds;
>  };

The above is subtle and silently depends on the struct layout.

It really really shouldn't.

Can it be acceptable doing something like that? Yeah, but you really
should be making it very explicit, perhaps using

  struct {
        __le32 ivpad;
        __le64 nonce;
   } __packed;

or something like that.

Because right now you're depending on particular layout of those fields:

> +       aead_request_set_crypt(req, sg, sg, skb->len,
> +                              (u8 *)&PACKET_CB(skb)->ivpad);

but honestly, that's not ok at all.

Somebody makes a slight change to that struct, and it might continue
to work fine on x86-32 (where 64-bit values are only 32-bit aligned)
but subtly break on other architectures.

Also, you changed how the nonce works from being in CPU byte order to
be explicitly LE. That may be ok, and looks like it might be a
cleanup, but honestly I think it should have been done as a separate
patch.

So could you please update that patch 14/18 to also have that
synchronous chacha20poly1305_decrypt_sg() interface, and then just
drop this 18/18 for now?

That would mean that

 (a) you wouldn't need this patch, and you can then do that as a
separate second phase once you have numbers and it can stand on its
own.

 (b) you'd actually have something that *builds* when  you import the
main wireguard patch in 15/18

because right now it looks like you're not only forcing this async
interface with the unnecessary indirection, you're also basically
having a tree that doesn't even build or work for a couple of commits.

And I'm still not convinced (a) ever makes sense - the overhead of any
accelerator is just high enought that I doubt you'll have numbers -
performance _or_ power.

But even if you're right that it might be a power advantage on some
platform, that wouldn't make it an advantage on other platforms. Maybe
it could be done as a config option where you can opt in to the async
interface when that makes sense - but not force the indirection and
extra allocations when it doesn't. As a separate patch, something like
that doesn't sound horrendous (and I think that's also an argument for
doing that CPU->LE change as an independent change).

Yes, yes, there's also that 17/18 that switches over to a different
header file location and Kconfig names but that could easily be folded
into 15/18 and then it would all be bisectable.

Alternatively, maybe 15/18 could be done with wireguard disabled in
the Kconfig (just to make the patch identical), and then 17/18 enables
it when it compiles with a big note about how you wanted to keep 15/18
pristine to make the changes obvious.

Hmm?

I don't really have a dog in this fight, but on the whole I really
liked the series. But this 18/18 raised my heckles, and I think I
understand why it might raise the heckles of the wireguard people.

Please?

     Linus