> On Sep 26, 2019, at 6:38 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>  - let the caller know what the state size is and allocate the
>    synchronous state in its own data structures
>
>  - let the caller just call a static "decrypt_xyz()" function for xyz
>    decryption.
>
>  - if you end up doing it synchronously, that function just returns
>    "done". No overhead. No extra allocations. No unnecessary stuff. Just
>    do it, using the buffers provided. End of story. Efficient and simple.
>
>  - BUT.
>
>  - any hardware could have registered itself for "I can do xyz", and
>    the decrypt_xyz() function would know about those, and *if* it has a
>    list of accelerators (hopefully sorted by preference etc), it would
>    try to use them. And if they take the job (they might not - maybe
>    their queues are full, maybe they don't have room for new keys at the
>    moment, which might be a separate setup from the queues), the
>    "decrypt_xyz()" function returns a _cookie_ for that job. It's
>    probably a pre-allocated one (the hw accelerator might preallocate a
>    fixed number of in-progress data structures).

To really do this right, I don't think this goes far enough.

Suppose I'm trying to implement send() over a VPN very efficiently.
I want to do, roughly, this:

void __user *buf, etc;

if (crypto api thinks async is good) {
        copy buf to some kernel memory;
        set up a scatterlist;
        do it async with this callback;
} else {
        do the crypto synchronously, from *user* memory, straight to
        kernel memory;
        (or, if that's too complicated, maybe copy in little chunks to a
        little stack buffer.  setting up a scatterlist is a waste of time.)
}

I don't know whether the network code is structured in a way that makes
this easy, and the API would be more complex, but it could be nice and
fast.
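
To make that concrete, here is roughly the shape of interface I read
Linus as describing, as a sketch only -- none of these names exist
anywhere, they're made up for illustration of the "return done or
return a cookie" idea:

/* Hypothetical sketch, kernel-style C; xyz_* names are invented. */

struct xyz_decrypt_state;       /* caller allocates this in its own
                                   structures; size comes from
                                   xyz_state_size() */

enum xyz_status {
        XYZ_DONE,               /* finished synchronously, result in dst */
        XYZ_QUEUED,             /* an accelerator took the job,
                                   *cookie is now valid */
};

typedef void (*xyz_complete_fn)(void *cookie, int err);

size_t xyz_state_size(void);

enum xyz_status decrypt_xyz(struct xyz_decrypt_state *state,
                            const u8 *key, size_t keylen,
                            const u8 *src, u8 *dst, size_t len,
                            xyz_complete_fn complete, void **cookie);

and the caller would do something like:

        switch (decrypt_xyz(state, key, keylen, src, dst, len,
                            my_done_cb, &cookie)) {
        case XYZ_DONE:
                /* no accelerator wanted it: already done, no overhead */
                break;
        case XYZ_QUEUED:
                /* hardware owns the job; my_done_cb() fires when it
                   completes */
                break;
        }

My point above is that the XYZ_DONE path shouldn't even require the
caller to have built a scatterlist or bounced the data through kernel
memory first; the synchronous case should be able to work directly on
whatever buffers the caller already has.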