"Jason A. Donenfeld" <Jason@xxxxxxxxx> writes:

> [CC +willy, toke, dave, netdev]
>
> Hi Pascal
>
> On Thu, Sep 26, 2019 at 12:19 PM Pascal Van Leeuwen
> <pvanleeuwen@xxxxxxxxxxxxxx> wrote:
>> Actually, that assumption is factually wrong. I don't know if anything
>> is *publicly* available, but I can assure you the silicon is running in
>> labs already. And something will be publicly available early next year
>> at the latest. Which could nicely coincide with having Wireguard support
>> in the kernel (which I would also like to see happen BTW) ...
>>
>> Not "at some point". It will. Very soon. Maybe not in consumer or server
>> CPUs, but definitely in the embedded (networking) space.
>> And it *will* be much faster than the embedded CPU next to it, so it will
>> be worth using it for something like bulk packet encryption.
>
> Super! I was wondering if you could speak a bit more about the
> interface. My biggest questions surround latency. Will it be
> synchronous or asynchronous? If the latter, why? What will its
> latencies be? How deep will its buffers be? The reason I ask is that a
> lot of crypto acceleration hardware of the past has been fast and
> having very deep buffers, but at great expense of latency. In the
> networking context, keeping latency low is pretty important. Already
> WireGuard is multi-threaded which isn't super great all the time for
> latency (improvements are a work in progress). If you're involved with
> the design of the hardware, perhaps this is something you can help
> ensure winds up working well? For example, AES-NI is straightforward
> and good, but Intel can do that because they are the CPU. It sounds
> like your silicon will be adjacent. How do you envision this working
> in a low latency environment?

Being asynchronous doesn't *necessarily* have to hurt latency; you just
need the right queue back-pressure.

We already have multiple queues in the stack.
With an async crypto engine we would go from something like:

stack -> [qdisc] -> wg if -> [wireguard buffer] -> netdev driver ->
device -> [device buffer] -> wire

to

stack -> [qdisc] -> wg if -> [wireguard buffer] -> crypto stack ->
crypto device -> [crypto device buffer] -> wg post-crypto ->
netdev driver -> device -> [device buffer] -> wire

(where everything in [] is a packet queue).

The wireguard buffer is the source of the latency you're alluding to
above (the comment about multi-threaded behaviour), so we probably need
to fix that anyway. For the device buffer we have BQL to keep it at a
minimum. So that leaves the buffering in the crypto offload device. If
we add something like BQL to the crypto offload drivers, we could
conceivably avoid having that add a significant amount of latency. In
fact, doing so may benefit other users of crypto offloads as well, no?
Presumably ipsec has this same issue?

Caveat: I am fairly ignorant about the inner workings of the crypto
subsystem, so please excuse any inaccuracies in the above; the diagrams
are solely for illustrative purposes... :)

-Toke
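
To make the "something like BQL" idea above a bit more concrete, here is
a toy model in Python of a byte limit sitting in front of a crypto
offload device. This is only a sketch under stated assumptions: the
class and method names are hypothetical, the limit is fixed (real BQL
adjusts its limit dynamically), and nothing here reflects the actual
crypto subsystem or any driver API.

```python
from collections import deque


class ByteLimitedQueue:
    """Toy model of a BQL-style byte limit in front of a crypto device.

    Hypothetical names throughout; the limit is fixed here, whereas
    real BQL tunes it dynamically based on completions.
    """

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.inflight_bytes = 0   # bytes handed to the device, not yet completed
        self.backlog = deque()    # packet lengths held back in the upper layer

    def _can_submit(self):
        # Like BQL, allow a submit while below the limit, so the limit
        # can be overshot by at most one packet.
        return self.inflight_bytes < self.limit_bytes

    def enqueue(self, pkt_len):
        """Offer a packet; return True if it went straight to the device."""
        if self._can_submit() and not self.backlog:
            self.inflight_bytes += pkt_len
            return True
        # Back-pressure: the packet waits above the device, in order.
        self.backlog.append(pkt_len)
        return False

    def complete(self, pkt_len):
        """Device finished a packet; free budget and drain the backlog.

        Returns the number of backlogged packets submitted as a result.
        """
        self.inflight_bytes -= pkt_len
        submitted = 0
        while self.backlog and self._can_submit():
            self.inflight_bytes += self.backlog.popleft()
            submitted += 1
        return submitted
```

The point of the sketch is where the waiting happens: once the device
holds its byte budget, further packets stay in the upper-layer backlog,
where the existing queues in the stack can manage them, instead of
piling up in the [crypto device buffer].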