On 17 September 2018 at 06:09, Andy Lutomirski <luto@xxxxxxxxxx> wrote: > On Tue, Sep 11, 2018 at 4:57 PM David Miller <davem@xxxxxxxxxxxxx> wrote: >> >> From: Andrew Lunn <andrew@xxxxxxx> >> Date: Wed, 12 Sep 2018 01:30:15 +0200 >> >> > Just as an FYI: >> > >> > 1) I don't think anybody in netdev has taken a serious look at the >> > network code yet. There is little point until the controversial part >> > of the code, Zinc, has been sorted out. >> > >> > 2) I personally would be surprised if DaveM took this code without >> > having an Acked-by from the crypto subsystem people. In the same way, >> > i doubt the crypto people would take an Ethernet driver without having >> > DaveM's Acked-by. >> >> Both of Andrew's statements are completely true. >> >> I'm not looking at any of the networking bits until the crypto stuff >> is fully sorted and fully supported and Ack'd by crypto folks. > > So, as a process question, whom exactly are we waiting for: > > CRYPTO API > M: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> > M: "David S. Miller" <davem@xxxxxxxxxxxxx> > L: linux-crypto@xxxxxxxxxxxxxxx > > Herbert hasn't replied to any of these submissions. You're the other > maintainer :) > > To the extent that you (DaveM) want my ack, here's what I think of the > series so far: > > The new APIs to the crypto primitives are good. For code that wants > to do a specific known crypto operation, they are much, much more > pleasant to use than the existing crypto API. The code cleanups in > the big keys patch speak for themselves IMO. I have no problem with > the addition of a brand-new API to the kernel, especially when it's a > nice one like Zinc's, even if that API starts out with only a couple > of users. > > Zinc's arrangement of arch code is just fine. Sure, there are > arguments that putting arch-specific code in arch/ is better, but this > is mostly bikeshedding IMO. > > There has been some discussion of the exact best way to handle > simd_relax() and some other minor nitpicks of API details. This kind > of stuff doesn't need to block the series -- it can always be reworked > down the road if needed. > > There are two things I don't like right now, though: > > 1. Zinc conflates the addition of a new API with the replacement of > some algorithm implementations. This is problematic. Look at the > recent benchmarks of ipsec before and after this series. Apparently > big packets get faster and small packets get slower. It would be > really nice to bisect the series to narrow down *where* the regression > came from, but, as currently structured, you can't. > > The right way to do this is to rearrange the series. First, the new > Zinc APIs should be added, and they should be backed with the > *existing* crypto code. (If the code needs to be moved or copied to a > new location, so be it. The patch will be messy because somehow the > Zinc API is going to have to dispatch to the arch-specific code, and > the way that the crypto API handles it is not exactly friendly to this > type of use. So be it.) Then another patch should switch the crypto > API to use the Zinc interface. That patch, *by itself*, can be > benchmarked. If it causes a regression for small ipsec packets, then > it can be tracked down relatively easily. Once this is all done, the > actual crypto implementation can be changed, and that changed can be > reviewed on its own merits. > > As a simplistic example, I wrote this code a while back: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=crypto/sha256_bpf&id=e9e12f056f2abed50a30b762db9185799f5864e6 > > and its two parents. This series added a more Zinc-like API to > SHA256. And it did it without replacing the SHA256 implementation. > Doing this for Zinc would be a bit more complication, since the arch > code would need to be invoked too, but it should be doable. > > FWIW, Wireguard should not actually depend on the replacement of the > crypto implementation. > > 2. The new Zinc crypto implementations look like they're brand new. I > realize that they have some history, some of them are derived from > OpenSSL, etc, but none of this is really apparent in the patches > themselves. It would be great if the kernel could literally share the > same code as something like OpenSSL, since OpenSSL gets much more > attention than the kernel's crypto. Failing that, it would be nice if > the patches made it more clear how the code differs from its origin. > At the very least, though, if the replacement of the crypto code were, > as above, a patch that just replaced the crypto code, it would be much > easier to review and benchmark intelligently. > OK, so let me summarize my remaining concerns as well. I may be a bit more finicky than Andy, though. First of all, the rate at which new revisions of this series are appearing increases the review effort unnecessarily, especially given that the latest version seemed to have some issues that would have been spotted by a simple build test. I would like to urge Jason to bear with us and bring this discussion to a close before resubmitting. As far as I can tell (i.e., as a user not a network dev), WireGuard is an excellent piece of code, and I would like to see it merged. I also think there is little disagreement about the quality of the proposed algorithm implementations and the usefulness of having a set of easy to use solid crypto primitives in addition to or complementing the current crypto API. I do have some concerns over how the code is organized though: * simd_relax() is currently not called by the crypto routines themselves. This means that the worst case scheduling latency is unbounded, which is unacceptable for the -rt kernel. The worst case scheduling latency should never be proportional to the input size. (Apologies for not spotting that earlier) * Using a cute name for the crypto library [that will end up being the preferred choice for casual use only] may confuse people, given that we have lots of code in crypto/ already. I'd much prefer using, e.g., crypto/lib and crypto/api (regardless of where the arch specific pieces live) * I'd prefer the arch specific pieces to live under arch, but I can live with keeping them in a single place, as long as the arch maintainers have some kind of jurisdiction over them. I also think there should be some overlap between the maintainership responsibilities of the two generic pieces (api and lib). * (Nit) The GCC command line -include'd .h files contain variable and function definitions so they are actually .c files. * (Nit) Referencing CONFIG_xxx macros from -include'd files adds the implicit assumption that the -include appears after the -include of kconfig.h. * Adding /conditional/ -include's (or #include's) increases the size of the validation space, which is why we generally prefer unconditional includes (and static inline stubs), and 'if (IS_ENABLED(CONFIG_xxx))' over #ifdef CONFIG_xxx * The current organization of the code puts all available (for the arch) versions of all routines into a single module, which can only be built in once we update random.c to use Zinc's chacha20 routines. This bloats the core kernel (which is a huge deal for embedded systems that have very strict boot requirements). It also makes it impossible to simply blacklist a module if you, for instance, prefer to use the [potentially more power efficient] scalar code over the SIMD code when using a distro kernel. [To summarize the 4 points above, I'd much rather see a more conventional organization where different parts are provided by different modules. I don't think the argument that inlining is needed for performance is actually valid, given that we have branch predictors and static keys, and the arch SIMD code is built as separate object files anyway] * If we reuse source files from OpenSSL, we should use that actual source which is the perlasm not the emitted assembler. Also, we should work with Andy Polyakov (as I have done several times over the past 5+ years) to upstream the changes we apply to the kernel version of the code. The same applies to code from other sources, btw, but I am not personally familiar with them. * If upstreaming the changes is not an option, they should be applied as a separate patch and not hidden in a 5000 line patch without any justification or documentation (but Jason is already working on that)