Hey Andy,

Thanks for weighing in.

> inlining. I'd be surprised for chacha20. If you really want inlining
> to dictate the overall design, I think you need some real numbers for
> why it's necessary. There also needs to be a clear story for how
> exactly making everything inline plays with the actual decision of
> which implementation to use.

Take a look at my description for the MIPS case: when on MIPS, the
arch code is *always* used since it's just straight up scalar
assembly. In this case, the chacha20_arch function *never* returns
false [1], which means it's always included [2], so the generic
implementation gets optimized out, saving disk and memory, which I
assume MIPS people care about.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/tree/lib/zinc/chacha20/chacha20-mips-glue.c?h=jd/wireguard#n13
[2] https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/tree/lib/zinc/chacha20/chacha20.c?h=jd/wireguard#n118

I'm fine with considering this a form of "premature optimization",
though, and ditching the motivation there.

On Thu, Sep 26, 2019 at 11:37 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> My suggestion from way back, which is at
> least a good deal of the way toward being doable, is to do static
> calls. This means that the common code will call out to the arch code
> via a regular CALL instruction and will *not* inline the arch code.
> This means that the arch code could live in its own module, it can be
> selected at boot time, etc.

Alright, let's do static calls, then, to deal with the case of going
from the entry point implementation in lib/zinc (or lib/crypto, if you
want, Ard) to the arch-specific implementation in arch/${ARCH}/crypto.
And then within each arch, we can keep it simple, since everything is
already in the same directory.

Sound good?

Jason
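
For illustration only, the MIPS argument above can be sketched roughly as
follows. This is a simplified userspace stand-in, not the actual Zinc code:
the names chacha20_sketch, chacha20_arch_sketch and chacha20_generic_sketch,
the SKETCH_NO_ARCH switch, and the XOR placeholder "cipher" are all
hypothetical. The point is only that when the inlined arch hook returns a
compile-time constant true, the fallback branch is dead code and the
compiler is free to drop the static generic function entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Call counters, present only so the dispatch can be observed. */
static int generic_calls, arch_calls;

/* Hypothetical generic fallback. The XOR is a placeholder, not ChaCha20. */
static void chacha20_generic_sketch(uint8_t *dst, const uint8_t *src, size_t len)
{
	generic_calls++;
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] ^ 0x5a;
}

/*
 * Hypothetical arch hook: returns true if it handled the request.
 * Without SKETCH_NO_ARCH it unconditionally returns true, mirroring the
 * "never returns false" MIPS case described in the mail.
 */
static inline bool chacha20_arch_sketch(uint8_t *dst, const uint8_t *src, size_t len)
{
#ifdef SKETCH_NO_ARCH
	(void)dst; (void)src; (void)len;
	return false;	/* no arch implementation: always fall back */
#else
	arch_calls++;
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] ^ 0x5a;	/* stands in for the scalar assembly */
	return true;
#endif
}

/* Entry point: try the arch code first, fall back to generic otherwise. */
static void chacha20_sketch(uint8_t *dst, const uint8_t *src, size_t len)
{
	if (!chacha20_arch_sketch(dst, src, len))
		chacha20_generic_sketch(dst, src, len);
}
```

Calling chacha20_sketch() on a small buffer in the default build goes
through the arch path only; building with -DSKETCH_NO_ARCH flips it to the
generic path. With static calls, by contrast, the entry point would reach
the arch code through a patched CALL rather than through inlining, so this
dead-code elimination would not apply.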