On Thu, Sep 26, 2019 at 1:52 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> Hi Ard,
>
>
> Our goals are that chacha20_arch() from each of these arch glues gets
> included in the lib/crypto/chacha20.c compilation unit. The reason why
> we want it in its own unit is so that the inliner can get rid of
> unreached code and more tightly integrate the branches. For the MIPS
> case, the advantage is clear.

IMO this needs numbers. My suggestion from way back, which is at
least a good deal of the way toward being doable, is to do static
calls. This means that the common code will call out to the arch code
via a regular CALL instruction and will *not* inline the arch code.
It also means that the arch code could live in its own module, could
be selected at boot time, etc. For x86, inlining seems a bit nuts
just to avoid a whole mess of:

if (use avx2)
  do_avx2_thing();
else if (use avx1)
  do_avx1_thing();
else
  etc;

On x86, direct calls are pretty cheap. Certainly for operations like
curve25519, I doubt you will ever see a real-world effect from
inlining. I'd be surprised to see one for chacha20.

If you really want inlining to dictate the overall design, I think you
need some real numbers for why it's necessary. There also needs to be
a clear story for how exactly making everything inline plays with the
actual decision of which implementation to use. I think it's also
worth noting that LTO is coming.

--Andy