On Fri, 4 Oct 2019 at 15:36, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> On Wed, Oct 02, 2019 at 04:16:55PM +0200, Ard Biesheuvel wrote:
> > Wire the existing x86 SIMD ChaCha code into the new ChaCha library
> > interface.
> >
> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > ---
> >  arch/x86/crypto/chacha_glue.c | 36 ++++++++++++++++++++
> >  crypto/Kconfig                |  1 +
> >  include/crypto/chacha.h       |  6 ++++
> >  3 files changed, 43 insertions(+)
> >
> > diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
> > index bc62daa8dafd..fd9ef42842cf 100644
> > --- a/arch/x86/crypto/chacha_glue.c
> > +++ b/arch/x86/crypto/chacha_glue.c
> > @@ -123,6 +123,42 @@ static void chacha_dosimd(u32 *state, u8 *dst, const u8 *src,
> >  	}
> >  }
> >
> > +void hchacha_block(const u32 *state, u32 *stream, int nrounds)
> > +{
> > +	state = PTR_ALIGN(state, CHACHA_STATE_ALIGN);
> > +
> > +	if (!crypto_simd_usable()) {
> > +		hchacha_block_generic(state, stream, nrounds);
> > +	} else {
> > +		kernel_fpu_begin();
> > +		hchacha_block_ssse3(state, stream, nrounds);
> > +		kernel_fpu_end();
> > +	}
> > +}
> > +EXPORT_SYMBOL(hchacha_block);
>
> Please correct me if I'm wrong:
>
> The approach here is slightly different from Zinc. In Zinc, I had one
> entry point that conditionally called into the architecture-specific
> implementation, and I did it inline using #includes so that in some
> cases it could be optimized out.
>
> Here, you override the original symbol defined by the generic module
> from the architecture-specific implementation, and in there you decide
> which way to branch.
>
> Your approach has the advantage that you don't need to #include a .c
> file like I did, an ugly yet very effective approach.
>
> But it has two disadvantages:
>
> 1. For architecture-specific code that _always_ runs, such as the
>    MIPS32r2 implementation of chacha, the compiler no longer has an
>    opportunity to remove the generic code entirely from the binary,
>    which under Zinc resulted in a smaller module.
>

It does. If you don't call hchacha_block_generic() in your code, the
library that exposes it never gets loaded in the first place.

Note that in this particular case, hchacha_block_generic() is exposed
by code that is always builtin so it doesn't matter.

> 2. The inliner can't make optimizations for that call.
>
> Disadvantage (2) might not make much of a difference. Disadvantage (1)
> seems like a bigger deal. However, perhaps the linker is smart and can
> remove the code and symbol? Or if not, is there a way to make the linker
> smart? Or would all this require crazy LTO which isn't going to happen
> any time soon?
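
For readers following the thread, the dispatch pattern in the quoted hunk can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: toy_block(), toy_block_generic(), toy_block_simd(), simd_usable_flag and the counters are all hypothetical names, the "cipher" is a placeholder that adds 1 to each word, and the fpu_depth counter merely stands in for the kernel_fpu_begin()/kernel_fpu_end() pair.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORDS 4

/* Hypothetical stand-in for crypto_simd_usable(). */
bool simd_usable_flag = true;

/* Counters so a caller can observe which path was taken. */
int generic_calls, simd_calls, fpu_depth;

/* Portable fallback; placeholder transform, not ChaCha. */
static void toy_block_generic(const uint32_t *state, uint32_t *out)
{
	generic_calls++;
	for (int i = 0; i < WORDS; i++)
		out[i] = state[i] + 1;
}

/* Pretend-accelerated path; must produce the same output as the fallback. */
static void toy_block_simd(const uint32_t *state, uint32_t *out)
{
	simd_calls++;
	for (int i = 0; i < WORDS; i++)
		out[i] = state[i] + 1;
}

/*
 * Single exported entry point: the arch glue owns the symbol and
 * decides at run time which implementation to branch to.
 */
void toy_block(const uint32_t *state, uint32_t *out)
{
	if (!simd_usable_flag) {
		toy_block_generic(state, out);
	} else {
		fpu_depth++;	/* stands in for kernel_fpu_begin() */
		toy_block_simd(state, out);
		fpu_depth--;	/* stands in for kernel_fpu_end() */
	}
}
```

In this sketch toy_block() still references toy_block_generic(), so the fallback stays reachable. An architecture implementation that never falls back would simply not reference the generic function at all, which is the situation point 1 is about.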