On Wed, Oct 02, 2019 at 04:16:55PM +0200, Ard Biesheuvel wrote:
> Wire the existing x86 SIMD ChaCha code into the new ChaCha library
> interface.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> ---
>  arch/x86/crypto/chacha_glue.c | 36 ++++++++++++++++++++
>  crypto/Kconfig                |  1 +
>  include/crypto/chacha.h       |  6 ++++
>  3 files changed, 43 insertions(+)
>
> diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
> index bc62daa8dafd..fd9ef42842cf 100644
> --- a/arch/x86/crypto/chacha_glue.c
> +++ b/arch/x86/crypto/chacha_glue.c
> @@ -123,6 +123,42 @@ static void chacha_dosimd(u32 *state, u8 *dst, const u8 *src,
>  	}
>  }
>
> +void hchacha_block(const u32 *state, u32 *stream, int nrounds)
> +{
> +	state = PTR_ALIGN(state, CHACHA_STATE_ALIGN);
> +
> +	if (!crypto_simd_usable()) {
> +		hchacha_block_generic(state, stream, nrounds);
> +	} else {
> +		kernel_fpu_begin();
> +		hchacha_block_ssse3(state, stream, nrounds);
> +		kernel_fpu_end();
> +	}
> +}
> +EXPORT_SYMBOL(hchacha_block);

Please correct me if I'm wrong:

The approach here is slightly different from Zinc. In Zinc, I had one
entry point that conditionally called into the architecture-specific
implementation, and I did it inline using #includes so that in some
cases it could be optimized out.

Here, you override the original symbol defined by the generic module
from the architecture-specific implementation, and in there you decide
which way to branch.

Your approach has the advantage that you don't need to #include a .c
file like I did, an ugly yet very effective approach.

But it has two disadvantages:

1. For architecture-specific code that _always_ runs, such as the
   MIPS32r2 implementation of chacha, the compiler no longer has an
   opportunity to remove the generic code entirely from the binary,
   which under Zinc resulted in a smaller module.

2. The inliner can't make optimizations for that call.

Disadvantage (2) might not make much of a difference. Disadvantage (1)
seems like a bigger deal. However, perhaps the linker is smart and can
remove the code and symbol? Or if not, is there a way to make the linker
smart? Or would all this require crazy LTO which isn't going to happen
any time soon?
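
To make disadvantage (1) concrete, here is a minimal user-space sketch of
the single-entry-point pattern described above. It is not the Zinc code
and not the kernel API: every name in it (demo_block, demo_block_arch,
demo_block_generic, DEMO_ARCH_ALWAYS_USABLE, demo_last_path) is
hypothetical, and the "cipher" is placeholder arithmetic rather than
ChaCha. The assumption being illustrated is that when the arch path is a
compile-time constant "always usable", the call to the static generic
function becomes unreachable within one translation unit, so an
optimizing compiler is free to drop it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Kconfig-style compile-time switch. */
#ifndef DEMO_ARCH_ALWAYS_USABLE
#define DEMO_ARCH_ALWAYS_USABLE 1
#endif

/* Records which path ran; exists only so the sketch can be checked. */
enum demo_path { DEMO_NONE, DEMO_GENERIC, DEMO_ARCH };
static enum demo_path demo_last_path = DEMO_NONE;

/* Portable fallback. Static, so it is only kept if something calls it. */
static void demo_block_generic(const uint32_t *state, uint32_t *out)
{
	for (int i = 0; i < 4; i++)
		out[i] = state[i] + 1;	/* placeholder, not ChaCha */
	demo_last_path = DEMO_GENERIC;
}

/*
 * Arch-specific path. Returns false when it cannot be used, in which
 * case the caller falls back to the generic code.
 */
static inline bool demo_block_arch(const uint32_t *state, uint32_t *out)
{
	if (!DEMO_ARCH_ALWAYS_USABLE)
		return false;
	for (int i = 0; i < 4; i++)
		out[i] = state[i] + 1;	/* same placeholder result */
	demo_last_path = DEMO_ARCH;
	return true;
}

/* The one entry point that callers see. */
void demo_block(const uint32_t *state, uint32_t *out)
{
	if (demo_block_arch(state, out))
		return;
	/* Unreachable when DEMO_ARCH_ALWAYS_USABLE is 1. */
	demo_block_generic(state, out);
}
```

Building this with optimization enabled and inspecting the object file
(for example with nm) should show whether demo_block_generic survived.
With the symbol-override approach the generic function lives in a
separate, exported module, so the compiler never sees that it is dead.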