Hi David, On Wed, Jan 19, 2022 at 3:41 PM David Laight <David.Laight@xxxxxxxxxx> wrote: > > From: Ard Biesheuvel > > Sent: 19 January 2022 12:19 > ... > > - (*compress)(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE); > > + if (IS_ENABLED(CONFIG_CRYPTO_ARCH_HAVE_LIB_BLAKE2S)) > > + (*compress)(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE); > > + else > > + blake2s_compress_generic(state, in, nblocks - 1, > > + BLAKE2S_BLOCK_SIZE); > > Isn't that a candidate for a 'static call' ? > > And, maybe all these inlined functions should be real functions? > No point having all the bloat on every call site. > Much better to call a real function and used the cached instructions. Not a good candidate for static call, as this doesn't actually need to change at runtime ever. It's using a function pointer here out of laziness to keep the same body of the function, like a compile-time template. You can sort of squint and imagine the C++. Unfortunately, CFI felt differently and still treats it as an indirect call. https://lore.kernel.org/linux-crypto/20220119135450.564115-1-Jason@xxxxxxxxx/ fixes it up to use a boolean instead, which will certainly be inlined away. So that's definitely an improvement on what's there now. For 5.18, I think it's probable that all of this stuff goes away anyway, and we don't need the templated helpers at all. So perhaps my patch will serve as an okay stop gap. Alternatively, maybe the clang people will say, "oh no, our bug" and then fix it in their neighborhood. According to https://github.com/ClangBuiltLinux/linux/issues/1567 it looks like that could be the case. > There are clearly optimisations for the top/bottom of the loop. > But they can be done to the generic C version. Optimizing the generic C version would be quite nice, as it'd help all platforms. Jason