On Thu, Dec 31, 2020 at 06:23:30PM +0100, Ard Biesheuvel wrote:
> The x86 glue helper module has started to show its age:
> - It relies heavily on function pointers to invoke asm helper functions that
>   operate on fixed input sizes that are relatively small. This means the
>   performance is severely impacted by retpolines.
> - It goes to great lengths to amortize the cost of kernel_fpu_begin()/end()
>   over as much work as possible, which is no longer necessary now that FPU
>   save/restore is done lazily, and doing so may cause unbounded scheduling
>   blackouts due to the fact that enabling the FPU in kernel mode disables
>   preemption.
> - The CBC mode decryption helper makes backward strides through the input, in
>   order to avoid a single block size memcpy() between chunks. Consuming the
>   input in this manner is highly likely to defeat any hardware prefetchers,
>   so it is better to go through the data linearly, and perform the extra
>   memcpy() where needed (which is turned into direct loads and stores by the
>   compiler anyway). Note that benchmarks won't show this effect, given that
>   the memory they use is always cache hot.
>
> GCC does not seem to be smart enough to elide the indirect calls when the
> function pointers are passed as arguments to static inline helper routines
> modeled after the existing ones. So instead, let's create some CPP macros
> that encapsulate the core of the ECB and CBC processing, so we can wire
> them up for existing users of the glue helper module, i.e., Camellia,
> Serpent, Twofish and CAST6.
>
> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> ---
>  arch/x86/crypto/ecb_cbc_helpers.h | 71 ++++++++++++++++++++
>  1 file changed, 71 insertions(+)

Acked-by: Eric Biggers <ebiggers@xxxxxxxxxx>
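
For readers following the quoted description, here is a minimal user-space sketch of the two points it makes about CBC decryption: walking the input forward with one extra block-sized memcpy(), and using a CPP macro so the block helper is called directly rather than through a function pointer. This is not the code from ecb_cbc_helpers.h. Every name below (TOY_BLOCK_SIZE, toy_dec_block, cbc_dec_indirect, CBC_DEC_FORWARD, cbc_dec_direct) is hypothetical, and the "cipher" is a plain XOR standing in for an asm helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Hypothetical stand-in for an asm block-decrypt helper: XORs each byte
 * with 0x5a. Not a real cipher and not part of the kernel. */
static void toy_dec_block(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
		dst[i] = src[i] ^ 0x5a;
}

/* Function-pointer style: fn() is an indirect call unless the compiler
 * manages to resolve it, which is where retpolines would hurt. */
typedef void (*toy_block_fn)(uint8_t *dst, const uint8_t *src);

static void cbc_dec_indirect(toy_block_fn fn, uint8_t *dst,
			     const uint8_t *src, size_t nblocks, uint8_t *iv)
{
	uint8_t next_iv[TOY_BLOCK_SIZE];

	for (size_t n = 0; n < nblocks; n++) {
		/* Save the ciphertext block first so in-place use is safe. */
		memcpy(next_iv, src, TOY_BLOCK_SIZE);
		fn(dst, src);
		for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
			dst[i] ^= iv[i];
		memcpy(iv, next_iv, TOY_BLOCK_SIZE);
		src += TOY_BLOCK_SIZE;
		dst += TOY_BLOCK_SIZE;
	}
}

/* Macro style: the helper's name is substituted textually, so the call
 * in the loop body is a direct call. The input is consumed linearly. */
#define CBC_DEC_FORWARD(func, dst, src, nblocks, iv) do {		\
	uint8_t *d_ = (dst);						\
	const uint8_t *s_ = (src);					\
	uint8_t *iv_ = (iv);						\
	uint8_t next_[TOY_BLOCK_SIZE];					\
	for (size_t n_ = (nblocks); n_ > 0; n_--) {			\
		memcpy(next_, s_, TOY_BLOCK_SIZE);			\
		func(d_, s_);						\
		for (size_t i_ = 0; i_ < TOY_BLOCK_SIZE; i_++)		\
			d_[i_] ^= iv_[i_];				\
		memcpy(iv_, next_, TOY_BLOCK_SIZE);			\
		s_ += TOY_BLOCK_SIZE;					\
		d_ += TOY_BLOCK_SIZE;					\
	}								\
} while (0)

static void cbc_dec_direct(uint8_t *dst, const uint8_t *src,
			   size_t nblocks, uint8_t *iv)
{
	CBC_DEC_FORWARD(toy_dec_block, dst, src, nblocks, iv);
}
```

Both variants compute the same result; the difference is only in how the block helper is reached. Whether a given compiler turns the function-pointer variant into a direct call depends on its inlining decisions, which is the uncertainty the macro approach sidesteps.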