Now that aegis128 has been announced as one of the winners of the CAESAR
competition, it's time to provide some better support for it on arm64 (and
32-bit ARM *).

This time, instead of cloning the generic driver twice and rewriting half of
it in arm64 and ARM assembly, add hooks for an accelerated SIMD path to the
generic driver, and populate it with a C version using NEON intrinsics that
can be built for both ARM and arm64. This results in a speedup of ~11x, for a
performance of 2.2 cycles per byte on Cortex-A53.

Patches #1 .. #3 are some fixes/improvements for the generic code. Patch #4
adds the plumbing for using a SIMD accelerated implementation. Patch #5 adds
the ARM and arm64 code, and patch #6 adds a speed test.

Note that aegis128l and aegis256 were not selected, nor were any of the
morus contestants, so we should probably consider dropping those drivers
again.

* 32-bit ARM today rarely provides the special AES instructions that the
  implementation in this series relies on, but this may change in the
  future, and the NEON intrinsics code can be compiled for both ISAs.
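To illustrate the kind of plumbing described for patch #4, here is a minimal userspace sketch of a generic driver that dispatches to an optional SIMD hook at run time and otherwise falls back to the portable code. All names here (have_simd, simd_usable, aegis128_do_simd, aegis128_update_simd, aegis128_update_generic) are hypothetical and not taken from the patches, and both update routines are placeholder stubs that only count calls rather than performing the real AEGIS-128 state update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AEGIS-128 state: five 128-bit blocks. */
struct aegis_state {
	uint8_t blocks[5][16];
};

/* Call counters, so the sketch can show which path was taken. */
int generic_calls;
int simd_calls;

/* Assumed to be set once at init time, e.g. after probing CPU features. */
bool have_simd;

/* Hypothetical stand-in for "may the SIMD unit be used in this context?" */
bool simd_usable(void)
{
	return true;
}

/* Take the SIMD path only if it exists and is usable right now. */
bool aegis128_do_simd(void)
{
	return have_simd && simd_usable();
}

/* Placeholder for the portable C state update. */
void aegis128_update_generic(struct aegis_state *st, const uint8_t msg[16])
{
	(void)st;
	(void)msg;
	generic_calls++;
}

/* Placeholder for the accelerated (e.g. NEON intrinsics) state update. */
void aegis128_update_simd(struct aegis_state *st, const uint8_t msg[16])
{
	(void)st;
	(void)msg;
	simd_calls++;
}

/* Entry point in the generic driver: prefer the SIMD hook when usable. */
void aegis128_update(struct aegis_state *st, const uint8_t msg[16])
{
	assert(st != NULL && msg != NULL);

	if (aegis128_do_simd())
		aegis128_update_simd(st, msg);
	else
		aegis128_update_generic(st, msg);
}
```

With have_simd left false, a call to aegis128_update() lands in the generic stub; after setting it to true, the same call goes through the SIMD stub. Keeping the decision inside the generic driver means the mode handling, key setup and tag computation need not be duplicated per architecture; only the inner routines get an accelerated variant.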
Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
Cc: Ondrej Mosnacek <omosnace@xxxxxxxxxx>
Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Cc: Steve Capper <steve.capper@xxxxxxx>

Ard Biesheuvel (6):
  crypto: aegis128 - use unaliged helper in unaligned decrypt path
  crypto: aegis - drop empty TFM init/exit routines
  crypto: aegis - avoid prerotated AES tables
  crypto: aegis128 - add support for SIMD acceleration
  crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics
  crypto: tcrypt - add a speed test for AEGIS128

 crypto/Kconfig               |   5 +
 crypto/Makefile              |  12 ++
 crypto/aegis.h               |  28 ++--
 crypto/aegis128-neon-inner.c | 142 ++++++++++++++++++++
 crypto/aegis128-neon.c       |  43 ++++++
 crypto/aegis128.c            |  55 +++++---
 crypto/aegis128l.c           |  11 --
 crypto/aegis256.c            |  11 --
 crypto/tcrypt.c              |   7 +
 9 files changed, 261 insertions(+), 53 deletions(-)
 create mode 100644 crypto/aegis128-neon-inner.c
 create mode 100644 crypto/aegis128-neon.c

-- 
2.20.1