From: Ard Biesheuvel <ardb@xxxxxxxxxx>

The CRC-32 code is library code, and is not part of the crypto subsystem.
This means that callers may not generally be aware of the kind of
implementation that backs it, and so we've refrained from using FP/SIMD
code in the past, as it disables preemption, and this may incur scheduling
latencies that the caller did not anticipate.

This limitation was lifted a while ago: on arm64, kernel mode FP/SIMD no
longer disables preemption.

This means we can happily use PMULL instructions in the CRC-32 library
code, which permits an optimization that results in a speedup of 2 - 2.8x
for inputs >1k in size (on Apple M2).

Patch #1 implements some prep work to handle the scalar CRC-32
alternatives patching in C code.

Changes since v2:
- drop alternatives.h #include (#1)
- drop unneeded branch (#2)
- fix comment max -> min (#2)
- add Eric's Rb

Changes since v1:
- rename crc32-pmull.S to crc32-4way.S and avoid pmull in the function
  names to avoid confusion about the nature of the implementation;
- polish the asm a bit, and add some comments
- don't return via the scalar code if len dropped to 0 after calling the
  4-way code.

Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
Cc: Kees Cook <kees@xxxxxxxxxx>

Ard Biesheuvel (2):
  arm64/lib: Handle CRC-32 alternative in C code
  arm64/crc32: Implement 4-way interleave using PMULL

 arch/arm64/lib/Makefile     |   2 +-
 arch/arm64/lib/crc32-4way.S | 242 ++++++++++++++++++++
 arch/arm64/lib/crc32-glue.c |  82 +++++++
 arch/arm64/lib/crc32.S      |  22 +-
 4 files changed, 331 insertions(+), 17 deletions(-)
 create mode 100644 arch/arm64/lib/crc32-4way.S
 create mode 100644 arch/arm64/lib/crc32-glue.c

-- 
2.47.0.rc1.288.g06298d1525-goog