On Thu, 17 Oct 2024 at 11:41, Ard Biesheuvel <ardb+git@xxxxxxxxxx> wrote:
>
> From: Ard Biesheuvel <ardb@xxxxxxxxxx>
>
> The CRC-32 code is library code, and is not part of the crypto
> subsystem. This means that callers may not generally be aware of the
> kind of implementation that backs it, and so we've refrained from using
> FP/SIMD code in the past, as it disables preemption, and this may incur
> scheduling latencies that the caller did not anticipate.
>
> This was solved a while ago, and on arm64, kernel mode FP/SIMD no longer
> disables preemption.
>
> This means we can happily use PMULL instructions in the CRC-32 library
> code, which permits an optimization to be implemented that results in a
> speedup of 2 - 2.8x for inputs >1k in size (on Apple M2)
>
> Patch #1 implements some prepwork to handle the scalar CRC-32
> alternatives patching in C code.
>
> Changes since v2:
> - drop alternatives.h #include (#1)
> - drop unneeded branch (#2)
> - fix comment max -> min (#2)
> - add Eric's Rb
>
> Changes since v1:
> - rename crc32-pmull.S to crc32-4way.S and avoid pmull in the function
>   names to avoid confusion about the nature of the implementation;
> - polish the asm a bit, and add some comments
> - don't return via the scalar code if len dropped to 0 after calling the
>   4-way code.
>
> Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
> Cc: Kees Cook <kees@xxxxxxxxxx>
>
> Ard Biesheuvel (2):
>   arm64/lib: Handle CRC-32 alternative in C code
>   arm64/crc32: Implement 4-way interleave using PMULL
>

I'll need to respin this - the crc32_be code doesn't actually work correctly.