This patchset is also available in git via:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git crc-x86-v1

This patchset applies on top of my other recent CRC patchsets
https://lore.kernel.org/r/20241103223154.136127-1-ebiggers@xxxxxxxxxx/ and
https://lore.kernel.org/r/20241117002244.105200-1-ebiggers@xxxxxxxxxx/ .
Consider it a preview of what may be coming next, as my priority is
getting those two other patchsets merged first.

This patchset adds a new assembly macro that expands into the body of a
CRC function for x86 for the specified number of bits, bit order, vector
length, and AVX level.  There's also a new script that generates the
constants needed by this function, given a CRC generator polynomial.

This approach allows easily wiring up an x86-optimized implementation of
any variant of CRC-8, CRC-16, CRC-32, or CRC-64, including full support
for VPCLMULQDQ.  On long messages the resulting functions are up to 4x
faster than the existing PCLMULQDQ-optimized functions when they exist,
or up to 29x faster than the existing table-based functions.

This patchset starts by wiring up the new macro for crc32_le,
crc_t10dif, and crc32_be.  Later I'd also like to wire up crc64_be and
crc64_rocksoft, once the design of the library functions for those has
been fixed to be like what I'm doing for crc32* and crc_t10dif.

A similar approach of sharing code between CRC variants, and between
vector lengths when applicable, should work for other architectures.
The CRC constant generation script should be mostly reusable.
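For readers unfamiliar with where such constants come from: they are
remainders of polynomial arithmetic over GF(2).  Below is a minimal
Python sketch, assuming a non-reflected (MSB-first) CRC with a zero
initial value and no final XOR, as crc_t10dif uses.  It is NOT the
gen-crc-consts.py from this series, and the helper names (clmul,
poly_mod, xpow_mod, crc_msb_bitwise) are hypothetical.  It only shows
the identity that makes carryless-multiply "folding" work: in
M = A*x^k + B, replacing x^k with (x^k mod G) leaves the CRC remainder
unchanged.  A real generator would likely also need Barrett reduction
constants and bit-reflected forms for LSB-first CRCs; those are omitted.

```python
# Hypothetical sketch of the GF(2) polynomial arithmetic behind
# [V]PCLMULQDQ folding constants; NOT the series' gen-crc-consts.py.
# Polynomials are Python ints: bit i is the coefficient of x^i.

def clmul(a: int, b: int) -> int:
    """Carryless multiply, i.e. polynomial product over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a: int, g: int) -> int:
    """Remainder of a divided by g over GF(2) (XOR-based long division)."""
    g_deg = g.bit_length() - 1
    while a.bit_length() - 1 >= g_deg:
        a ^= g << (a.bit_length() - 1 - g_deg)
    return a


def xpow_mod(k: int, g: int) -> int:
    """x^k mod g: the kind of constant used to fold data across k bits."""
    return poly_mod(1 << k, g)


def crc_msb_bitwise(data: bytes, g: int) -> int:
    """Reference MSB-first CRC, init 0, no final XOR (assumed variant).

    g is the full generator including its x^n term, e.g. 0x18BB7.
    Assumes n >= 8 so that each byte can be XORed into the top bits.
    """
    n = g.bit_length() - 1
    crc = 0
    for byte in data:
        crc ^= byte << (n - 8)
        for _ in range(8):
            crc <<= 1
            if crc >> n:  # degree reached n: subtract (XOR) the generator
                crc ^= g
    return crc


# CRC-T10DIF generator:
# x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
G_T10DIF = 0x18BB7

if __name__ == "__main__":
    msg = b"123456789"
    m = int.from_bytes(msg, "big")
    # A CRC of this variant is M(x) * x^n mod G(x).
    assert crc_msb_bitwise(msg, G_T10DIF) == poly_mod(m << 16, G_T10DIF)
    # Folding: for M = A*x^k + B, swapping x^k for (x^k mod G) keeps
    # the remainder, so a far-away block moves forward via one clmul.
    k = 64
    a, b = m >> k, m & ((1 << k) - 1)
    folded = clmul(a, xpow_mod(k, G_T10DIF)) ^ b
    assert poly_mod(folded, G_T10DIF) == poly_mod(m, G_T10DIF)
    print(hex(crc_msb_bitwise(msg, G_T10DIF)))
```

Representing polynomials as plain integers keeps everything to shifts
and XORs, which mirrors what PCLMULQDQ does on 64-bit operands at a
time; the same helpers work unchanged for CRC-8 through CRC-64
generators.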
Eric Biggers (6):
  x86: move zmm exclusion list into CPU feature flag
  scripts/crc: add gen-crc-consts.py
  x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
  x86/crc32: implement crc32_le using new template
  x86/crc-t10dif: implement crc_t10dif using new template
  x86/crc32: implement crc32_be using new template

 arch/x86/Kconfig                        |   2 +-
 arch/x86/crypto/aesni-intel_glue.c      |  22 +-
 arch/x86/include/asm/cpufeatures.h      |   1 +
 arch/x86/kernel/cpu/intel.c             |  22 +
 arch/x86/lib/Makefile                   |   2 +-
 arch/x86/lib/crc-pclmul-consts.h        | 148 ++++++
 arch/x86/lib/crc-pclmul-template-glue.h |  84 ++++
 arch/x86/lib/crc-pclmul-template.S      | 588 ++++++++++++++++++++++++
 arch/x86/lib/crc-t10dif-glue.c          |  22 +-
 arch/x86/lib/crc16-msb-pclmul.S         |   6 +
 arch/x86/lib/crc32-glue.c               |  38 +-
 arch/x86/lib/crc32-pclmul.S             | 220 +--------
 arch/x86/lib/crct10dif-pcl-asm_64.S     | 332 -------------
 scripts/crc/gen-crc-consts.py           | 207 +++++++++
 14 files changed, 1087 insertions(+), 607 deletions(-)
 create mode 100644 arch/x86/lib/crc-pclmul-consts.h
 create mode 100644 arch/x86/lib/crc-pclmul-template-glue.h
 create mode 100644 arch/x86/lib/crc-pclmul-template.S
 create mode 100644 arch/x86/lib/crc16-msb-pclmul.S
 delete mode 100644 arch/x86/lib/crct10dif-pcl-asm_64.S
 create mode 100755 scripts/crc/gen-crc-consts.py

-- 
2.47.0