On Thu, 23 Jan 2025 at 10:18, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> FWIW, benchmarking the CRC library functions is easy now; just enable
> CONFIG_CRC_KUNIT_TEST=y and CONFIG_CRC_BENCHMARK=y.
>
> But, it's just a traditional benchmark that calls the functions in a loop, and
> doesn't account for dcache thrashing.

Yeah. I suspect the x86 vector version in particular is just not even
worth it. If you have the crc instruction, the basic arch-optimized
case is presumably already pretty good (and *that* code is tiny).

Honestly, I took a quick look at the "by-4" and "by-8" cases, and
considering that you still have to do per-byte lookups of the words
_anyway_, I would expect that the regular by-1 is presumably not that
much worse.

IOW, maybe we could try to just do the simple by-1 for the generic
case, and cut the x86 version down to the simple "use crc32b" case.
And see if anybody even notices...

             Linus
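
For readers following the by-1 versus by-4 comparison in the message, here is a minimal user-space sketch in C of both table-driven variants. It is not taken from the message or from the kernel's CRC code: the names crc32_init_tables, crc32_le_by1, crc32_le_by4 and crc_tab are hypothetical, and the sketch assumes the reflected CRC-32 polynomial 0xEDB88320 with the caller doing the initial and final inversion.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE 0xEDB88320u  /* reflected CRC-32 polynomial (assumed) */

/* Hypothetical tables: crc_tab[0] alone is enough for the by-1 loop. */
static uint32_t crc_tab[4][256];

/* Build the lookup tables once; call before using the functions below. */
static void crc32_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c >> 1) ^ ((c & 1) ? CRC32_POLY_LE : 0);
        crc_tab[0][i] = c;
    }
    /* Table k advances the table k-1 entry by one more zero byte. */
    for (uint32_t i = 0; i < 256; i++) {
        for (int k = 1; k < 4; k++) {
            uint32_t prev = crc_tab[k - 1][i];
            crc_tab[k][i] = (prev >> 8) ^ crc_tab[0][prev & 0xff];
        }
    }
}

/* by-1: one table lookup per input byte, 1 KiB of table. */
static uint32_t crc32_le_by1(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--)
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
    return crc;
}

/* by-4: still four lookups per four bytes, but into 4 KiB of tables. */
static uint32_t crc32_le_by4(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len >= 4) {
        /* Fold in four input bytes, assembled little-endian. */
        crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        crc = crc_tab[3][crc & 0xff] ^ crc_tab[2][(crc >> 8) & 0xff] ^
              crc_tab[1][(crc >> 16) & 0xff] ^ crc_tab[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    return crc32_le_by1(crc, p, len);  /* remaining tail bytes */
}
```

With crc32_init_tables() called first, ~crc32_le_by1(~0u, "123456789", 9) gives the standard CRC-32 check value 0xCBF43926, and crc32_le_by4 returns the same result. Both loops do one table lookup per input byte. The by-4 loop only makes the four lookups independent of each other, at the cost of four times the table footprint, which is where the dcache concern comes in.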