On Wed, Jan 22, 2025 at 11:46:18PM -0800, Eric Biggers wrote:
>
> Actually, I'm tempted to just provide slice-by-1 (a.k.a. byte-by-byte) as the
> only generic CRC32 implementation. The generic code has become increasingly
> irrelevant due to the arch-optimized code existing. The arch-optimized code
> tends to be 10 to 100 times faster on long messages.

Yeah, that's my intuition as well; I would think the CPUs that don't
have CRC32 optimization instruction(s) would probably be the most
sensitive to dcache thrashing.

But given that Geert ran into this on m68k (I assume), maybe we could
have him benchmark the various generic crc32 implementations to see
which one is the best for him? That is, assuming that he cares (which
he might not. :-).

- Ted