On Thu, Feb 27, 2025 at 03:47:03PM -0800, Bill Wendling wrote:
> For both gcc and clang, crc32 builtins generate better code than the
> inline asm. GCC improves, removing unneeded "mov" instructions. Clang
> does the same and unrolls the loops. GCC has no changes on i386, but
> Clang's code generation is vastly improved, due to Clang's "rm"
> constraint issue.
>
> The number of cycles improved by ~0.1% for GCC and ~1% for Clang, which
> is expected because of the "rm" issue. However, Clang's performance is
> better than GCC's by ~1.5%, most likely due to loop unrolling.

Also note that the patch
https://lore.kernel.org/r/20250210210741.471725-1-ebiggers@xxxxxxxxxx/
(which is already enqueued in the crc tree for 6.15) changes "rm" to "r"
when the compiler is clang, to improve clang's code generation. The
numbers you quote are against the original version, right?

- Eric