On Wed, Aug 09, 2017 at 04:55:43PM +0200, René Scharfe wrote:

> > I also wondered if using memcmp() could be a hint to the compiler to use
> > an intrinsic or some other trick, especially because the "len" here is a
> > constant. But in a toy function compiled with "gcc -S", it looks like we
> > do keep the call to memcmp (so the speedup really is glibc, and not some
> > compiler magic).
>
> GCC 7 inlines memcmp() if we only need a binary result:
>
>   https://godbolt.org/g/iZ11Ne

Cute. It turns it into a series of 8-byte xors. The original open-coded
loop doesn't end up nearly as optimized with gcc-7.

I suspect many calls in practice are of the binary-result type. So some
of the speedup I saw may have been from compiler improvements and not
libc improvements. Still, I think the general argument is the same,
replacing "if your libc memcmp is fast" with "if your libc/compiler
makes memcmp fast".

-Peff
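
For readers who want to try the comparison themselves, here is a minimal sketch of the two shapes discussed above: an open-coded byte loop that returns an ordered result, and a memcmp() call whose result is only used as equal/not-equal. The names cmp_loop, eq_memcmp and HASH_LEN are made up for illustration, and the 20-byte constant length is an assumption. This is not git's actual code, nor the contents of the godbolt link.

```c
#include <string.h>

#define HASH_LEN 20	/* assumed constant length, for illustration only */

/*
 * Open-coded byte loop. Returns <0, 0 or >0, so the compiler has to
 * preserve the ordering of the first differing byte.
 */
static int cmp_loop(const unsigned char *a, const unsigned char *b)
{
	int i;

	for (i = 0; i < HASH_LEN; i++, a++, b++) {
		if (*a != *b)
			return *a - *b;
	}
	return 0;
}

/*
 * Binary-result use of memcmp(): the caller only learns "equal" or
 * "not equal", and the length is a compile-time constant.
 */
static int eq_memcmp(const unsigned char *a, const unsigned char *b)
{
	return !memcmp(a, b, HASH_LEN);
}
```

Compiling a file like this with something like "gcc -O2 -S" and reading the assembly for each function is one way to see what your own toolchain does. The result depends on the compiler version and flags.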