On Wed, Jul 21, 2021 at 1:13 PM David Sterba <dsterba@xxxxxxx> wrote:
>
> adding a memcmp_large that compares by native words or u64 could be
> the best option.

Yeah, we could just special-case that one place.

But see the patches I sent out - I think we can get the best of both
worlds. A small and simple memcmp() that is good enough and not the
_completely_ stupid thing we have now.

The second patch I sent out even gets the mutually aligned case right.

Of course, the glibc code also ended up unrolling things a bit, but
honestly, the way it did it was too disgusting for words.

And if it really turns out that the unrolling makes a big difference -
although I doubt it's meaningful with any modern core - I can add a
couple of lines to that simple patch I sent out to do that too.
Without getting the monster that is that glibc code.

Of course, my patch depends on the fact that "get_unaligned()" is
cheap on all CPU's that really matter, and that caches aren't
direct-mapped any more. The glibc code seems to be written for a world
where registers are cheap, unaligned accesses are prohibitively
expensive, and unrolling helps because L1 caches are direct-mapped and
you really want to do chunking to not get silly way conflicts.

If old-style Sparc or MIPS was our primary target, that would be one
thing. But it really isn't.

              Linus
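
For illustration only, the idea of a small word-at-a-time memcmp() that relies on cheap unaligned loads can be sketched in userspace C. This is not the patch referred to in the message. The names memcmp_words() and load_u64() are hypothetical, and load_u64() uses memcpy() as an assumed stand-in for the kernel's get_unaligned(). It does no unrolling and no alignment handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: a userspace stand-in for get_unaligned().
 * Compilers typically turn this memcpy() into a single load on CPUs
 * where unaligned access is cheap.
 */
static inline uint64_t load_u64(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical name: compare 8 bytes at a time, then finish bytewise. */
int memcmp_words(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;

	/* Skip over words that are equal, aligned or not. */
	while (n >= sizeof(uint64_t)) {
		if (load_u64(pa) != load_u64(pb))
			break;	/* first difference lies within these 8 bytes */
		pa += sizeof(uint64_t);
		pb += sizeof(uint64_t);
		n -= sizeof(uint64_t);
	}

	/*
	 * Byte loop: handles the tail and locates the first differing
	 * byte of a mismatching word, independent of endianness.
	 */
	while (n--) {
		int d = *pa++ - *pb++;

		if (d)
			return d;
	}
	return 0;
}
```

Falling back to the byte loop on a mismatch keeps the result ordering identical to a plain bytewise memcmp() without any byte-swapping tricks, at the cost of up to eight extra byte compares per call.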