On 21.07.21 г. 23:27, Linus Torvalds wrote:
> On Wed, Jul 21, 2021 at 1:13 PM David Sterba <dsterba@xxxxxxx> wrote:
>>
>> adding a memcmp_large that compares by native words or u64 could be
>> the best option.
>
> Yeah, we could just special-case that one place.

This whole thread started because I first implemented a special case
just for dedupe, and Dave Chinner suggested that, instead of playing
whack-a-mole, we get something decent for the generic memcmp so that
we get an improvement across the whole of the kernel.

>
> But see the patches I sent out - I think we can get the best of both worlds.
>
> A small and simple memcmp() that is good enough and not the
> _completely_ stupid thing we have now.
>
> The second patch I sent out even gets the mutually aligned case right.
>
> Of course, the glibc code also ended up unrolling things a bit, but
> honestly, the way it did it was too disgusting for words.
>
> And if it really turns out that the unrolling makes a big difference -
> although I doubt it's meaningful with any modern core - I can add a
> couple of lines to that simple patch I sent out to do that too.
> Without getting the monster that is that glibc code.
>
> Of course, my patch depends on the fact that "get_unaligned()" is
> cheap on all CPU's that really matter, and that caches aren't
> direct-mapped any more. The glibc code seems to be written for a world
> where registers are cheap, unaligned accesses are prohibitively
> expensive, and unrolling helps because L1 caches are direct-mapped and
> you really want to do chunking to not get silly way conflicts.
>
> If old-style Sparc or MIPS was our primary target, that would be one
> thing. But it really isn't.
>
> Linus
>
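
For illustration, the "small and simple" word-at-a-time approach discussed
above could look roughly like the userspace sketch below. This is not the
patch that was posted to the list. The names memcmp_wordwise() and
load_unaligned_word() are hypothetical, and memcpy() is used here as an
assumed stand-in for the kernel's get_unaligned(). The sketch relies on the
same premise as the quoted mail: unaligned loads are cheap on the CPUs that
matter.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's get_unaligned(): a fixed-size
 * memcpy() lets the compiler emit a plain load where unaligned access
 * is cheap, without invoking undefined behaviour.
 */
static inline unsigned long load_unaligned_word(const void *p)
{
	unsigned long v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Sketch only: skip identical word-sized chunks, then compare bytes. */
static int memcmp_wordwise(const void *cs, const void *ct, size_t count)
{
	const unsigned char *a = cs, *b = ct;

	/* Advance one native word at a time while both chunks match. */
	while (count >= sizeof(unsigned long)) {
		if (load_unaligned_word(a) != load_unaligned_word(b))
			break;
		a += sizeof(unsigned long);
		b += sizeof(unsigned long);
		count -= sizeof(unsigned long);
	}

	/*
	 * The byte loop handles both the first differing word and the
	 * tail, so the sign of the result does not depend on endianness.
	 */
	while (count--) {
		int diff = *a++ - *b++;

		if (diff)
			return diff;
	}
	return 0;
}
```

There is no unrolling and no alignment fix-up here. Whether either would pay
off on a modern core is exactly the open question raised in the quoted mail.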