On Wed, Mar 30, 2022 at 3:22 PM Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
>
> With 9175 ext4 offsets, I see 157 collisions (== hash buckets with > 1
> entry). So hash_64() does perform less well when you're hashing a value
> that is already a hash.

No collisions with xxhash? Because xxhash() really seems to do pretty
similar things in the end (multiply by a prime, shift bits down and
xor them).

In fact, the main difference seems to be that xxhash() will do a
"rotl()" by 27 before doing the prime multiplication, and then it will
finish the thing by a few more multiplies mixed with shifting the high
bits down a few times.

Our regular fast hash doesn't do the "shift bits down", because it
relies on only using the upper bits anyway (and it is pretty heavily
geared towards "fast and good enough").

But if the *source* of the hash has a lot of low bits clear, I can
imagine that the "rotl" that xxhash does improves on the bit
distribution of the multiplication (which will only move bits
upwards).

And if it turns out our default hash has some bad cases, I'd prefer to
fix _that_ regardless..

               Linus
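
The bit-movement argument above can be sketched in userspace C. This is
an illustration only, not the kernel's code and not xxhash: mul_hash()
mimics the multiply-and-keep-the-top-bits shape of hash_64(), using the
GOLDEN_RATIO_64 constant from include/linux/hash.h, while
rotl_mul_hash() is a hypothetical variant that only borrows the
rotate-by-27 step and leaves out everything else xxhash does. The
helper names are made up for this example.

```c
#include <stdint.h>

/* Multiplier from include/linux/hash.h (GOLDEN_RATIO_64). */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Rotate a 64-bit value left by r bits; valid for 0 < r < 64. */
static inline uint64_t rotl64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

/*
 * Plain multiplicative hash: multiply, then keep only the top 'bits'
 * bits (1..63). A multiply only carries information upwards, so input
 * bits never influence product bits below their own position.
 */
static inline uint64_t mul_hash(uint64_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}

/*
 * Hypothetical variant (an assumption for illustration, NOT xxhash):
 * rotate left by 27 first, so the top 27 input bits land in the low
 * positions and get spread upwards by the multiply as well.
 */
static inline uint64_t rotl_mul_hash(uint64_t val, unsigned int bits)
{
	return (rotl64(val, 27) * GOLDEN_RATIO_64) >> (64 - bits);
}
```

If the input has its low 12 bits clear, the raw product has its low 12
bits clear too, which is the "only move bits upwards" property. After
the rotate, the low 27 bits of the multiplicand are whatever the top 27
bits of the input were. Whether that actually reduces collisions for a
given set of offsets would have to be measured; this sketch makes no
claim about it.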