On Wed, 2022-03-30 at 15:45 -0700, Linus Torvalds wrote: > On Wed, Mar 30, 2022 at 3:22 PM Trond Myklebust > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > With 9175 ext4 offsets, I see 157 collisions (== hash buckets with > > > 1 > > entry). So hash_64() does perform less well when you're hashing a > > value > > that is already a hash. > > No collisions with xxhash? Because xxhash() reality seems to do > pretty > similar things in the end (multiply by a prime, shift bits down and > xor them). > > In fact, the main difference seems to be that xxhash() will do a > "rotl()" by 27 before doing the prime multiplication, and then it > will > finish the thing by a few more multiples mixed with shifting the high > bits down a few times. > > Our regular fast hash doesn't do the "shift bits down", because it > relies on only using the upper bits anyway (and it is pretty heavily > geared towards "fast and good enough"). > > But if the *source* of the hash has a lot of low bits clear, I can > imagine that the "rotl" that xxhash does improves on the bit > distribution of the multiplication (which will only move bits > upwards). > > And if it turns out our default hash has some bad cases, I'd prefer > to > fix _that_ regardless.. > Hmm... No there doesn't appear to be a huge difference between the two. With both test programs running on the same data set of ext4 getdents offsets, I see the following. With xxhash64 reduced to 18 bits, I see: read 57654 entries min = 0, max = 5, collisions = 5501, avg = 1 read 98978 entries min = 0, max = 6, collisions = 14730, avg = 1 ..and with hash_64() reduced to 18 bits: read 57654 entries min = 0, max = 4, collisions = 5538, avg = 1 read 98978 entries min = 0, max = 5, collisions = 14623, avg = 1 So they both appear to be seeing similar collision rates with the same data sets -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx