On Sun, Feb 25, 2024 at 02:47:45PM +0000, David Laight wrote: > From: Kent Overstreet > > Sent: 25 February 2024 03:19 > .. > > when I implemented cuckoo (which is more obviously sensitive to a weak > > hash function), I had to go with siphash, even jhash wasn't giving me > > great reslts. and looking at the code it's not hard to see why, it's all > > adds, and the rotates are byte aligned... you want mixed adds and xors > > and the rotates to be more prime-ish. > > > > right idea, just old... > > > > what would be ideal is something more like siphash, but with fewer > > rounds, so same number of instructions as jhash. xxhash might fit the > > bill, I haven't looked at the code yet... > > There is likely to be a point where scanning a list of values > for the right hash value is faster than executing a hash function > that is good enough to separate them to separate buckets. Executing a bunch of alu ops that parallelize nicely is _fast_ (it's all xors/adds/rotates, chacha20 is a good example of how the modern stuff works). Also, for 2-way cuckoo there's an xor trick so you don't need to compute a second hash. But the key thing about cuckoo is that the loads on lookup _aren't dependent_ - they can run in parallel. Every cache miss that goes all the way to DRAM is stupidly expensive, remember - hundreds of instructions, had you been able to keep the pipeline fed. > You don't want to scan a linked list because they have horrid > cache footprints. > The locking is equally horrid - especially for remove. > Arrays of pointers ar ethe way forward :-) Well, maybe; I'm waiting to see fill factor numbers and benchmarks. Fill factor was my concern when I was playing around with the concept; I used it in a place where the hash table didn't need to be that big, and the point was to avoid having to separately allocate the entries (and it avoids the hassle of tombstone entries with linear/quadratic probing). The fact that it requires a second dependent load, because buckets are separately allocated, also looks like a big negative to me. I still think a good lockless cuckoo hashing implementation ought to beat it.