Re: [PATCH 0/1] Rosebush, a new hash table

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Sun, 25 Feb 2024 16:48:25 -0500

On Sun, Feb 25, 2024 at 02:47:45PM +0000, David Laight wrote:
> From: Kent Overstreet
> > Sent: 25 February 2024 03:19
> ..
> > when I implemented cuckoo (which is more obviously sensitive to a weak
> > hash function), I had to go with siphash, even jhash wasn't giving me
> > great reslts. and looking at the code it's not hard to see why, it's all
> > adds, and the rotates are byte aligned... you want mixed adds and xors
> > and the rotates to be more prime-ish.
> > 
> > right idea, just old...
> > 
> > what would be ideal is something more like siphash, but with fewer
> > rounds, so same number of instructions as jhash. xxhash might fit the
> > bill, I haven't looked at the code yet...
> 
> There is likely to be a point where scanning a list of values
> for the right hash value is faster than executing a hash function
> that is good enough to separate them to separate buckets.

Executing a bunch of alu ops that parallelize nicely is _fast_
(it's all xors/adds/rotates, chacha20 is a good example of how
the modern stuff works).

Also, for 2-way cuckoo there's an xor trick so you don't need to compute
a second hash.

But the key thing about cuckoo is that the loads on lookup _aren't
dependent_ - they can run in parallel. Every cache miss that goes all
the way to DRAM is stupidly expensive, remember - hundreds of
instructions, had you been able to keep the pipeline fed.

> You don't want to scan a linked list because they have horrid
> cache footprints.
> The locking is equally horrid - especially for remove.
> Arrays of pointers ar ethe way forward :-)

Well, maybe; I'm waiting to see fill factor numbers and benchmarks. Fill
factor was my concern when I was playing around with the concept; I used
it in a place where the hash table didn't need to be that big, and the
point was to avoid having to separately allocate the entries (and it
avoids the hassle of tombstone entries with linear/quadratic probing).

The fact that it requires a second dependent load, because buckets are
separately allocated, also looks like a big negative to me. I still
think a good lockless cuckoo hashing implementation ought to beat it.