Re: [PATCH 3/3] tests: deterministichash: Make hash tables arch-independent

"Daniel P. Berrange" <berrange@xxxxxxxxxx> · Tue, 15 Aug 2017 14:17:57 +0100

On Tue, Aug 15, 2017 at 03:03:07PM +0200, Michal Privoznik wrote:
> On 08/15/2017 02:01 PM, Peter Krempa wrote:
> > On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote:
> >> On 08/03/2017 04:50 AM, Peter Krempa wrote:
> > 
> > [ trimmed off-topic part ]
> > 
> >> NB: I didn't dig into the qemumonitorjsontest algorithm, but...
> >>
> >> While going through the common object changes, I ended up looking at and
> >> thinking about the hash table algorithms, initially it was just the
> >> "grow" algorithm as I think it is a bit "aggressive" (perhaps wasteful)
> >> since as soon as 1 bucket chain exceeds 8 elements, the table is resized
> >> by 8 unless the new size is 8 * 2048 - at which time no matter how full
> >> a bucket is we don't resize the table
> > 
> > Resizing the table is very expensive, so it makes sense to scale it
> > aggresively. At some point though it would take too much memory.
> > 
> > If some place in the code expects massive hash table, it can set the
> > hash table to > 2048 manually.
> > 
> >> The next thing I considered was using a prime number as the table bucket
> >> size and it seems by just doing that will cause qemumonitorjsontest to
> > 
> > What would be the benefit of such change? I don't really see how prime
> > numbers would improve anything performance-wise.
> 
> Because if your hash function is stupid it can cluster all the keys
> together. Primes, however, have way fewer divisors, therefore clustering
> is harder to achieve. Of course, you can construct a hash function so
> that it's utterly stupid (e.g. {return 4;}) and then all of our
> optimizations are thrown out of the window. But I believe that using
> prime numbers for table size is advised in the literature too.
> Apparently, wiki mentions this fact too:
> 
> https://en.wikipedia.org/wiki/Hash_table#Choosing_a_hash_function

If your hash function has a problem with clustering, then you really
should replace the hash function with a better one - which we already
did a few years back now :-)

Incidentally, we should replace our murmurhash funtion with siphash
at some point, because people have found attacks against murmurhash,
and so most devs have migrated over to siphash.

Regards,
Daniel

[1] https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list