Re: [PATCH 3/3] tests: deterministichash: Make hash tables arch-independent

Peter Krempa <pkrempa@xxxxxxxxxx> · Tue, 15 Aug 2017 14:01:25 +0200

On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote:
> On 08/03/2017 04:50 AM, Peter Krempa wrote:

[ trimmed off-topic part ]

> NB: I didn't dig into the qemumonitorjsontest algorithm, but...
> 
> While going through the common object changes, I ended up looking at and
> thinking about the hash table algorithms, initially it was just the
> "grow" algorithm as I think it is a bit "aggressive" (perhaps wasteful)
> since as soon as 1 bucket chain exceeds 8 elements, the table is resized
> by 8 unless the new size is 8 * 2048 - at which time no matter how full
> a bucket is we don't resize the table

Resizing the table is very expensive, so it makes sense to scale it
aggresively. At some point though it would take too much memory.

If some place in the code expects massive hash table, it can set the
hash table to > 2048 manually.

> The next thing I considered was using a prime number as the table bucket
> size and it seems by just doing that will cause qemumonitorjsontest to

What would be the benefit of such change? I don't really see how prime
numbers would improve anything performance-wise.

> fail. Easy enough to reproduce by changing the virHashCreate call in
> qemuBlockNodeNameGetBackingChain to use 53 instead of 50 (even 49 causes
> failure). It doesn't matter if the test source also adjusts the initial
> hash size.
> 
> Of course this makes sense given that virHashCodeGen is the backend of
> the virHashComputeKey algorithm that uses the return value %
> table->size. So if "size" changes, then obviously order changes.

As you've said that is expected. I don't quite follow what is your
point here.

> While I get the code bloat conundrum, but if the order doesn't matter
> then I wonder what other variables could change that would affect this
> test and whether we really want to be constantly altering the mocks
> "just" to get the right answer for this test.

The order depends on the result of the hasing function, the number of
buckets and the order you've added the entries. All of those is
expected when dealing with a hash table.

After this last patch, you are guaranteed to have the hash function
return deterministic results (even on big endian-boxes).

The hash size is not really altered very often since you can't really
see a measurable performance benefit in most cases. This is also
strenghtened by the fact, that the hash function is randomized by a
random number, so in real usage you won't be able to deterministically
cause hash collisions.

In the test code, the cases are added in a deterministic order. Since
the hashing function in the tests is deterministic as well, the only
thing that would influence the ordering is the bucket count. We don't
modify them too often.

I don't see what would make us alter mocks "just" to get the right
answer here. The reason for this patch was, that the hash function is
using bit operations and then treating the result as integers, which
obviously does not work deterministically between big and little endian
arches. Other than that, the test is now fully deterministic.

Peter
Attachment:
signature.asc

Description: PGP signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list