Re: [PATCH] util: hash: Append to hash buckets when adding new entries

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Tue, 16 Apr 2019 15:32:31 +0100

On Tue, Apr 16, 2019 at 04:26:49PM +0200, Peter Krempa wrote:
> On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
> > On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
> > > In cases when the hash function for a name collides with other entry
> > > already in the hash we prepend to the bucket. This creates a 'stack
> > > effect' on the buckets if we then iterate through the hash. Normally
> > > this is not a problem, but in tests we want deterministic results.
> > > 
> > > Since it does not matter where we add the entry and it's usually more
> > > probable that a different entry will be accessed next change it to
> > > append to the end of the bucket. Luckily we already iterate throught the
> > > bucket once thus we can easily find the last entry and just connect the
> > > new entry after it.
> > 
> > I'm not understanding how this change makes the tests any more
> > deterministic than before.
> > 
> > The determinism for hash bucket placement is already ensured by
> > having a non-random impl forvirHashCodeGen in
> > tests/virdeterministichashmock.c
> > 
> > This change simply reverses the order in which we iterate over values
> > when many keys hash to the same bucket.  If this mattered for tests
> > then surely, this change would have to update the test suite too to
> > take account of the new reversed ordering ?
> 
> If you have a hash collision when adding into the hash and then use
> virHashForeach the items are formatted in opposite order.
> 
> Thus if you use this to e.g. store things parsed from XML and then
> format it back, every single iteration reverses the order. This is not a
> problem per-se since we can use a different output file, but it's
> inconvenient since we can't use the same output file.
> 
> Currently there aren't any collisions which would cause problems, but I
> have one on my private branch.
> 
> Given that this change is rather simple I figured we could just stop
> inverting them.

Ok, that does make a little more sense, though generally my view would
be that if we're doing something that is order sensitive like XML
parsing & formatting, then we shouldn't be using a hash table to
represent the data - at least not as the primary record. 

Which bit of code is using the hash in this case ? Is it something
purely inside the test suite, or in the real XML handling code ?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list