Re: [PATCH] Replace hashing algorithm with murmurhash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 18, 2012 at 11:34:16AM -0700, Eric Blake wrote:
> On 01/18/2012 09:23 AM, Daniel P. Berrange wrote:
> > @@ -1636,9 +1637,10 @@ cleanup:
> >  }
> >  
> >  
> > -static unsigned long virCgroupPidCode(const void *name)
> > +static int32_t virCgroupPidCode(const void *name, int32_t seed)
> >  {
> > -    return (unsigned long)name;
> > +    unsigned long pid = (unsigned long)name;
> > +    return virHashCodeGen(&pid, sizeof(pid), seed);
> 
> I'm not sure if it matters, but you could shorten these two lines to
> just one:
> 
> return virHashCodeGen(&name, sizeof(unsigned long), seed);

I actually preferred the slightly more verbose style, to make
it clearer that we're actually passing in the PID as an int,
cast to a pointer, instead of an int pointer.

> > @@ -101,11 +95,10 @@ static void virHashStrFree(void *name)
> >      VIR_FREE(name);
> >  }
> >  
> > -
> >  static unsigned long
> >  virHashComputeKey(virHashTablePtr table, const void *name)
> >  {
> > -    unsigned long value = table->keyCode(name);
> > +    uint32_t value = table->keyCode(name, table->seed);
> >      return (value % table->size);
> 
> If, on a 64-bit machine, table->size ever grows larger than uint32_t,
> then we've lost some hashing abilities compared to what the old unsigned
> long hash gave us.  Conversely, if table-size ever grows that large,
> we're burning an awful lot of memory in the first place, and a hash
> collision at that point is the least of our worries :)  I think this is
> okay, although you may want to consider changing virHashComputeKey to
> return uint32_t.

I changed the signature here to actually be  'size_t', likewise
for the table->size variable itself.


> > +++ b/src/util/hash.h
> > @@ -55,12 +55,15 @@ typedef int (*virHashSearcher) (const void *payload, const void *name,
> >  /**
> >   * virHashKeyCode:
> >   * @name: the hash key
> > + * @seed: random seed
> >   *
> > - * Compute the hash code corresponding to the key @name
> > + * Compute the hash code corresponding to the key @name, using
> > + * @seed to perturb the hashing algorithm
> >   *
> >   * Returns the hash code
> >   */
> > -typedef unsigned long (*virHashKeyCode)(const void *name);
> > +typedef int32_t (*virHashKeyCode)(const void *name,
> > +                                  int32_t seed);
> 
> Umm, you used uint32_t in virHashComputeKey, but int32_t here.  While I
> don't _think_ signed values will bite us due to sign-extension in 64-bit
> operations introducing a bias, I'd rather make this signature:
> 
> typedef uint32_t (*virHashKeyCode)(const void *name, uint32_t seed);
> 
> and adjust the callers to comply with unsigned everywhere.

Yeah, that was a mistake - it should have been uint32.

> That is, I propose that we ditch virRandom, and instead implement a new
> function:
> 
> /* Return an evenly distributed random number between [0,2^nbits), where
> nbits must be in the range (0,64]. */
> uint64_t virRandomBits(int nbits)
> {
>     int bits_per_iter = count_one_bits(RAND_MAX);
>     uint64_t ret = 0;
>     int32_t bits;
>     /* This algorithm requires that RAND_MAX == 2^n-1 for some n;
>        gnulib's random_r meets this property. */
>     verify(((RAND_MAX + 1U) & RAND_MAX) == 0);
> 
>     virMutexLock(&randomLock);
> 
>     while (nbits > bits_per_iter) {
>         random_r(&randomData, &bits);
>         ret = (ret << bits_per_iter) | (bits & RAND_MAX);
>         nbits -= bits_per_iter;
>     }
> 
>     random_r(&randomData, &bits);
>     ret = (ret << nbits) | (bits & ((1 << nbits) - 1));
> 
>     virMutexUnlock(&randomLock);
>     return ret;
> }
> 
> where existing calls to virRandom(256) will now be virRandomBits(8),
> virRandom(1024*1024*1024) will now be virRandomBits(30), and your code
> would use uint32_t seed = virRandom(32).

Yep, I've done that.

> > +#include "virhashcode.h"
> > +
> > +static uint32_t rotl(uint32_t x, int8_t r)
> > +{
> > +    return (x << r) | (x >> (32 - r));
> > +}
> 
> Gnulib provides the bitrotate module, which provides "bitrotate.h" and
> an implementation of this function that is guaranteed to be as efficient
> as possible when compiled by gcc (possibly by using compiler-builtins to
> access raw assembly on architectures that have builtin rotate
> instructions).  I'd suggest using that rather than rolling your own.

Good point.

> 
> > +
> > +/* slower than original but is endian neutral and handles platforms that
> > + * do only aligned reads */
> > +__attribute__((always_inline))
> > +static uint32_t getblock(const uint8_t *p, int i)
> > +{
> > +    uint32_t r;
> > +    size_t size = sizeof(uint32_t);
> > +
> > +    memcpy(&r, &p[i * size], size);
> > +
> > +    return le32toh(r);
> 
> <endian.h>, and thus le32toh(), is not yet standardized (although POSIX
> will be adding it in the future), nor is it currently provided by
> gnulib.  We'd have to get that fixed first.

The le32toh call was only here because the code I copied wanted to be
endian neutral. I don't think libvirt really cares if its hash codes
are endian neutral, so I trivially just removed the le32toh call and
avoid the problem of endian.h


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list


[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]