Re: [PATCH] Use a real non-cryptographic hash algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 15, 2018 at 09:27:42AM -0500, Devin Hussey wrote:
> >From 502cbd61e386eb014035df66b032e90a9de926b7 Mon Sep 17 00:00:00 2001
> From: Devin Hussey <husseydevin@xxxxxxxxx>
> Date: Thu, 15 Nov 2018 09:12:24 -0500
> Subject: [PATCH] Use a real non-cryptographic hash algorithm

> dash previously used a modified version of the LoseLose algorithm.

> LoseLose is a TERRIBLE algorithm.

> It has terrible distribution, leaving many hash entries unused.

> However, more importantly, it is incredibly easy to make a
> collision; even something as simple as rearranging the letters.

> For example. these all produce the hash 1652:

> Hello
> Holle
> Hlleo
> Hoell
> Haxed\n
> HASH\tBAD
> Hater

> Collsions in hash tables are very bad because it requires
> looking through a linked list, and the benefits of O(1) time
> in a hash table are completely lost.

> The FNV-1a algorithm has comparable performance and code size
> while having a much better collision rate.

You do have a dependency chain with a multiplication per character, so
it is less suitable for long strings. There are implementation
techniques to avoid the dependency chain, but I'm not sure it's worth
the trouble.

> While there are some other faster and more resistant hashes out there,
> namely xxHash, Murmur2, and CityHash, the benefits are minimal on
> short strings and they expect a length argument, unlike FNV-1a which
> simply uses null-termination, and they are not in the public domain
> like FNV-1a.

> [snip]
> +        hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME;
> [snip]
> +        hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME;
> [snip]
> +        hashval = (hashval ^ (unsigned char) *p++) + FNV_PRIME;

I suppose this latter one should be * instead of + as well?

-- 
Jilles Tjoelker



[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux