Re: I'm a total push-over..

"Marko Kreen" <markokr@xxxxxxxxx> · Wed, 23 Jan 2008 16:01:18 +0200

On 1/23/08, Andreas Ericsson <ae@xxxxxx> wrote:
> Dmitry Potapov wrote:
> > On Wed, Jan 23, 2008 at 09:32:54AM +0100, Andreas Ericsson wrote:
> >> The FNV hash would be better (pasted below), but I doubt
> >> anyone will ever care, and there will be larger differences
> >> between architectures with this one than the lt_git hash (well,
> >> a function's gotta have a name).
> >
> > Actually, Bob Jenkins' lookup3 hash is twice faster in my tests
> > than FNV, and also it is much less likely to have any collision.
> >
>
> From http://burtleburtle.net/bob/hash/doobs.html
> ---
> FNV Hash
>
> I need to fill this in. Search the web for FNV hash. It's faster than my hash on Intel (because Intel has fast multiplication), but slower on most other platforms. Preliminary tests suggested it has decent distributions.

I suspect that paragraph compares FNV with lookup2 (not lookup3),
because lookup3 easily beat all the "simple" hashes in my testing.
The only competitor was Hsieh's hash, which came out faster or slower
about 50:50, depending on alignment / compiler / CPU.
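
For reference, a minimal sketch of what a "simple" multiplicative hash
looks like.  It assumes the 32-bit FNV-1a variant (the thread does not
say which FNV variant was tested), and the function name fnv1a_32 is
made up for illustration.  The one multiply per input byte is why its
speed depends so much on how fast the CPU multiplies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a (assumed variant): xor in one byte, then multiply
 * by the FNV prime.  Standard published constants. */
static uint32_t fnv1a_32(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t h = 2166136261u;          /* FNV offset basis */
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        h ^= p[i];                     /* mix in the next byte */
        h *= 16777619u;                /* FNV prime, wraps mod 2^32 */
    }
    return h;
}
```

It processes one byte per iteration, whereas lookup3-style hashes
consume several bytes per step, which is one plausible reason for the
speed difference on longer keys.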

> ---
>
> My tests ran on Intel. I also noticed I had a few hashes commented out when
> doing the test, one of them being Paul Hsieh's. For some reason, Jenkins' and
> Hsieh's didn't perform well for me last time I used the comparison thing (I
> did a more thorough job back then, with tests running for several minutes
> per hash and table-size, so I commented out the poor candidates).
>
> I still believe that for this very simple case, the lookup3.c case is not
> very practical, as the code is that much more complicated, which was my
> main point with posting the comparison. Iow, not "switch to this hash,
> because it's better", but rather "the hash is not as bad as you think and
> will probably work well for all practical purposes".

If you don't mind a speed penalty of a few percent compared to
Jenkins' own optimized version, you can use my simplified version:

  http://repo.or.cz/w/pgbouncer.git?a=blob;f=src/hash.c;h=5c9a73639ad098c296c0be562c34573189f3e083;hb=HEAD

It always works with "native" endianness, unlike Jenkins' fixed-endian
hashlittle() / hashbig().  That may or may not matter if you plan
to write the hash values to disk.
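
To illustrate the difference, here is a rough sketch of the two ways
to fetch a 4-byte word from the input.  It is not taken from the
linked hash.c or from system.h; the names read32_native and read32_le
are hypothetical.  The native read gives different words (and so
different hash values) on little- and big-endian machines, while the
fixed little-endian read gives the same value everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Native-endian read: the result depends on the host byte order.
 * memcpy() keeps this valid C for unaligned pointers; compilers
 * commonly turn it into a single load where that is cheap. */
static uint32_t read32_native(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Fixed little-endian read: assembles the word byte by byte, so the
 * result is the same on every host. */
static uint32_t read32_le(const void *p)
{
    const unsigned char *b = p;

    assert(p != NULL);
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```

If the hash values only live in memory, the native read is fine.  If
they go to disk and may be read back on another architecture, the
fixed-endian form is the safer choice.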

Speed-wise it may be 10-30% slower in the worst case (for me that was
sparc-classic with unaligned data), but on x86, with a lucky gcc version
and maybe also the memcpy() hack seen in system.h, it tends to be ~10%
faster, especially as it always does a 4-byte read in the main loop.

-- 
marko