On Jan 23, 2008, at 6:39 AM, Andreas Ericsson wrote:
Marko Kreen wrote:
On 1/23/08, Andreas Ericsson <ae@xxxxxx> wrote:
Dmitry Potapov wrote:
On Wed, Jan 23, 2008 at 09:32:54AM +0100, Andreas Ericsson wrote:
The FNV hash would be better (pasted below), but I doubt
anyone will ever care, and there will be larger differences
between architectures with this one than with the lt_git hash
(well, a function's gotta have a name).
Actually, Bob Jenkins' lookup3 hash is twice as fast as FNV in
my tests, and it is also much less likely to produce collisions.
From http://burtleburtle.net/bob/hash/doobs.html
---
FNV Hash
I need to fill this in. Search the web for FNV hash. It's faster
than my hash on Intel (because Intel has fast multiplication),
but slower on most other platforms. Preliminary tests suggested
it has decent distributions.
I suspect that this paragraph was about comparison with lookup2
It might be. It's from the link Dmitry posted in his reply to my
original message. (something/something/doobs.html).
(not lookup3), because lookup3 easily beat all the "simple" hashes
By how much? FNV beat Linus' hash by 0.01 microseconds / insertion,
and 0.1 microseconds / lookup. We're talking about a case here where
there will never be more lookups than insertions (unless I'm much
mistaken).
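For reference, here is a minimal FNV-1a sketch in C; the 32-bit
offset basis and prime are the published FNV constants, and the
function name is only illustrative:

#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit: xor in each input byte, then multiply by the
 * FNV prime.  The constants are the standard 32-bit FNV values. */
static uint32_t fnv1a_32(const void *data, size_t len)
{
        const unsigned char *p = data;
        uint32_t hash = 2166136261u;    /* FNV offset basis */

        while (len--) {
                hash ^= *p++;
                hash *= 16777619u;      /* FNV prime */
        }
        return hash;
}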
If you don't mind a few percent speed penalty compared to Jenkins's
own optimized version, you can use my simplified version:
http://repo.or.cz/w/pgbouncer.git?a=blob;f=src/hash.c;h=5c9a73639ad098c296c0be562c34573189f3e083;hb=HEAD
I don't, but I don't care that deeply either. On the one hand,
it would be nifty to have an excellent hash-function in git.
On the other hand, it would look stupid to use something that's
quite clearly overkill.
It always works with "native" endianness, unlike Jenkins's
fixed-endian hashlittle() / hashbig(). That may or may not matter
if you plan to write the values to disk.
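To illustrate the difference (a sketch for this thread, not my code
nor Jenkins's): a native-endian 4-byte load is fast but its result
depends on the host, while an explicit little-endian decode gives
the same value, and hence the same hash, everywhere:

#include <stdint.h>
#include <string.h>

/* Native load: fast, but the value it produces (and any hash fed
 * with it) differs between big- and little-endian hosts. */
static uint32_t load_native(const unsigned char *p)
{
        uint32_t v;
        memcpy(&v, p, sizeof(v));       /* also avoids unaligned traps */
        return v;
}

/* Explicit little-endian decode: same value on every host, so a hash
 * built on it can safely be written to disk and compared later. */
static uint32_t load_le(const unsigned char *p)
{
        return (uint32_t)p[0]
             | (uint32_t)p[1] << 8
             | (uint32_t)p[2] << 16
             | (uint32_t)p[3] << 24;
}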
Speed-wise it may be 10-30% slower in the worst case (in my case a
sparc-classic with unaligned data), but on x86, with a lucky gcc
version and maybe also the memcpy() hack seen in system.h, it tends
to be ~10% faster, especially as it always does 4-byte reads in the
main loop.
It would have to be a significant improvement in wall-clock time
on a test-case of hashing 30k strings to warrant going from 6 to 80
lines of code, imo. I still believe the original dumb hash Linus
wrote is "good enough".
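To be concrete, the following is a rough sketch of the kind of
wall-clock test I mean; fnv1a_32() is the FNV-1a routine sketched
earlier (repeated here so it compiles on its own), and the string
set is made up:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* fnv1a_32() from the earlier sketch, repeated for completeness. */
static uint32_t fnv1a_32(const void *data, size_t len)
{
        const unsigned char *p = data;
        uint32_t h = 2166136261u;

        while (len--) {
                h ^= *p++;
                h *= 16777619u;
        }
        return h;
}

/* Hypothetical harness: hash 30k synthetic strings and report the
 * average wall-clock time per call. */
int main(void)
{
        enum { NSTR = 30000, LEN = 32 };
        static char strs[NSTR][LEN];
        struct timespec t0, t1;
        volatile uint32_t sink = 0;     /* keep calls from being optimized out */
        int i;

        for (i = 0; i < NSTR; i++)
                snprintf(strs[i], LEN, "path/to/file-%d.c", i);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < NSTR; i++)
                sink ^= fnv1a_32(strs[i], strlen(strs[i]));
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%.1f ns/hash (checksum %08x)\n",
               ((t1.tv_sec - t0.tv_sec) * 1e9 +
                (t1.tv_nsec - t0.tv_nsec)) / NSTR,
               (unsigned)sink);
        return 0;
}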
On a side-note, it was very interesting reading, and I shall have
to add jenkins3_mkreen() to my test-suite (although the "keep
copyright note" license thing bugs me a bit).
Would you, for completeness' sake, please add the Tcl and STL hashes
to your test suite? The numbers are quite interesting. Is your test
suite available somewhere, so we can test with our own data and
hardware as well? Both the Tcl hash and the STL string hash (from
SGI, probably HP days, and still the current default with g++) are
extremely simple (excluding the loop constructs):
Tcl: h += (h<<3) + c; // essentially *9+c (but works better on non-late-Intels)
STL: h = h * 5 + c; // worse than above for most of my data
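Spelled out as complete C functions (a rendering of the one-liners
above, not code lifted from Tcl or the SGI STL):

#include <stdint.h>

/* Tcl-style string hash: h = h*9 + c, written as shift-and-add. */
static uint32_t tcl_hash(const char *s)
{
        uint32_t h = 0;
        unsigned char c;

        while ((c = (unsigned char)*s++) != 0)
                h += (h << 3) + c;      /* h*8 + h + c == h*9 + c */
        return h;
}

/* SGI/HP STL-style string hash: h = h*5 + c. */
static uint32_t stl_hash(const char *s)
{
        uint32_t h = 0;
        unsigned char c;

        while ((c = (unsigned char)*s++) != 0)
                h = h * 5 + c;
        return h;
}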
Thanks,
__Luke