Re: I'm a total push-over..

Andreas Ericsson <ae@xxxxxx> · Thu, 24 Jan 2008 11:39:16 +0100

Dmitry Potapov wrote:
On Wed, Jan 23, 2008 at 10:31:11AM +0100, Andreas Ericsson wrote:
---
FNV Hash

I need to fill this in. Search the web for FNV hash. It's faster than my 
hash on Intel (because Intel has fast multiplication), but slower on most 
other platforms. Preliminary tests suggested it has decent distributions. 
---

I believe that by "my hash", Bob Jenkins meant lookup2, which
was significantly slower.

My tests ran on Intel.

Could you please specify your CPU model?

From /proc/cpuinfo. It's the best I can do without going to our purchase
department and asking for the spec so I can contact the vendor and get
the real thing. Dualcore shouldn't matter for this test, as it isn't
threaded.
Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz

I also noticed I had a few hashes commented out when
doing the test, one of them being Paul Hsieh's. For some reason, Jenkins' and
Hsieh's didn't perform well for me last time I used the comparison thing (I
did a more thorough job back then, with tests running for several minutes
per hash and table-size, so I commented out the poor candidates).

I expected that Paul Hsieh's hash might not do well on some architectures,
though it seems it did even worse than I expected.

It doesn't do that well on certain types of data, in my experience. It does
have excellent dispersion, so with very long strings it's usually the
best to use, because collisions become so expensive.

I still believe that for this very simple case, the lookup3.c case is not
very practical, as the code is that much more complicated, which was my
main point with posting the comparison.

I would not describe lookup3 as impractical. It is widely used and well
tested. Perhaps, for some Intel CPUs, the difference in speed is not so
big, and FNV hash is much smaller and simpler, so FNV is a reasonable
choice, but the hash is twice as slow on my AMD processor and I suspect
it may be even worse on other CPUs, where integer multiplication is slow.
Besides, hashing filenames may turn out not to be the only case where
a fast hash is needed.
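
For reference, here is a minimal sketch of why FNV counts as "smaller and
simpler" and why its speed tracks the CPU's integer multiplier. It assumes
the 32-bit FNV-1a variant; the thread does not say which FNV variant was
benchmarked, and the function name fnv1a_32 is made up for illustration.
The constants are the published 32-bit FNV offset basis and prime.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a (assumed variant; name is hypothetical).
 * Each input byte costs one XOR and one 32-bit multiply, so the
 * multiply latency of the CPU dominates the inner loop. */
uint32_t fnv1a_32(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t hash = 2166136261u;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		hash ^= p[i];
		hash *= 16777619u;	/* FNV prime, 0x01000193 */
	}
	return hash;
}
```

The whole hash is one loop with two operations per byte, which is the
simplicity argument; the per-byte multiply is the portability concern.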

Ah well. I think once the patch is in master, it will be easy enough to
test and verify different algorithms. Since it's intended for in-memory
data only, it's no problem to have several algorithms and pick the one
most suitable for the architecture we're compiling for.
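
A rough sketch of what picking the algorithm at compile time could look
like follows. This is not what the patch does; the macro
NAME_HASH_AVOID_MULTIPLY and the names name_hash, hash_fnv1a and hash_oaat
are all hypothetical. The multiply-free alternative shown is Bob Jenkins'
one-at-a-time hash, chosen here only because it is short, not because it
was part of the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: one multiply per byte. */
uint32_t hash_fnv1a(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t hash = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		hash ^= p[i];
		hash *= 16777619u;
	}
	return hash;
}

/* Jenkins one-at-a-time: shifts, adds and XORs only. */
uint32_t hash_oaat(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		hash += p[i];
		hash += hash << 10;
		hash ^= hash >> 6;
	}
	hash += hash << 3;
	hash ^= hash >> 11;
	hash += hash << 15;
	return hash;
}

/* Hypothetical build-time switch, e.g. set from the Makefile on
 * targets where integer multiplication is slow. */
#ifdef NAME_HASH_AVOID_MULTIPLY
#define name_hash hash_oaat
#else
#define name_hash hash_fnv1a
#endif
```

Because the hash values never leave memory, two builds are free to
disagree on the function behind name_hash.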

--
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231