On 1/23/08, Andreas Ericsson <ae@xxxxxx> wrote: > Marko Kreen wrote: > > (not lookup3) because lookup3 beat easily all the "simple" hashes > > By how much? FNV beat Linus' hash by 0.01 microseconds / insertion, > and 0.1 microsecons / lookup. We're talking about a case here where > there will never be more lookups than insertions (unless I'm much > mistaken). FNV is around 40% slower than lookup3 on my Intel Core CPU, on 4byte aligned input. See below for more detailed info. > > If you don't mind few percent speed penalty compared to Jenkings > > own optimized version, you can use my simplified version: > > > > http://repo.or.cz/w/pgbouncer.git?a=blob;f=src/hash.c;h=5c9a73639ad098c296c0be562c34573189f3e083;hb=HEAD > > I don't, but I don't care that deeply either. On the one hand, > it would be nifty to have an excellent hash-function in git. > On the other hand, it would look stupid with something that's > quite clearly over-kill. Jenkins hash is fast because it does not look at individual bytes. If you _do_ want to look at them for unrelated reasons, (case-insensitive, unicode-juggling), then it obiously loses the point. That is, if you want to process the string in one go. > > It works always with "native" endianess, unlike Jenkins fixed-endian > > hashlittle() / hashbig(). It may or may not matter if you plan > > to write values on disk. > > > > Speed-wise it may be 10-30% slower worst case (in my case sparc-classic > > with unaligned data), but on x86, lucky gcc version and maybe > > also memcpy() hack seen in system.h, it tends to be ~10% faster, > > especially as it does always 4byte read in main loop. > > It would have to be a significant improvement in wall-clock time > on a test-case of hashing 30k strings to warrant going from 6 to 80 > lines of code, imo. I still believe the original dumb hash Linus > wrote is "good enough". Well, ad-hoc dumb hashes may have horrible worst-cases that you cant see with light testing. Therefore I'd still suggest some better researched dumb hash (eg. FNV or OAT). > On a side-note, it was very interesting reading, and I shall have > to add jenkins3_mkreen() to my test-suite (although the "keep > copyright note" license thing bugs me a bit). Sorry. I just used template boilerplate. Considering all the hard work was done by other people, it not proper to put under my own license. I tagged the file as 'public domain' and pushed out. Btw, the reason I started cleaning lookup3 was that at first I was scared of the complexity of Jenkins code and decided to go with Hsieh hash. Then I found out that Hsieh code is under totally werdo license (http://www.azillionmonkeys.com/qed/weblicense.html) so I could not use it. ==================================================================== Here is my raw-speed test of different hashes. Input is 4-byte aligned which should be common case for malloc()-ed strings. This also is best case for original lookup3(), on unaligned input the memcpy variants beat it easily. Input string length varies randomly in range 0..64. own_memcpy - last 12-byte memcpy() calls out to libc memcpy_hack - last memcpy is inlined bytewise copy loop: while (len--) *dst++ = *src++; Note that is is raw-speed test, if you benchmark larger code the hash difference probably matters less. -------------------------------------------------------------------- Testing: seed=34 align=4 minlen=0 maxlen=64 trycnt=2 duration=10 lookup3 : try=0: ... 247.4880 MB/s lookup3 : try=1: ... 247.6154 MB/s own_memcpy: try=0: ... 223.5508 MB/s own_memcpy: try=1: ... 223.5830 MB/s memcpy_hack: try=0: ... 241.2241 MB/s memcpy_hack: try=1: ... 241.2492 MB/s lookup2 : try=0: ... 190.2697 MB/s lookup2 : try=1: ... 190.3283 MB/s fnv : try=0: ... 153.0318 MB/s fnv : try=1: ... 153.0178 MB/s hsieh : try=0: ... 234.0468 MB/s hsieh : try=1: ... 234.0426 MB/s oat : try=0: ... 154.7804 MB/s oat : try=1: ... 154.8226 MB/s elf : try=0: ... 125.5892 MB/s elf : try=1: ... 125.5734 MB/s Results compared to reference: lookup3 : 100.000 % own_memcpy : 90.311 % memcpy_hack : 97.449 % lookup2 : 76.872 % fnv : 61.815 % hsieh : 94.544 % oat : 62.533 % elf : 50.729 % -- marko - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html