Re: [PATCH v2 0/5] New hash table implementation

Fredrik Gustafsson <iveqy@xxxxxxxxx> · Tue, 24 Sep 2013 12:18:32 +0200

On Tue, Sep 24, 2013 at 11:50:16AM +0200, Karsten Blees wrote:
> Regarding performance, I have to admit that the difference between the two implementations is far greater than I had anticipated. The following times (in seconds) are from Linux x64 (Debian Sarge) on a Core i7 860 @2.8GHz. All tests have been run with 1,000 rounds of 100,000 entries each.
> 
> The 'get 10% hits' test does 100,000 lookups on a table with 10,000 entries (i.e. 90% unsuccessful lookups).
> 
> The rows denote different hash functions with different qualities:
> - FNV: FNV-1 hash on stringified loop counter (i.e. fnv1(itoa(i))), as
>   an example of a high quality / low collision hash
> - i: just the loop counter (i.e. 0, 1, 2,...), guaranteed collision free
> - i/10: every 10 entries share the same hash code, lots of collisions
> 
> The i and i/10 tests show that open addressing suffers badly from clustering, i.e. with adjacent hash codes, it degrades to linear search. The *2 versions provide for some space between used buckets to better compare it to the chaining version.
> 
> 
>         |       add        |  get 100% hits  |    get 10% hits
>         |  hash  | hashmap | hash  | hashmap |  hash   | hashmap
> --------+--------+---------+-------+---------+---------+--------
> FNV     | 14.815 |   2.345 | 3.059 |   1.642 |   4.085 |   0.976
> FNV  x2 | 14.409 |   2.706 | 2.888 |   1.959 |   3.905 |   1.393
> i       |  7.432 |   1.593 | 1.364 |   1.142 | 413.023 |   0.589
> i    x2 |  9.169 |   1.866 | 1.427 |   1.163 |   0.757 |   0.670
> i/10    |  1.800 |   1.555 | 5.365 |   6.465 |  32.918 |   1.052
> i/10 x2 |  1.892 |   1.555 | 5.386 |   6.474 |   1.123 |   1.206
> 
> Tests can be reproduced with 'time echo "perfhash[map] <method> 1000" | ./test-hashmap', see test-hashmap.c for definition of method flags.
> 

So I did this improved hash implementation a few months back. Although I
could do a test like this and see an improvement, I failed to see an
improvement in actual git usage.

Hopefully it was just me doing something wrong, but I abandonned the
idea of a better hashmap since I couldn't see any major performance
boost using git and the current implementation is really simple and easy
to maintain.

So my question to you is, does your hashmap speed up git? And does it
speed it up enough to justify that your implementation is the double
amount of code than the current?
-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iveqy@xxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html