On Mon, Oct 28, 2013 at 10:04 PM, Vicent Martí <tanoku@xxxxxxxxx> wrote: > > On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees <karsten.blees@xxxxxxxxx> wrote: > > > Regarding performance, khash uses open addressing, which requires more key comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). However, any measurable differences will most likely be dwarfed by IO costs in this particular use case. > > I don't think this is true. If you actually run a couple insertion and > lookup benchmarks comparing the two implementations, you'll find khash > to be around ~30% faster for most workloads (venturing a guess from > past experience). I am obviously not the author of khash, but I've > found that the theoretical increase in key comparisons is completely > dwarfed by the benefit of increased cache locality during the probing > fase. I still haven't found a faster C hash table implementation than > khash for the general case, that's why I normally use it despite the > worrisome preprocessor crash-party going on in that header file. Yes, cache locality is where open addressing shines, however, you loose that benefit when the keys are pointers (e.g. sha1's). > > > > Btw., pack-objects.c::rehash_objects() in patch 03 unnecessarily checks for duplicates. That's probably the reason for the high hashcmp times you found in the first round of the patch series. > > Patch 03 is a refactoring -- the duplicate checking code has been in > pack-objects.c for years. I am not sure duplicate checking is > superfluous at all, there are many cases where you could be > double-inserting objects in a pack-objects run, and you really don't > want to generate packfiles with dupe objects. The point is that its in _rehash_. Duplicate checking should be in add/put. Rehash only rearranges entries that are alread _in_ the map, and it usually only needs the hash code for that. So pack-objects implements rehash in terms of a full clear + add-all instead, which is of course slower than what khash, hashmap etc. would do. Ciao, Karsten -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html