On 01.04.20 at 13:16, Jeff King wrote:
> On Wed, Apr 01, 2020 at 06:35:23AM -0400, Jeff King wrote:
>
>>>> +	/*
>>>> +	 * this cast works because we only look at the oid part of the entry,
>>>> +	 * and it comes first in the struct
>>>> +	 */
>>>> +	khiter_t pos = kh_get_object_entry_set(&object_table,
>>>> +					       (struct object_entry *)oid);
>>>
>>> Dirty, but I can believe the comment.
>>
>> Our hashmap.c implementation gets around this by letting the equality
>> function take an extra parameter. It's annoying when you're writing
>> those functions, but it should allow this case without any casting (or
>> preemptively allocating a struct).
>
> And here's a patch trying that. Much to my surprise, it outperforms
> khash, which has generally been faster in previous tests.
>
> Here are the numbers I get:
>
>   nr_objects   master      khash      hashmap
>   2^20         0m4.317s    0m5.109s   0m3.890s
>   2^21         0m10.204s   0m9.702s   0m7.933s
>   2^22         0m27.159s   0m17.911s  0m16.751s
>   2^23         1m19.038s   0m35.080s  0m31.963s
>   2^24         4m18.766s   1m10.233s  1m6.793s
>
> And I didn't even have to pre-size the table. This really makes me
> wonder if there's some silly inefficiency in khash which we could be
> addressing. Or maybe open-addressing really does lose to chaining here,
> but I think we keep the load factor low enough that it should be a win.

Or we're just unlucky.  I tried to find the difference between khash
with and without presizing using callgrind, but came up empty.  It did
reveal that fast-import spends 70% of its cycles in a million memset()
calls issued (indirectly) by git_deflate_init(), which in turn is called
by store_object(), which is called from parse_and_store_blob(), though.

Why doesn't the second gained when handling 1M objects show up in its
output?  I suspect it's because it uses its custom allocator to gather
its data.
So I ran the test with jemalloc2 preloaded:

  nr_objects   master      khash       khash+preload
  2^20         0m5.812s    0m5.600s    0m5.604s
  2^21         0m12.913s   0m10.884s   0m10.357s
  2^22         0m31.257s   0m21.461s   0m21.031s
  2^23         1m20.904s   0m40.181s   0m42.607s
  2^24         3m59.201s   1m21.104s   1m23.814s

My measurements are noisy, but my point is simply that with a different
allocator you'd not even have seen any slowdown when switching to khash.

>
> ---
> diff --git a/fast-import.c b/fast-import.c
> index 202dda11a6..0ef6defc10 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -39,12 +39,28 @@
>
>  struct object_entry {
>  	struct pack_idx_entry idx;
> -	struct object_entry *next;
> +	struct hashmap_entry ent;

That uses 16 bytes more memory per entry on x64 than khash would.
That's 256MB for 2^24 objects -- not ideal, but bearable, I guess.

>  	uint32_t type : TYPE_BITS,
>  		pack_id : PACK_ID_BITS,
>  		depth : DEPTH_BITS;
>  };
>
> +static int object_entry_hashcmp(const void *map_data,
> +				const struct hashmap_entry *eptr,
> +				const struct hashmap_entry *entry_or_key,
> +				const void *keydata)
> +{
> +	const struct object_id *oid = keydata;
> +	const struct object_entry *e1, *e2;
> +
> +	e1 = container_of(eptr, const struct object_entry, ent);

That's nicer than the pointer alchemy in the khash conversion, for sure.

But why const?  Can const change the layout of a structure?  Scary.

René