On Fri, Oct 30, 2020 at 6:41 AM Jeff King <peff@xxxxxxxx> wrote: > > On Tue, Oct 13, 2020 at 12:40:44AM +0000, Elijah Newren via GitGitGadget wrote: > > > merge-ort is a heavy user of strmaps, which are built on hashmap.[ch]. > > reset_maps() in merge-ort was taking about 12% of overall runtime in my > > testcase involving rebasing 35 patches of linux.git across a big rename. > > reset_maps() was calling hashmap_free() followed by hashmap_init(), > > meaning that not only was it freeing all the memory associated with each > > of the strmaps just to immediately allocate a new array again, it was > > allocating a new array that wasy likely smaller than needed (thus > > s/wasy/was/ Thanks; will fix. > > resulting in later need to rehash things). The ending size of the map > > table on the previous commit was likely almost perfectly sized for the > > next commit we wanted to pick, and not dropping and reallocating the > > table immediately is a win. > > > > Add some new API to hashmap to clear a hashmap of entries without > > freeing map->table (and instead only zeroing it out like alloc_table() > > would do, along with zeroing the count of items in the table and the > > shrink_at field). > > This seems like a reasonable optimization to make, and doesn't make the > API significantly more complicated. I'd expect the allocation of actual > entry objects to dwarf the table allocation, but I guess: > > - you'll deal with the individual entries later using a mempool > > - it's not just the allocation, but the re-insertion of the entries as > we grow > > It would be nice if we had some actual perf numbers to report here, so > we could know exactly how much it was buying us. But I guess things are > a bit out-of-order there. You want to do this series first and then > build merge-ort on top as a user. We could introduce the basic data > structure first, then merge-ort, and then start applying optimizations > with real-world measurements. But I'm not sure it's worth the amount of > time you'd have to spend to reorganize in that way. Yeah, the perf benefits didn't really come until I added a strmap_clear() based on this, so as you discovered I put perf numbers in patch 7 of this series. Should I add a mention of the later commit message at this point in the series? > > hashmap.c | 39 +++++++++++++++++++++++++++------------ > > hashmap.h | 13 ++++++++++++- > > The implementation itself looks correct to me. I already mentioned my > thoughts on naming in patch 1. I'll circle back to that when I comment on patch 1...