On Fri, Aug 21, 2020 at 02:33:54PM -0700, Elijah Newren wrote:

> However, there's an important difference here between what I've done
> and what you've suggested for hashmap: my method did not deallocate
> hashmap->table in hashmap_clear() and then use lazy initialization.
> In fact, I think not deallocating the table was part of the charm --
> the table had already naturally grown to the right size, and because
> the repository has approximately the same number of paths in various
> commits, this provided me a way of getting a table preallocated to a
> reasonable size for all merges after the first (and there are multiple
> merges either when recursiveness is needed due to multiple merge
> bases, OR when rebasing or cherry-picking a sequence of commits).
> This prevented, as hashmap.h puts it, "expensive resizing".
>
> So, once again, my performance ideas might be clashing with some of
> your desires for the API. Any clever ideas for resolving that?

If the magic is in pre-sizing the hash, then it seems like the callers
ought to be feeding the size hint. That does make a little more work
for them, but I think there's real value in having consistent semantics
for "clear" across our data structures.

However, one cheat would be to free the memory but retain the size hint
after a clear. And then if we lazy-init, grow immediately to the hint
size. That's more expensive than a true reuse, because we do reallocate
the memory. But it avoids the repeated re-allocation during growth.

It may also be a sign that we should be growing the hash more
aggressively in the first place.

Of course all of this is predicated on having some benchmarks. It would
be useful to know which part actually provided the speedup.
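To make the second option concrete, here is a rough sketch of what
"free the table but keep the size hint" could look like. This is
deliberately simplified and is not git's actual hashmap.h; the names
(struct map, size_hint, map_lazy_init, etc.) are all hypothetical, and
error handling plus rehash-on-grow are omitted:

  #include <stdlib.h>

  #define INITIAL_SIZE 64

  struct entry {
          unsigned int hash;
          struct entry *next;
  };

  struct map {
          struct entry **table;
          size_t tablesize;
          size_t size_hint;   /* survives a clear; 0 means "no hint" */
          size_t nr;
  };

  static void map_alloc_table(struct map *m, size_t size)
  {
          m->tablesize = size;
          m->table = calloc(size, sizeof(*m->table));
  }

  /* Lazy init: allocate straight to the retained hint, if any. */
  static void map_lazy_init(struct map *m)
  {
          if (!m->table)
                  map_alloc_table(m, m->size_hint ?
                                  m->size_hint : INITIAL_SIZE);
  }

  void map_add(struct map *m, struct entry *e)
  {
          map_lazy_init(m);
          e->next = m->table[e->hash % m->tablesize];
          m->table[e->hash % m->tablesize] = e;
          m->nr++;
          /* (growing when nr crosses a threshold omitted) */
  }

  /* Clear: free the table, but keep its size as a hint for next time. */
  void map_clear(struct map *m)
  {
          if (m->tablesize)
                  m->size_hint = m->tablesize;
          free(m->table);
          m->table = NULL;
          m->tablesize = 0;
          m->nr = 0;
  }

The point is only that clear() still returns the structure to "empty"
as far as the caller can tell, while the next lazy init jumps directly
to the previous size instead of re-growing incrementally.

-Peff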