On Fri, Aug 21, 2020 at 02:33:54PM -0700, Elijah Newren wrote:

> However, there's an important difference here between what I've done
> and what you've suggested for hashmap: my method did not deallocate
> hashmap->table in hashmap_clear() and then use lazy initialization.
> In fact, I think not deallocating the table was part of the charm --
> the table had already naturally grown to the right size, and because
> the repository has approximately the same number of paths in various
> commits, this provided me a way of getting a table preallocated to a
> reasonable size for all merges after the first (and there are multiple
> merges either when recursiveness is needed due to multiple merge
> bases, OR when rebasing or cherry-picking a sequence of commits).
> This prevented, as hashmap.h puts it, "expensive resizing".
>
> So, once again, my performance ideas might be clashing with some of
> your desires for the API. Any clever ideas for resolving that?

If the magic is in pre-sizing the hash, then it seems like the callers
ought to be feeding the size hint. That does make a little more work
for them, but I think there's real value in having consistent semantics
for "clear" across our data structures.

However, one cheat would be to free the memory but retain the size hint
after a clear. And then if we lazy-init, grow immediately to the hint
size. That's more expensive than a true reuse, because we do reallocate
the memory. But it avoids the repeated re-allocation during growth.

It may also be a sign that we should be growing the hash more
aggressively in the first place.

Of course all of this is predicated on having some benchmarks. It would
be useful to know which part actually provided the speedup.
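To make the second option concrete, here is a rough sketch of what
"free the table but keep the size hint" could look like. This is
deliberately simplified and is not git's actual hashmap.h; the names
(struct map, size_hint, map_lazy_init, etc.) are all hypothetical, and
error handling plus rehash-on-grow are omitted:

  #include <stdlib.h>

  #define INITIAL_SIZE 64

  struct entry {
          unsigned int hash;
          struct entry *next;
  };

  struct map {
          struct entry **table;
          size_t tablesize;
          size_t size_hint;   /* survives a clear; 0 means "no hint" */
          size_t nr;
  };

  static void map_alloc_table(struct map *m, size_t size)
  {
          m->tablesize = size;
          m->table = calloc(size, sizeof(*m->table));
  }

  /* Lazy init: allocate straight to the retained hint, if any. */
  static void map_lazy_init(struct map *m)
  {
          if (!m->table)
                  map_alloc_table(m, m->size_hint ?
                                  m->size_hint : INITIAL_SIZE);
  }

  void map_add(struct map *m, struct entry *e)
  {
          map_lazy_init(m);
          e->next = m->table[e->hash % m->tablesize];
          m->table[e->hash % m->tablesize] = e;
          m->nr++;
          /* (growing when nr crosses a threshold omitted) */
  }

  /* Clear: free the table, but keep its size as a hint for next time. */
  void map_clear(struct map *m)
  {
          if (m->tablesize)
                  m->size_hint = m->tablesize;
          free(m->table);
          m->table = NULL;
          m->tablesize = 0;
          m->nr = 0;
  }

The point is only that clear() still returns the structure to "empty"
as far as the caller can tell, while the next lazy init jumps directly
to the previous size instead of re-growing incrementally.

-Peff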