Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks

Jeff King <peff@xxxxxxxx> · Mon, 17 Aug 2020 15:00:31 -0400

On Mon, Aug 17, 2020 at 10:19:03AM -0700, Elijah Newren wrote:

> > (I also wouldn't be opposed to changing hashmap and oidmap to use the
> > name "clear", but that's obviously a separate patch).
> 
> hashmap is one of the cases that needs to have a free construct,
> because the table in which to stuff the entries has to be allocated
> and thus a hashmap_clear() would have to leave the table allocated if
> it wants to be ready for re-use.  If someone really is done with a
> hashmap, then to avoid leaking, both the entries and the table need to
> be deallocated.

Hmm, you're right. oidmap() will lazy-initialize the table, but hashmap
will not. You _do_ have to initialize a hashmap because somebody has to
set the comparison function. But that could be fixed, and would make it
more like the rest of our code. I.e., you should be able to do:

  struct hashmap foo = HASHMAP_INIT(my_cmpfn);

  if (bar)
	return; /* no leak, because we never put anything in the map! */

  ... add some stuff to the map ...

  hashmap_clear(&foo);

  ... now it's empty and nothing allocated; we could return
      here without a leak or we could add more stuff to it ...

I don't even think it would be that big a change. Just translate a NULL
table to "not found" on the read side, and lazily call alloc_table() on
the write side. And have hashmap_free() not very the _whole_ struct, but
leave the cmpfn in place.

> I keep getting confused by the hashmap API, and what pieces it frees
> -- it looks like my earlier comments today were wrong and
> hashmap_free_entries() does free the table.  So...perhaps I should
> create a patch to make that clearer, and also submit the patch I've
> had for a while to introduce a hashmap_clear() function (which is
> similar to hashmap_free_entries, in that it frees the entries and
> zeros out most of the map, but it leaves the table allocated and ready
> for use).
> 
> I really wish hashmap_free() did what hashmap_free_entries() did.  So
> annoying and counter-intuitive...

I left some comments in my other reply, but in case you do pursue this:
the obvious thing to have is a free_entries boolean parameter to the
function, so that each caller is clear about what they want. And we used
to have that. But it's awkward because "free the entries" isn't a
boolean anymore; it's "free the entries you can find by moving backwards
to this offset from the hashmap_entry pointer". So callers who don't
want to free them have to pass some sentinel value there. And that's how
we ended up with two separate wrapper functions.

I think your main complaint is just about the naming though. If we had:

  /* drop all entries, freeing any hashmap-specific memory */
  hashmap_clear();

  /* ditto, but also free the entries themselves */
  hashmap_clear_and_free_entires();

that would be a bit more obvious (though I imagine it would still be
easy to forget that "clear" doesn't drop the entries). Another approach
would be to have a flag in the map for "do I own the entry memory". Most
callers are happy to hand off ownership of the entries when they're
added. And it may even be that this would open up the possibility of
more convenience functions on the adding/allocation side. I didn't think
it through carefully, though.

-Peff