Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Aug 16, 2020 at 2:11 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Sun, Aug 16, 2020 at 06:59:11AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Digging further, I found that despite the pretty clear documentation
> > near the top of dir.h that folks were supposed to call clear_directory()
> > when the user no longer needed the dir_struct, there were four callers
> > that didn't bother doing that at all.  However, two of them clearly
> > thought about leaks since they had an UNLEAK(dir) directive, which to me
> > suggests that the method to free the data was too unclear.  I suspect
> > the non-obviousness of the API and its holes led folks to avoid it,
> > which then snowballed into further problems with the entries[],
> > ignored[], parent_hashmap, and recursive_hashmap problems.
>
> The UNLEAK() ones are sort-of my fault, and are a combination of:
>
>   - The commit adding them says "in some cases (e.g., dir_struct) we
>     don't even have a function which knows how to free all of the struct
>     members". I'm not sure if why I didn't fix clear_directory() as you
>     did. I may not have known about it, or I may have been lazy. Or more
>     likely I was simply holding the UNLEAK() hammer, so it looked like a
>     nail. ;)
>
>   - My focus was on making leak-checker output cleaner. And it wasn't
>     clear for cases where we're about to exit the program whether we
>     should be actually freeing (which has small but non-zero cost) or
>     just annotating (which is zero-cost, but clearly has confused some
>     people about how UNLEAK() was meant to be used). I think I'm leaning
>     these days towards "free if it is easy to do so".
>
> So this definitely seems like a good direction to me.
>
> > Rename clear_directory() to dir_free() to be more in line with other
> > data structures in git, and introduce a dir_init() to handle the
> > suggested memsetting of dir_struct to all zeroes.  I hope that a name
> > like "dir_free()" is more clear, and that the presence of dir_init()
> > will provide a hint to those looking at the code that there may be a
> > corresponding dir_free() that they need to call.
>
> I think having a pair of matched calls is good. I _thought_ we had
> established a pattern that we should prefer "clear" to "free" for cases
> where the struct itself its not freed. But grepping around, I see there
> are a few exceptions (hashmap_free() is the big one, and then
> oidmap_free() which is built around it seems to have inherited it).
>
> The rest seem to follow that pattern, though: attr_check_free,
> cache_tree_free, and submodule_cache_free all actually free the pointer
> passed in. And "git grep '_clear(' *.h" shows lots of
> clear-but-don't-free examples.
>
> Would dir_clear() make the pairing more obvious, but keep the clear
> name?

Sure, that works for me.  The case where having an actual _free()
makes sense to me -- possibly in addition to a _clear() -- is when
some memory has to be allocated before first use, and thus a
foo_clear() would leave that memory allocated.  Then you really do
need a foo_free() for use when the data structure won't be used again.

> (I also wouldn't be opposed to changing hashmap and oidmap to use the
> name "clear", but that's obviously a separate patch).

hashmap is one of the cases that needs to have a free construct,
because the table in which to stuff the entries has to be allocated
and thus a hashmap_clear() would have to leave the table allocated if
it wants to be ready for re-use.  If someone really is done with a
hashmap, then to avoid leaking, both the entries and the table need to
be deallocated.

I keep getting confused by the hashmap API, and what pieces it frees
-- it looks like my earlier comments today were wrong and
hashmap_free_entries() does free the table.  So...perhaps I should
create a patch to make that clearer, and also submit the patch I've
had for a while to introduce a hashmap_clear() function (which is
similar to hashmap_free_entries, in that it frees the entries and
zeros out most of the map, but it leaves the table allocated and ready
for use).

I really wish hashmap_free() did what hashmap_free_entries() did.  So
annoying and counter-intuitive...

> >  builtin/add.c          |  4 ++--
> >  builtin/check-ignore.c |  4 ++--
> >  builtin/clean.c        |  8 ++++----
> >  builtin/grep.c         |  3 ++-
> >  builtin/ls-files.c     |  4 ++--
> >  builtin/stash.c        |  4 ++--
> >  dir.c                  |  7 ++++++-
> >  dir.h                  | 19 ++++++++++---------
> >  merge.c                |  3 ++-
> >  wt-status.c            |  4 ++--
> >  10 files changed, 34 insertions(+), 26 deletions(-)
>
> That patch itself looks good except for two minor points:
>
> >  /* Frees memory within dir which was allocated.  Does not free dir itself. */
> > -void clear_directory(struct dir_struct *dir)
> > +void dir_free(struct dir_struct *dir)
> >  {
> >       int i, j;
> >       struct exclude_list_group *group;
>
> As I mentioned in my previous email, I think it would be nice if this
> called dir_init() at the end, so that the struct is always in a
> consistent state.
>
> > diff --git a/dir.h b/dir.h
> > index 7d76d0644f..7c55c1a460 100644
> > --- a/dir.h
> > +++ b/dir.h
> > [...]
> > - * - Use `dir.entries[]`.
> > + * - Use `dir.entries[]` and `dir.ignored[]`.
> >   *
> >   * - Call `clear_directory()` when the contained elements are no longer in use.
> >   *
>
> This last line should become dir_free() / dir_clear().

I'll fix these up.

Elijah



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux