Re: [RFC PATCH 1/6] leak fix: cache_put_path

Calvin Wan <calvinwan@xxxxxxxxxx> · Tue, 14 Feb 2023 11:56:50 -0800

On Mon, Feb 13, 2023 at 11:23 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Calvin Wan <calvinwan@xxxxxxxxxx> writes:
>
> > hashmap_put returns a pointer if the key was found and subsequently
> > replaced. Free this pointer so it isn't leaked.
> >
> > Signed-off-by: Calvin Wan <calvinwan@xxxxxxxxxx>
> > ---
> >  submodule-config.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/submodule-config.c b/submodule-config.c
> > index 4dc61b3a78..90cab34568 100644
> > --- a/submodule-config.c
> > +++ b/submodule-config.c
> > @@ -128,9 +128,11 @@ static void cache_put_path(struct submodule_cache *cache,
> >       unsigned int hash = hash_oid_string(&submodule->gitmodules_oid,
> >                                           submodule->path);
> >       struct submodule_entry *e = xmalloc(sizeof(*e));
> > +     struct hashmap_entry *replaced;
> >       hashmap_entry_init(&e->ent, hash);
> >       e->config = submodule;
> > -     hashmap_put(&cache->for_path, &e->ent);
> > +     replaced = hashmap_put(&cache->for_path, &e->ent);
> > +     free(replaced);
> >  }
>
> Out of curiosity, I've checked all the grep hits from hashmap_put()
> in the codebase and this seems to be the only one.  Everybody else
> either calls hashmap_put() only after hashmap_get() sees that there
> is no existing one, or unconditionally calls hashmap_put() and dies
> if an earlier registration is found.
>
> The callers of oidmap_put() in sequencer.c I didn't check.  There
> might be similar leaks there, or they may be safe---I dunno.  But
> all other callers of oidmap_put() also seem to be safe.
>
> Back to the patch itself.  The only caller of this function does
>
>         if (submodule->path) {
>                 cache_remove_path(me->cache, submodule);
>                 free(submodule->path);
>         }
>         submodule->path = xstrdup(value);
>         cache_put_path(me->cache, submodule);
>
> It is curious how the same submodule->path is occupied by more than
> one submodule?  Isn't that a configuration error we want to report
> to the user somehow (not necessarily error/die), instead of silently
> replacing with the "last one wins" precedence?
>
> Assuming that the "last one wins" is the sensible thing to do, the
> change proposed by this patch does seem reasonable way to plug the
> leak.

Swapping this functionality to "first one wins" or erroring out breaks many
tests that are set up improperly. If we continue with the "last one wins"
precedence, then a warning and documentation should be added. We
definitely should not swap it to "first one wins": neither makes more sense
than the other, but "last one wins" at least has precedent. If we choose
to error out during config parsing when duplicated submodule paths are
detected, then those respective tests will also need to be updated.

I'm leaning towards leaving the functionality as is, since a user would
have to manually edit the .gitmodules file to get into this state, and
`git submodule add` protects against it. What do you think about
adding a warning and possibly documentation?
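
To make the "last one wins plus a warning" idea concrete, here is a rough
standalone sketch. It does not use git's hashmap API: every toy_* name is
made up for illustration, and the only thing it borrows is the contract
described in the patch, namely that the put operation hands back the entry
it replaced. The warning wording is also just a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All toy_* names are hypothetical; none of them exist in git. */
#define TOY_SLOTS 8

struct toy_entry {
	const char *path;	/* key */
	const char *name;	/* submodule name that claims the path */
};

struct toy_cache {
	struct toy_entry *slot[TOY_SLOTS];
	int warnings;		/* number of duplicate paths reported */
};

/*
 * Store e. If an entry with the same path already exists, replace it
 * and return the old entry to the caller; otherwise return NULL.
 */
static struct toy_entry *toy_put(struct toy_cache *c, struct toy_entry *e)
{
	int i, free_slot = -1;

	for (i = 0; i < TOY_SLOTS; i++) {
		if (!c->slot[i]) {
			if (free_slot < 0)
				free_slot = i;
		} else if (!strcmp(c->slot[i]->path, e->path)) {
			struct toy_entry *old = c->slot[i];
			c->slot[i] = e;
			return old;
		}
	}
	assert(free_slot >= 0);	/* the toy table has a fixed capacity */
	c->slot[free_slot] = e;
	return NULL;
}

/* Last one wins: warn about the duplicate, then free what was replaced. */
static void toy_cache_put_path(struct toy_cache *c, const char *path,
			       const char *name)
{
	struct toy_entry *e = malloc(sizeof(*e));
	struct toy_entry *replaced;

	assert(e);
	e->path = path;
	e->name = name;
	replaced = toy_put(c, e);
	if (replaced) {
		fprintf(stderr,
			"warning: path '%s' is used by both '%s' and '%s'; using '%s'\n",
			path, replaced->name, name, name);
		c->warnings++;
		free(replaced);
	}
}

/* Return the name that currently owns path, or NULL if there is none. */
static const char *toy_lookup(const struct toy_cache *c, const char *path)
{
	int i;

	for (i = 0; i < TOY_SLOTS; i++)
		if (c->slot[i] && !strcmp(c->slot[i]->path, path))
			return c->slot[i]->name;
	return NULL;
}

/* Release every entry still held by the cache. */
static void toy_cache_clear(struct toy_cache *c)
{
	int i;

	for (i = 0; i < TOY_SLOTS; i++) {
		free(c->slot[i]);
		c->slot[i] = NULL;
	}
}
```

Registering the same path twice through toy_cache_put_path() leaves the
second name in place, prints one warning, and frees the first entry, which
is the behavior I would expect from cache_put_path() if we go this route.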