Re: [PATCH v2 2/4] SANITIZE tests: fix memory leaks in t13config, add to whitelist

Jeff King <peff@xxxxxxxx> · Thu, 15 Jul 2021 17:42:20 -0400

On Wed, Jul 14, 2021 at 08:57:37PM +0200, Andrzej Hunt wrote:

> > @@ -1331,8 +1336,10 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
> >   	if (!strcmp(var, "core.attributesfile"))
> >   		return git_config_pathname(&git_attributes_file, var, value);
> > -	if (!strcmp(var, "core.hookspath"))
> > +	if (!strcmp(var, "core.hookspath")) {
> > +		UNLEAK(git_hooks_path);
> >   		return git_config_pathname(&git_hooks_path, var, value);
> > +	}
> 
> Why is the UNLEAK necessary here? We generally want to limit use of UNLEAK
> to cmd_* functions or direct helpers. git_default_core_config() seems
> generic enough that it could be called from anywhere, and using UNLEAK here
> means we're potentially masking a real leak?
> 
> IIUC the leak here happens because:
> - git_hooks_path is a global variable - hence it's unlikely we'd ever
>   bother cleaning it up.
> - git_default_core_config() gets called a first time with
>   core.hookspath, and we end up allocating new memory into
>   git_hooks_path.
> - git_default_core_config() gets called again with core.hookspath,
>   and we overwrite git_hooks_path with a new string which leaks
>   the string that git_hooks_path used to point to.
> 
> So I think the real fix is to free(git_hooks_path) instead of an UNLEAK?
> (Looking at the surrounding code, it looks like the same pattern of leak
> might be repeated for other similar globals - is it worth auditing those
> while we're here?)

This is a common leak pattern in Git. We do something like:

  static const char *foo = "default";
  ...
  int config_cb(const char *var, const char *value, void *)
  {
          if (!strcmp(var, "core.foo"))
	          foo = xstrdup(value);
  }

So we leak if the variable appears twice. But we can't just call
"free(foo)" here. In the first call, it's pointing to a string literal!

In the case of git_hooks_path, it defaults to NULL, so this works out
OK. But it's setting up a trap for somebody later on, who assigns it a
default value (and the compiler won't help; it's a "const char *", so
the assignment is fine, and the free() would already be casting away the
constness).

I see a few possible solutions:

  - instead of strdup'ing long-lived config values, strintern() them.
    This is really leaking them, but in a way that we hold on to the old
    values. This is actually more or less what UNLEAK() is doing under
    the hood (saving a reference to the old buffer, even the variable is
    overwritten).

  - find a way to tell when a string comes from the heap versus a
    literal. I don't think you can do this portably without keeping your
    own separate flag. We could abstract away some of the pain with a
    struct like:

       struct def_string {
               /* might point to heap memory; const because you must
                * check flag before modifying */
               const char *value;
               int from_heap;
       }

       /* regular static initialization is OK if you don't want a default */
       #define DEF_STRING_INIT(str) { .value = str }

       static void def_string_set(struct def_string *ds, const char *value)
       {
               if (ds->from_heap)
                       free(ds->value);
               ds->value = xstrdup(value);
               ds->from_heap = 1;
       }

    The annoying thing is all of the users need to refer to
    git_hook_path.value instead of just git_hook_path. If you don't mind
    a little macro hackery, we could get around that by declaring pairs
    of variables. Like:

      #define DEF_STRING_DECLARE(name, value) \
      const char *name = value; \
      int name##_from_heap

      #define DEF_STRING_SET(name, value) do { \
              if (name##_from_heap) \
                      free(name); \
              name = xstrdup(value); \
              name##_from_heap = 1; \
      } while(0)

I can't say I _love_ any of that, but I think it would work (and
probably we'd adapt our helpers like git_config_pathname() to take a
def_string. Or I guess just have a def_string_free() which can be called
before writing into them).

But maybe there's a better solution I'm missing.

-Peff

Re: [PATCH v2 2/4] SANITIZE tests: fix memory leaks in t13*config*, add to whitelist

Re: [PATCH v2 2/4] SANITIZE tests: fix memory leaks in t13config, add to whitelist