On Wed, Jul 14, 2021 at 08:57:37PM +0200, Andrzej Hunt wrote: > > @@ -1331,8 +1336,10 @@ static int git_default_core_config(const char *var, const char *value, void *cb) > > if (!strcmp(var, "core.attributesfile")) > > return git_config_pathname(&git_attributes_file, var, value); > > - if (!strcmp(var, "core.hookspath")) > > + if (!strcmp(var, "core.hookspath")) { > > + UNLEAK(git_hooks_path); > > return git_config_pathname(&git_hooks_path, var, value); > > + } > > Why is the UNLEAK necessary here? We generally want to limit use of UNLEAK > to cmd_* functions or direct helpers. git_default_core_config() seems > generic enough that it could be called from anywhere, and using UNLEAK here > means we're potentially masking a real leak? > > IIUC the leak here happens because: > - git_hooks_path is a global variable - hence it's unlikely we'd ever > bother cleaning it up. > - git_default_core_config() gets called a first time with > core.hookspath, and we end up allocating new memory into > git_hooks_path. > - git_default_core_config() gets called again with core.hookspath, > and we overwrite git_hooks_path with a new string which leaks > the string that git_hooks_path used to point to. > > So I think the real fix is to free(git_hooks_path) instead of an UNLEAK? > (Looking at the surrounding code, it looks like the same pattern of leak > might be repeated for other similar globals - is it worth auditing those > while we're here?) This is a common leak pattern in Git. We do something like: static const char *foo = "default"; ... int config_cb(const char *var, const char *value, void *) { if (!strcmp(var, "core.foo")) foo = xstrdup(value); } So we leak if the variable appears twice. But we can't just call "free(foo)" here. In the first call, it's pointing to a string literal! In the case of git_hooks_path, it defaults to NULL, so this works out OK. But it's setting up a trap for somebody later on, who assigns it a default value (and the compiler won't help; it's a "const char *", so the assignment is fine, and the free() would already be casting away the constness). I see a few possible solutions: - instead of strdup'ing long-lived config values, strintern() them. This is really leaking them, but in a way that we hold on to the old values. This is actually more or less what UNLEAK() is doing under the hood (saving a reference to the old buffer, even the variable is overwritten). - find a way to tell when a string comes from the heap versus a literal. I don't think you can do this portably without keeping your own separate flag. We could abstract away some of the pain with a struct like: struct def_string { /* might point to heap memory; const because you must * check flag before modifying */ const char *value; int from_heap; } /* regular static initialization is OK if you don't want a default */ #define DEF_STRING_INIT(str) { .value = str } static void def_string_set(struct def_string *ds, const char *value) { if (ds->from_heap) free(ds->value); ds->value = xstrdup(value); ds->from_heap = 1; } The annoying thing is all of the users need to refer to git_hook_path.value instead of just git_hook_path. If you don't mind a little macro hackery, we could get around that by declaring pairs of variables. Like: #define DEF_STRING_DECLARE(name, value) \ const char *name = value; \ int name##_from_heap #define DEF_STRING_SET(name, value) do { \ if (name##_from_heap) \ free(name); \ name = xstrdup(value); \ name##_from_heap = 1; \ } while(0) I can't say I _love_ any of that, but I think it would work (and probably we'd adapt our helpers like git_config_pathname() to take a def_string. Or I guess just have a def_string_free() which can be called before writing into them). But maybe there's a better solution I'm missing. -Peff