On 15/07/2021 23:42, Jeff King wrote:
On Wed, Jul 14, 2021 at 08:57:37PM +0200, Andrzej Hunt wrote:
@@ -1331,8 +1336,10 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
if (!strcmp(var, "core.attributesfile"))
return git_config_pathname(&git_attributes_file, var, value);
- if (!strcmp(var, "core.hookspath"))
+ if (!strcmp(var, "core.hookspath")) {
+ UNLEAK(git_hooks_path);
return git_config_pathname(&git_hooks_path, var, value);
+ }
Why is the UNLEAK necessary here? We generally want to limit use of UNLEAK
to cmd_* functions or direct helpers. git_default_core_config() seems
generic enough that it could be called from anywhere, and using UNLEAK here
means we're potentially masking a real leak?
IIUC the leak here happens because:
- git_hooks_path is a global variable - hence it's unlikely we'd ever
bother cleaning it up.
- git_default_core_config() gets called a first time with
core.hookspath, and we end up allocating new memory into
git_hooks_path.
- git_default_core_config() gets called again with core.hookspath,
and we overwrite git_hooks_path with a new string which leaks
the string that git_hooks_path used to point to.
So I think the real fix is to free(git_hooks_path) instead of an UNLEAK?
(Looking at the surrounding code, it looks like the same pattern of leak
might be repeated for other similar globals - is it worth auditing those
while we're here?)
This is a common leak pattern in Git. We do something like:
static const char *foo = "default";
...
int config_cb(const char *var, const char *value, void *)
{
if (!strcmp(var, "core.foo"))
foo = xstrdup(value);
}
So we leak if the variable appears twice. But we can't just call
"free(foo)" here. In the first call, it's pointing to a string literal!
In the case of git_hooks_path, it defaults to NULL, so this works out
OK. But it's setting up a trap for somebody later on, who assigns it a
default value (and the compiler won't help; it's a "const char *", so
the assignment is fine, and the free() would already be casting away the
constness).
Ah, right. I didn't think about the risk of future breakages.
I see a few possible solutions:
[...]
I can't say I _love_ any of that, but I think it would work (and
probably we'd adapt our helpers like git_config_pathname() to take a
def_string. Or I guess just have a def_string_free() which can be called
before writing into them).
Is it worth sidestepping the whole globals issue by migrating
core.hookspath (and other string config values) to be fetched via
git_config_get_pathname() and equivalents at the point of use instead?
I looked at the commit below which introduced git_config_get* which
suggests that these methods were indeed intended to be an improvement
over the callback based API, and IIUC switching over should have a bunch
of advantages:
- Removes some potential bugs that can happen if git_config() was never
called with the right callback.
- Potentially reduces the number of times we have to iterate over the
config in the first place (assuming we migrate *all* config access
and not just strings).
- Fewer globals - which reduces potential for such leaks (and probably
makes it easier to read the code in the first place).
OTOH I'm not familiar enough with this code to know what the
disadvantages of such a migration might be (it's definitely going to be
a lot of work... but that's going to apply to any of the approaches we
can choose to fix these leaks).
git_config_get* were introduced in:
3c8687a73e (add `config_set` API for caching config-like files,
2014-07-28)