Re: [PATCH 13/17] rebase: don't leak on "--abort"

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 05 Nov 2022 13:01:42 +0100

On Fri, Nov 04 2022, Phillip Wood wrote:

> On 03/11/2022 17:06, Ævar Arnfjörð Bjarmason wrote:
>> Fix a leak in the recent 6159e7add49 (rebase --abort: improve reflog
>> message, 2022-10-12). Before that commit we'd strbuf_release() the
>> reflog message we were formatting, but when that code was refactored
>> to use "ropts.head_msg" the strbuf_release() was omitted.
>> Ideally the three users of "ropts" in cmd_rebase() should use
>> different "ropts" variables, in practice they're completely separate,
>> as this and the other user in the "switch" statement will "goto
>> cleanup", which won't touch "ropts".
>> The third caller after the "switch" is then unreachable if we take
>> these two branches, so all of them are getting a "{ 0 }" init'd
>> "ropts".
>> So it's OK that we're leaving a stale pointer in "ropts.head_msg",
>> cleaning it up was our responsibility, and it won't be used again.
>> [...]
>> --- a/builtin/rebase.c
>> +++ b/builtin/rebase.c
>> @@ -1320,6 +1320,7 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>>   		if (reset_head(the_repository, &ropts) < 0)
>>   			die(_("could not move back to %s"),
>>   			    oid_to_hex(&options.orig_head->object.oid));
>> +		strbuf_release(&head_msg);
>>   		remove_branch_state(the_repository, 0);
>>   		ret = finish_rebase(&options);
>>   		goto cleanup;
>
> This looks fine. One could argue that the leak is not a "real" leak as
> we're about to exit but the omission of strbuf_release() was 
> unintentional and fix is nice and simple.

What I've been targeting in this and past leak fix topics is roughly
what valgrind calls[1] "definitely lost" and "indirectly lost", and
which -fsanitize=leak calls "that thing I die on whan I run into it" :)

So, you and I know we're "about to exit" as we're in cmd_rebase() and
about to return to common-main.c etc, but it's harder to convince a
compiler of that.

Maybe I'm just touchy, sorry. It just feels like every time I submit a
leak fix topic there's some rehash of the "why do this at all?" or "why
this one?"  discussion.

To summarize (maybe too much for just this reply, but I hope to link
back to this in the future):

 * Historically we've said "that's just a built-in, let the OS take care
   of it!"

 * I agree with that in principle. It's pretty pointless to free()
   things in cmd_*() in the abstract. We're "about to exit", even if
   valgrind etc. call it "definitely lost"[1].

 * But, it would be nice to have git's APIs not leak, so we could call
   them in loops, replace some of our own sub-process invocations
   etc[2].

 * If we had 100% test coverage for those libraries we could just test
   that, and get those leak-free, and ignore the built-ins.

 * We don't have that, and probably never will. E.g. "git log" and
   friends are *the* things that stress revision.[ch]. So to assert that
   the API is leak-free we really need to assert that its users
   are. Ditto for pretty much any other API we carry.

 * Still, we could just say "if something's alloc'd in cmd_*() don't
   care about it". Both LSAN and valgrind support "suppressions" to
   ignore certain functions, patterns etc.

 * But that doesn't really work either, as e.g. "struct rev_info" would
   seem to come from such a cmd_*() as far as ownership goes...

 * ..."but you could have an even more clever suppressions mechanism,
   and only ignore those things that are malloc'd by cmd_*(). But not
   e.g. a malloc in revision.c in a struct member in a struct that's on
   the stack in a cmd_*()", one might say.

   Yeah, you could do that, e.g. in this case we'll call strbuf_addf()
   which makes a beeline to malloc(). Sufficiently clever
   post-processing of leak stacktraces could probably make the
   distinction.

   But then you run into the problem of e.g. how the the log family and
   others use "mailmap". I.e. in that case (and many others) the cmd_*()
   will malloc(), but "hand it off" to the API, whose job we know it is
   to free() it.

   But we only know because of knowledge about the API, an automated
   system would care, still, it's not unworkable. We could go through
   those, list the exceptions etc. etc.

That's a lot of context, but I think brings us up-to-date on this
one-line change.

So, could we do things differently here? Definitely. And if the codebase
was in a state where e.g. builtin/*.c had ~1500 UNLEAK() already I'd
probably try to extend that, rather than fighting that uphill battle.

But that's not the case, we have around ~30 of them, fewer after this
topic. By far the majority of builtin/* is caring about directly
recahable leaks already.

E.g. just above this context in another "case" arm we're doing the exact
"variable on stack, use it, then release()", just with a "struct
string_list".

So I think this is the right direction to go in, both in this patch & in
general.

1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks
2. Even the assumption that cmd_*() is a one-off is shaky, e.g. for
   cases where we shell out to "git commit" we're close now to being
   able to just call cmd_commit(), but let's leave that aside for now).

   So, we're pretty close to cmd_*() being a funny API that happens to
   take strvec arguments, which is going to be the shortest path to
   eliminating sub-process invocation in many cases. I.e. as opposed to
   refactoring the cmd_*() into a "real" library, that takes a struct
   with options, doesn't call parse_options() etc.