Re: [PATCH] Makefile: force -O0 when compiling with SANITIZE=leak

Rubén Justo <rjusto@xxxxxxxxx> · Thu, 20 Oct 2022 20:57:31 +0200

On 19/10/22 9:10, Jeff King wrote:

> Yes, that's a mouthful. If you really really want it, we can make the
> "-O0" behavior conditional. But I remain entirely unconvinced that what
> you're trying to do is useful.
> 
>> I'd prefer not to need to do that, because while non-O0 is noisy
>> sometimes, I've also found that it points (at least indirectly) to some
>> useful edge-cases.
> 
> I haven't seen any evidence that leak-checking with -O2 produces
> anything of value. And I have seen several examples where it produces
> garbage. I guess you're referring here to the elaboration you give below
> in your message. But I think you're just wrong about it being useful.
> 
>> The tl;dr is that I think you use "leak" in the sense that valgrind
>> talks about "still reachable" leaks, which is conceptually where
>> -fsanitize=leak draws the line. I.e. it's not a malloc()/free() tracker,
>> but tries to see what's "in scope".
> 
> No, these "still reachable" cases are not at all interesting. If we end
> the program by returning up the stack, then perhaps they could be (but
> IMHO they are not generally, because it's perfectly reasonable to leave
> globals pointing to allocated memory at program exit). But if we exit in
> the middle of a function, and variables are left on the stack, what in
> the world is the use of a leak-checker complaining about that?
> 
> It is not hurting anything, because by definition we have exited the
> program and released the memory. And it is near-impossible to remove
> these entirely. In the example we're discussing, there's a local
> variable in git_config_push_env() that _could_ be freed before calling
> die(). But not in the general case:
> 
>   - what about heap allocations in callers? Imagine a program like this:
> 
>       void foo(const char *str)
>       {
>         if (some_func(str))
>           die("bad str");
>       }
>       void bar(void)
>       {
>         char *str = strdup("some string");
> 	foo(str);
> 	free(str);
>       }
> 
>     If foo() calls die(), then bar()'s str is a "still reachable" leak.
>     But we can't fix it in any reasonable way. foo() doesn't know about
>     freeing the string. And bar() doesn't know about calling die().
> 
>   - likewise, we may actually need the heap-allocated string to call
>     die! Consider this:
> 
>       void foo(void)
>       {
>         char *str = strdup("some string");
> 	if (some_func(str))
> 	  die("bad str: %s", str);
> 	free(str);
>       }
> 
>     You can't avoid a "still reachable" leak here, because die() would
>     need to use the string, then free it, then exit.
> 
> So if these "still reachable" leaks are not hurting anything, and if
> they are impossible to get rid of, why do we want to care about them?
> Doing so just causes pain with no benefit.
> 
>> This is getting a bit academic, but I don't see how you can both say
>> that the "compilers are allowed to modify the program as long as the
>> observable behavior of that abstract machine is unchanged" *and* claim
>> that e.g. the git_config_push_env() case isn't a real leak.
> 
> The words "observable behavior" are doing a lot of heavy lifting there.
> That is the behavior which a conforming C program can observe. And in
> this case, nothing is violated. We did not reach the free() call, so
> there was no need to make sure we had the parameters available. The C
> standard's abstract machine doesn't know about things like "the stack is
> memory that you can look at". Doing that is undefined behavior, and a
> program which cares about that is not conforming.
> 
> I agree this is mostly academic. My point in bringing it up was just to
> say that no, this isn't a bug in the optimizer. Clobbering values for
> NORETURN is perfectly reasonable and valid according to the standard.
> And leak-checking, while basically undefined behavior from the
> perspective of the standard, is a useful tool. It's the interaction
> between the two that is ugly, and the most useful thing for us to do is
> avoid using both at the same time.
> 
>> Because surely the thing that makes it a "leak" by your definition (and
>> what LSAN strives for) is that it's attributed to a variable that's "in
>> scope", but the compiler is free to re-arrange all of that.
> 
> If you mean "not a leak", then yes: there's still an in-scope variable
> that points to it, which is what makes it not a leak.
> 
>> Anyway, one reason I wanted to punt on that "git --config-env" issue is
>> because we can entirely avoid the malloc()/free() there. See the "-- >8
>> --" below, but if we just malloc() after we do our assertions we can
>> un-confuse clang.
>>
>> And that seems like a good idea in general, and re whether the "leak" is
>> gone, at that point valgrind will stop reporting it, so we're really not
>> leaking at all, not just in the "still reachable" sense.
> 
> I'm not convinced it's a good idea. The resulting code requires an extra
> variable and is (IMHO) slightly harder to understand. And what have we
> accomplished? We silenced one false positive, but we know there are
> others. And the linux-leaks CI job is failing on master _right now_, and
> the patch below doesn't fix that.
> 
> That makes the job somewhat worthless. And worse, it makes all of CI
> less useful, because if it says "failed" on every build, people will
> start to ignore it.
> 
>> The reason I mention all of that is to try to define the problem here. I
>> haven't seen cases where the compilers get it wrong about there being a
>> leak, it's just that they're mis-categorizing them as "still reachable"
>> or not, re your "abstract machine" summary.
> 
> This paragraph makes it sound like we are mixing up a real leak and a
> "still reachable" leak. But it is mixing up "there is no leak and
> nothing to fix" with "there is a still reachable leak".
> 
>> Now, the other cases in t1300 are from git.c using exit() instead of
>> retur-ing to our main().
>>
>> Among numerous other leak fixes I have queued up I have a fix for that,
>> which fixes the other t1300 cases that have been reported now:
>> https://github.com/avar/git/tree/avar/git-c-do-not-exit
> 
> I said it in the last round of discussion, and I'll say it again: this
> is the wrong direction. These are not real leaks, and we should not be
> twisting our code to try to avoid them. We should be fixing our
> leak-checking so that they don't come up.
> 
>> The actual meaningful fix here is that we don't need to do this
>> allocation at all. The only reason it's needed is because there's no
>> variant of "sq_quote_buf()" in quote.c that takes a "len"
>> parameter (but I have one locally).
>>
>> If we had that we could just pass a "key" and "key_len" to
>> git_config_push_split_parameter() instead and avoid the allocation
>> altogether. But in lieu of that better fix (which other API users of
>> quote.c would benefit from) let's defer the allocation, which happens
>> to fix the leak reporting on
> 
> So as you probably guessed, I absolutely hate this patch and think we
> should not take it.
> 
> If we had a version of sq_quote_buf() that took a "len" parameter, then
> sure, it would be reasonable to use. But it absolutely is not worth
> caring about to silence a leak that is a false positive. There is
> _nothing_ wrong with the code as written.
> 
> -Peff
> 

"-O2" is what goes public, testing it, directly or indirectly, is useful.
Another thing is the pay-off..

We're losing a few eyeballs on "-O2" and some test performance in the way, in
exchange of avoiding some false positives.  And even with "-O0" the compiler is
free to keep playing the whack-a-mole with us.

I can easily find motivations for people to switch to "-O0" and do its local
tests for example, and so we add eyeballs... but not so easily the other way
around, to switch to "-O2".

"False positives" makes me think the leak checker does its best.  And the
compiler.  This "-O0 leak" is ignored by both, with "-O2":

	void func()
	{
		malloc(1024);
		exit(1);
	}

UNLEAK(), "-O0" as a /second opinion/ /confirmation/, some attention to the
false positives,... are things that doesn't sound so bad.  Maybe there is a
better way.  Dunno.

Un saludo.