On Tue, Oct 18, 2022 at 11:00:54PM +0200, Ævar Arnfjörð Bjarmason wrote: > Note that we're discussing two different leak here, > git_config_push_env() one in > https://lore.kernel.org/git/Yxl91jfycCo7O7Pp@xxxxxxxxxxxxxxxxxxxxxxx/; > and now another one that originates in the exit() behavior in > git.c. They're both triggered from t1300, but they're otherwise > unrelated. Yes, there are two different cases being discussed. But they _are_ related, in that they are caused by similar compiler optimizations. > > So here's a patch that does so. I think one of Ævar's reservations in > > the other thread was not wanting to have to switch optimization levels > > manually for various builds. I think the patch below should make things > > Just Work, whether in CI or running a custom build locally. > > No, the opposite. I want to try out various combinations of compile > flags & optimization levels and not have SANITIZE=leak second-guess it. > > Now if I want to compile with -O2 and SANITIZE=leak I'll need to > monkeypatch the Makefile. You don't have to patch it. You can do: make CFLAGS='-O2 -fsanitize=leak -DSUPPRESS_ANNOTATED_LEAKS' Yes, that's a mouthful. If you really really want it, we can make the "-O0" behavior conditional. But I remain entirely unconvinced that what you're trying to do is useful. > I'd prefer not to need to do that, because while non-O0 is noisy > sometimes, I've also found that it points (at least indirectly) to some > useful edge-cases. I haven't seen any evidence that leak-checking with -O2 produces anything of value. And I have seen several examples where it produces garbage. I guess you're referring here to the elaboration you give below in your message. But I think you're just wrong about it being useful. > The tl;dr is that I think you use "leak" in the sense that valgrind > talks about "still reachable" leaks, which is conceptually where > -fsanitize=leak draws the line. I.e. it's not a malloc()/free() tracker, > but tries to see what's "in scope". No, these "still reachable" cases are not at all interesting. If we end the program by returning up the stack, then perhaps they could be (but IMHO they are not generally, because it's perfectly reasonable to leave globals pointing to allocated memory at program exit). But if we exit in the middle of a function, and variables are left on the stack, what in the world is the use of a leak-checker complaining about that? It is not hurting anything, because by definition we have exited the program and released the memory. And it is near-impossible to remove these entirely. In the example we're discussing, there's a local variable in git_config_push_env() that _could_ be freed before calling die(). But not in the general case: - what about heap allocations in callers? Imagine a program like this: void foo(const char *str) { if (some_func(str)) die("bad str"); } void bar(void) { char *str = strdup("some string"); foo(str); free(str); } If foo() calls die(), then bar()'s str is a "still reachable" leak. But we can't fix it in any reasonable way. foo() doesn't know about freeing the string. And bar() doesn't know about calling die(). - likewise, we may actually need the heap-allocated string to call die! Consider this: void foo(void) { char *str = strdup("some string"); if (some_func(str)) die("bad str: %s", str); free(str); } You can't avoid a "still reachable" leak here, because die() would need to use the string, then free it, then exit. So if these "still reachable" leaks are not hurting anything, and if they are impossible to get rid of, why do we want to care about them? Doing so just causes pain with no benefit. > This is getting a bit academic, but I don't see how you can both say > that the "compilers are allowed to modify the program as long as the > observable behavior of that abstract machine is unchanged" *and* claim > that e.g. the git_config_push_env() case isn't a real leak. The words "observable behavior" are doing a lot of heavy lifting there. That is the behavior which a conforming C program can observe. And in this case, nothing is violated. We did not reach the free() call, so there was no need to make sure we had the parameters available. The C standard's abstract machine doesn't know about things like "the stack is memory that you can look at". Doing that is undefined behavior, and a program which cares about that is not conforming. I agree this is mostly academic. My point in bringing it up was just to say that no, this isn't a bug in the optimizer. Clobbering values for NORETURN is perfectly reasonable and valid according to the standard. And leak-checking, while basically undefined behavior from the perspective of the standard, is a useful tool. It's the interaction between the two that is ugly, and the most useful thing for us to do is avoid using both at the same time. > Because surely the thing that makes it a "leak" by your definition (and > what LSAN strives for) is that it's attributed to a variable that's "in > scope", but the compiler is free to re-arrange all of that. If you mean "not a leak", then yes: there's still an in-scope variable that points to it, which is what makes it not a leak. > Anyway, one reason I wanted to punt on that "git --config-env" issue is > because we can entirely avoid the malloc()/free() there. See the "-- >8 > --" below, but if we just malloc() after we do our assertions we can > un-confuse clang. > > And that seems like a good idea in general, and re whether the "leak" is > gone, at that point valgrind will stop reporting it, so we're really not > leaking at all, not just in the "still reachable" sense. I'm not convinced it's a good idea. The resulting code requires an extra variable and is (IMHO) slightly harder to understand. And what have we accomplished? We silenced one false positive, but we know there are others. And the linux-leaks CI job is failing on master _right now_, and the patch below doesn't fix that. That makes the job somewhat worthless. And worse, it makes all of CI less useful, because if it says "failed" on every build, people will start to ignore it. > The reason I mention all of that is to try to define the problem here. I > haven't seen cases where the compilers get it wrong about there being a > leak, it's just that they're mis-categorizing them as "still reachable" > or not, re your "abstract machine" summary. This paragraph makes it sound like we are mixing up a real leak and a "still reachable" leak. But it is mixing up "there is no leak and nothing to fix" with "there is a still reachable leak". > Now, the other cases in t1300 are from git.c using exit() instead of > retur-ing to our main(). > > Among numerous other leak fixes I have queued up I have a fix for that, > which fixes the other t1300 cases that have been reported now: > https://github.com/avar/git/tree/avar/git-c-do-not-exit I said it in the last round of discussion, and I'll say it again: this is the wrong direction. These are not real leaks, and we should not be twisting our code to try to avoid them. We should be fixing our leak-checking so that they don't come up. > The actual meaningful fix here is that we don't need to do this > allocation at all. The only reason it's needed is because there's no > variant of "sq_quote_buf()" in quote.c that takes a "len" > parameter (but I have one locally). > > If we had that we could just pass a "key" and "key_len" to > git_config_push_split_parameter() instead and avoid the allocation > altogether. But in lieu of that better fix (which other API users of > quote.c would benefit from) let's defer the allocation, which happens > to fix the leak reporting on So as you probably guessed, I absolutely hate this patch and think we should not take it. If we had a version of sq_quote_buf() that took a "len" parameter, then sure, it would be reasonable to use. But it absolutely is not worth caring about to silence a leak that is a false positive. There is _nothing_ wrong with the code as written. -Peff