On 19/10/22 9:10, Jeff King wrote: > Yes, that's a mouthful. If you really really want it, we can make the > "-O0" behavior conditional. But I remain entirely unconvinced that what > you're trying to do is useful. > >> I'd prefer not to need to do that, because while non-O0 is noisy >> sometimes, I've also found that it points (at least indirectly) to some >> useful edge-cases. > > I haven't seen any evidence that leak-checking with -O2 produces > anything of value. And I have seen several examples where it produces > garbage. I guess you're referring here to the elaboration you give below > in your message. But I think you're just wrong about it being useful. > >> The tl;dr is that I think you use "leak" in the sense that valgrind >> talks about "still reachable" leaks, which is conceptually where >> -fsanitize=leak draws the line. I.e. it's not a malloc()/free() tracker, >> but tries to see what's "in scope". > > No, these "still reachable" cases are not at all interesting. If we end > the program by returning up the stack, then perhaps they could be (but > IMHO they are not generally, because it's perfectly reasonable to leave > globals pointing to allocated memory at program exit). But if we exit in > the middle of a function, and variables are left on the stack, what in > the world is the use of a leak-checker complaining about that? > > It is not hurting anything, because by definition we have exited the > program and released the memory. And it is near-impossible to remove > these entirely. In the example we're discussing, there's a local > variable in git_config_push_env() that _could_ be freed before calling > die(). But not in the general case: > > - what about heap allocations in callers? Imagine a program like this: > > void foo(const char *str) > { > if (some_func(str)) > die("bad str"); > } > void bar(void) > { > char *str = strdup("some string"); > foo(str); > free(str); > } > > If foo() calls die(), then bar()'s str is a "still reachable" leak. > But we can't fix it in any reasonable way. foo() doesn't know about > freeing the string. And bar() doesn't know about calling die(). > > - likewise, we may actually need the heap-allocated string to call > die! Consider this: > > void foo(void) > { > char *str = strdup("some string"); > if (some_func(str)) > die("bad str: %s", str); > free(str); > } > > You can't avoid a "still reachable" leak here, because die() would > need to use the string, then free it, then exit. > > So if these "still reachable" leaks are not hurting anything, and if > they are impossible to get rid of, why do we want to care about them? > Doing so just causes pain with no benefit. > >> This is getting a bit academic, but I don't see how you can both say >> that the "compilers are allowed to modify the program as long as the >> observable behavior of that abstract machine is unchanged" *and* claim >> that e.g. the git_config_push_env() case isn't a real leak. > > The words "observable behavior" are doing a lot of heavy lifting there. > That is the behavior which a conforming C program can observe. And in > this case, nothing is violated. We did not reach the free() call, so > there was no need to make sure we had the parameters available. The C > standard's abstract machine doesn't know about things like "the stack is > memory that you can look at". Doing that is undefined behavior, and a > program which cares about that is not conforming. > > I agree this is mostly academic. My point in bringing it up was just to > say that no, this isn't a bug in the optimizer. Clobbering values for > NORETURN is perfectly reasonable and valid according to the standard. > And leak-checking, while basically undefined behavior from the > perspective of the standard, is a useful tool. It's the interaction > between the two that is ugly, and the most useful thing for us to do is > avoid using both at the same time. > >> Because surely the thing that makes it a "leak" by your definition (and >> what LSAN strives for) is that it's attributed to a variable that's "in >> scope", but the compiler is free to re-arrange all of that. > > If you mean "not a leak", then yes: there's still an in-scope variable > that points to it, which is what makes it not a leak. > >> Anyway, one reason I wanted to punt on that "git --config-env" issue is >> because we can entirely avoid the malloc()/free() there. See the "-- >8 >> --" below, but if we just malloc() after we do our assertions we can >> un-confuse clang. >> >> And that seems like a good idea in general, and re whether the "leak" is >> gone, at that point valgrind will stop reporting it, so we're really not >> leaking at all, not just in the "still reachable" sense. > > I'm not convinced it's a good idea. The resulting code requires an extra > variable and is (IMHO) slightly harder to understand. And what have we > accomplished? We silenced one false positive, but we know there are > others. And the linux-leaks CI job is failing on master _right now_, and > the patch below doesn't fix that. > > That makes the job somewhat worthless. And worse, it makes all of CI > less useful, because if it says "failed" on every build, people will > start to ignore it. > >> The reason I mention all of that is to try to define the problem here. I >> haven't seen cases where the compilers get it wrong about there being a >> leak, it's just that they're mis-categorizing them as "still reachable" >> or not, re your "abstract machine" summary. > > This paragraph makes it sound like we are mixing up a real leak and a > "still reachable" leak. But it is mixing up "there is no leak and > nothing to fix" with "there is a still reachable leak". > >> Now, the other cases in t1300 are from git.c using exit() instead of >> retur-ing to our main(). >> >> Among numerous other leak fixes I have queued up I have a fix for that, >> which fixes the other t1300 cases that have been reported now: >> https://github.com/avar/git/tree/avar/git-c-do-not-exit > > I said it in the last round of discussion, and I'll say it again: this > is the wrong direction. These are not real leaks, and we should not be > twisting our code to try to avoid them. We should be fixing our > leak-checking so that they don't come up. > >> The actual meaningful fix here is that we don't need to do this >> allocation at all. The only reason it's needed is because there's no >> variant of "sq_quote_buf()" in quote.c that takes a "len" >> parameter (but I have one locally). >> >> If we had that we could just pass a "key" and "key_len" to >> git_config_push_split_parameter() instead and avoid the allocation >> altogether. But in lieu of that better fix (which other API users of >> quote.c would benefit from) let's defer the allocation, which happens >> to fix the leak reporting on > > So as you probably guessed, I absolutely hate this patch and think we > should not take it. > > If we had a version of sq_quote_buf() that took a "len" parameter, then > sure, it would be reasonable to use. But it absolutely is not worth > caring about to silence a leak that is a false positive. There is > _nothing_ wrong with the code as written. > > -Peff > "-O2" is what goes public, testing it, directly or indirectly, is useful. Another thing is the pay-off.. We're losing a few eyeballs on "-O2" and some test performance in the way, in exchange of avoiding some false positives. And even with "-O0" the compiler is free to keep playing the whack-a-mole with us. I can easily find motivations for people to switch to "-O0" and do its local tests for example, and so we add eyeballs... but not so easily the other way around, to switch to "-O2". "False positives" makes me think the leak checker does its best. And the compiler. This "-O0 leak" is ignored by both, with "-O2": void func() { malloc(1024); exit(1); } UNLEAK(), "-O0" as a /second opinion/ /confirmation/, some attention to the false positives,... are things that doesn't sound so bad. Maybe there is a better way. Dunno. Un saludo.