On Tue, Oct 18, 2022 at 02:37:44PM -0400, Taylor Blau wrote: > After Junio pushed out the tags for v2.38.1 and friends, I noticed that > our linux-leaks CI job is failing t1300.104 and t1300.109, claiming that > there is a leak here: > [...] > #14 0x55ba233acb36 in alias_lookup alias.c:35 > #15 0x55ba232d0748 in handle_alias git.c:346 These are the interesting part of the trace. alias_lookup() returns an allocated string, and then split_cmdline() fails, so we call die() with: fatal: bad alias.split-cmdline-fix string: unclosed quote So yeah, I think it's probably the same issue as discussed previously: the compiler presumably puts alias_string into a register, and then clobbers the register when calling die(), because it knows we're never coming back. > I can't reproduce the failure locally with gcc (Debian 10.3.0-15) > 10.3.0, but Victoria (CC'd) can reproduce the failure with 9.4.0. > Interestingly, the failure only appears when compiling with `-O2`, but > not `-O0` or `-O1`. I can't reproduce on debian unstable using any of gcc 9-12 (but note gcc-9 here is 9.5.0). But then, I had trouble convincing gcc to find _actual_ leaks with lsan. Clang is much more reliable for me, but it turns up only the failure in t1300.135 that I reported earlier. But given the trace above plus the findings on gcc 9.4.0, I feel pretty confident saying this is another instance of the same problem. > I'm not sure if I'm content to treat the 9.4.0 behavior as a compiler > bug, but definitely running the linux-leaks build with `-O2` is > suspicious. It's definitely not a compiler bug. What the optimizer is doing is perfectly reasonable; it's just that the leak checker interacts badly with it. > I suppose we could temporarily mark t1300 as not passing with > SANITIZE=leak turned on, but I tend to agree with Peff that that feels > like a hack working around compiler behavior, that will ultimately > result in us playing whack-a-mole. > > So my preference would be to run the linux-leaks build with `-O0` in its > CFLAGS, optionally with a newer compiler if one is available for Focal. Yes, I still think disabling optimizations is the best path forward. Not just to avoid whack-a-mole, but this is also something we'd eventually need to confront when the code base really is leak-free. I don't think there's any need for a newer compiler. While the optimizer behavior may change between versions, none of what we've seen is any compiler being _wrong_, just different. As a lesser change, I suspect that making NORETURN a noop in leak-checking builds would help in practice, because the compiler wouldn't realize that die() doesn't return. But it's not foolproof (the same thing might trigger with a direct exit() call, or one that becomes direct via inlining). Using -O0 is the more complete fix, and IMHO it's not important to try to get optimal performance during the leak-checking test run. -Peff