On Tue, Oct 18 2022, Jeff King wrote: Note that we're discussing two different leak here, git_config_push_env() one in https://lore.kernel.org/git/Yxl91jfycCo7O7Pp@xxxxxxxxxxxxxxxxxxxxxxx/; and now another one that originates in the exit() behavior in git.c. They're both triggered from t1300, but they're otherwise unrelated. > On Tue, Oct 18, 2022 at 03:40:45PM -0400, Jeff King wrote: > >> > I suppose we could temporarily mark t1300 as not passing with >> > SANITIZE=leak turned on, but I tend to agree with Peff that that feels >> > like a hack working around compiler behavior, that will ultimately >> > result in us playing whack-a-mole. >> > >> > So my preference would be to run the linux-leaks build with `-O0` in its >> > CFLAGS, optionally with a newer compiler if one is available for Focal. >> >> Yes, I still think disabling optimizations is the best path forward. Not >> just to avoid whack-a-mole, but this is also something we'd eventually >> need to confront when the code base really is leak-free. > > So here's a patch that does so. I think one of Ævar's reservations in > the other thread was not wanting to have to switch optimization levels > manually for various builds. I think the patch below should make things > Just Work, whether in CI or running a custom build locally. No, the opposite. I want to try out various combinations of compile flags & optimization levels and not have SANITIZE=leak second-guess it. Now if I want to compile with -O2 and SANITIZE=leak I'll need to monkeypatch the Makefile. I'd prefer not to need to do that, because while non-O0 is noisy sometimes, I've also found that it points (at least indirectly) to some useful edge-cases. > The compiler may eliminate "str" as a stack variable, and just leave it > in a register. The register is preserved through most of the function, > including across the call to some_func(), since we'd eventually need to > free it. But because die() is marked with NORETURN, the compiler knows > that it doesn't need to save registers, and just clobbers it. > > When die() eventually exits, the leak-checker runs. It looks in > registers and on the stack for any reference to the memory allocated by > str (which would indicate that it's not leaked), but can't find one. So > it reports it as a leak. > > Neither system is wrong, really. The C standard (mostly section 5.1.2.3) > defines an abstract machine, and compilers are allowed to modify the > program as long as the observable behavior of that abstract machine is > unchanged. Looking at random memory values on the stack is undefined > behavior, and not something that the optimizer needs to support. But > there really isn't any other way for a leak checker to work; it > inherently has to do undefined things like scouring memory for pointers. > So the two things are inherently at odds with each other. We can't fix > it by changing the code, because from the perspective of the program > running in an abstract machine, there is no leak. In the abstract program yes, there is a leak, and you can see that it leaks with e.g. valgrind regardless of optimization level: $ valgrind --leak-check=full --show-leak-kinds=all ./git --config-env=foo.flag= config --bool foo.flag [...] ==6540== 13 bytes in 1 blocks are still reachable in loss record 3 of 17 ==6540== at 0x48437B4: malloc (vg_replace_malloc.c:381) ==6540== by 0x40278B: do_xmalloc (wrapper.c:51) ==6540== by 0x402884: do_xmallocz (wrapper.c:85) ==6540== by 0x4028C0: xmallocz (wrapper.c:93) ==6540== by 0x4028FD: xmemdupz (wrapper.c:109) ==6540== by 0x402962: xstrndup (wrapper.c:115) ==6540== by 0x342F53: strip_path_suffix (path.c:1300) ==6540== by 0x2B4C89: system_prefix (exec-cmd.c:50) ==6540== by 0x2B5057: system_path (exec-cmd.c:268) ==6540== by 0x2C5E17: git_setup_gettext (gettext.c:109) ==6540== by 0x21DC57: main (common-main.c:44) I think I elaborated on some of this in https://lore.kernel.org/git/220218.861r00ib86.gmgdl@xxxxxxxxxxxxxxxxxxx/ The tl;dr is that I think you use "leak" in the sense that valgrind talks about "still reachable" leaks, which is conceptually where -fsanitize=leak draws the line. I.e. it's not a malloc()/free() tracker, but tries to see what's "in scope". This is getting a bit academic, but I don't see how you can both say that the "compilers are allowed to modify the program as long as the observable behavior of that abstract machine is unchanged" *and* claim that e.g. the git_config_push_env() case isn't a real leak. Because surely the thing that makes it a "leak" by your definition (and what LSAN strives for) is that it's attributed to a variable that's "in scope", but the compiler is free to re-arrange all of that. Anyway, one reason I wanted to punt on that "git --config-env" issue is because we can entirely avoid the malloc()/free() there. See the "-- >8 --" below, but if we just malloc() after we do our assertions we can un-confuse clang. And that seems like a good idea in general, and re whether the "leak" is gone, at that point valgrind will stop reporting it, so we're really not leaking at all, not just in the "still reachable" sense. The reason I mention all of that is to try to define the problem here. I haven't seen cases where the compilers get it wrong about there being a leak, it's just that they're mis-categorizing them as "still reachable" or not, re your "abstract machine" summary. > This has caused real false positives in the past, like: > > - https://lore.kernel.org/git/patch-v3-5.6-9a44204c4c9-20211022T175227Z-avarab@xxxxxxxxx/ > > - https://lore.kernel.org/git/Yy4eo6500C0ijhk+@xxxxxxxxxxxxxxxxxxxxxxx/ > > - https://lore.kernel.org/git/Y07yeEQu+C7AH7oN@nand.local/ Now, the other cases in t1300 are from git.c using exit() instead of retur-ing to our main(). Among numerous other leak fixes I have queued up I have a fix for that, which fixes the other t1300 cases that have been reported now: https://github.com/avar/git/tree/avar/git-c-do-not-exit So as fuzzy as the "abstract machine" can be sometimes, I've found this useful. -- >8 -- Subject: config.c: do sanity checks before xmemdupz() Adjust the code added in 1ff21c05ba9 (config: store "git -c" variables using more robust format, 2021-01-12) to avoid allocating a "key" until after we've done the sanity checks on it. There was no reason to allocate it this early in the function, except because we were using the "env_name" pointer we were about to increment, let's save a copy of it instead. As a result of this early allocation the "clang" at higher optimization levels would report that we had a leak here. E.g. with Debian clang 14.0.6-2 with CFLAGS="-O3 -g": $ ./git --config-env=foo.flag= config --bool foo.flag fatal: missing environment variable name for configuration 'foo.flag' ================================================================= ==18018==ERROR: LeakSanitizer: detected memory leaks Direct leak of 9 byte(s) in 1 object(s) allocated from: #0 0x55e40aad7cd2 in __interceptor_malloc (/home/avar/g/git/git+0x78cd2) (BuildId: b89fa8f797ccb02ef1148beed300071a5f9b9ab1) #1 0x55e40ad41071 in do_xmalloc /home/avar/g/git/wrapper.c:51:8 #2 0x55e40ad41071 in do_xmallocz /home/avar/g/git/wrapper.c:85:8 #3 0x55e40ad41071 in xmallocz /home/avar/g/git/wrapper.c:93:9 #4 0x55e40ad41071 in xmemdupz /home/avar/g/git/wrapper.c:109:16 #5 0x55e40abec960 in git_config_push_env /home/avar/g/git/config.c:521:8 #6 0x55e40aadb8b9 in handle_options /home/avar/g/git/git.c:268:4 #7 0x55e40aada68d in cmd_main /home/avar/g/git/git.c:896:2 #8 0x55e40abae219 in main /home/avar/g/git/common-main.c:57:11 #9 0x7fbb5287e209 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #10 0x7fbb5287e2bb in __libc_start_main csu/../csu/libc-start.c:389:3 #11 0x55e40aaac130 in _start (/home/avar/g/git/git+0x4d130) (BuildId: b89fa8f797ccb02ef1148beed300071a5f9b9ab1) SUMMARY: LeakSanitizer: 9 byte(s) leaked in 1 allocation(s). Aborted This has come up before, with an earlier proposed fix of mine[1] to sweep this under the rug. The actual meaningful fix here is that we don't need to do this allocation at all. The only reason it's needed is because there's no variant of "sq_quote_buf()" in quote.c that takes a "len" parameter (but I have one locally). If we had that we could just pass a "key" and "key_len" to git_config_push_split_parameter() instead and avoid the allocation altogether. But in lieu of that better fix (which other API users of quote.c would benefit from) let's defer the allocation, which happens to fix the leak reporting on 1. https://lore.kernel.org/git/patch-1.1-fb2f0a660c0-20220908T153252Z-avarab@xxxxxxxxx/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> --- config.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/config.c b/config.c index cbb5a3bab74..e49fd87bbd3 100644 --- a/config.c +++ b/config.c @@ -512,13 +512,14 @@ void git_config_push_parameter(const char *text) void git_config_push_env(const char *spec) { char *key; + const char *p; const char *env_name; const char *env_value; env_name = strrchr(spec, '='); if (!env_name) die(_("invalid config format: %s"), spec); - key = xmemdupz(spec, env_name - spec); + p = env_name; env_name++; if (!*env_name) die(_("missing environment variable name for configuration '%.*s'"), @@ -529,6 +530,7 @@ void git_config_push_env(const char *spec) die(_("missing environment variable '%s' for configuration '%.*s'"), env_name, (int)(env_name - spec - 1), spec); + key = xmemdupz(spec, p - spec); git_config_push_split_parameter(key, env_value); free(key); } -- 2.38.0.1093.gcd4a685f0b1