Re: [PATCH] Makefile: force -O0 when compiling with SANITIZE=leak

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 18 2022, Jeff King wrote:

Note that we're discussing two different leak here,
git_config_push_env() one in
https://lore.kernel.org/git/Yxl91jfycCo7O7Pp@xxxxxxxxxxxxxxxxxxxxxxx/;
and now another one that originates in the exit() behavior in
git.c. They're both triggered from t1300, but they're otherwise
unrelated.

> On Tue, Oct 18, 2022 at 03:40:45PM -0400, Jeff King wrote:
>
>> > I suppose we could temporarily mark t1300 as not passing with
>> > SANITIZE=leak turned on, but I tend to agree with Peff that that feels
>> > like a hack working around compiler behavior, that will ultimately
>> > result in us playing whack-a-mole.
>> > 
>> > So my preference would be to run the linux-leaks build with `-O0` in its
>> > CFLAGS, optionally with a newer compiler if one is available for Focal.
>> 
>> Yes, I still think disabling optimizations is the best path forward. Not
>> just to avoid whack-a-mole, but this is also something we'd eventually
>> need to confront when the code base really is leak-free.
>
> So here's a patch that does so. I think one of Ævar's reservations in
> the other thread was not wanting to have to switch optimization levels
> manually for various builds. I think the patch below should make things
> Just Work, whether in CI or running a custom build locally.

No, the opposite. I want to try out various combinations of compile
flags & optimization levels and not have SANITIZE=leak second-guess it.

Now if I want to compile with -O2 and SANITIZE=leak I'll need to
monkeypatch the Makefile.

I'd prefer not to need to do that, because while non-O0 is noisy
sometimes, I've also found that it points (at least indirectly) to some
useful edge-cases.

> The compiler may eliminate "str" as a stack variable, and just leave it
> in a register. The register is preserved through most of the function,
> including across the call to some_func(), since we'd eventually need to
> free it. But because die() is marked with NORETURN, the compiler knows
> that it doesn't need to save registers, and just clobbers it.
>
> When die() eventually exits, the leak-checker runs. It looks in
> registers and on the stack for any reference to the memory allocated by
> str (which would indicate that it's not leaked), but can't find one.  So
> it reports it as a leak.
>
> Neither system is wrong, really. The C standard (mostly section 5.1.2.3)
> defines an abstract machine, and compilers are allowed to modify the
> program as long as the observable behavior of that abstract machine is
> unchanged. Looking at random memory values on the stack is undefined
> behavior, and not something that the optimizer needs to support. But
> there really isn't any other way for a leak checker to work; it
> inherently has to do undefined things like scouring memory for pointers.
> So the two things are inherently at odds with each other. We can't fix
> it by changing the code, because from the perspective of the program
> running in an abstract machine, there is no leak.

In the abstract program yes, there is a leak, and you can see that it
leaks with e.g. valgrind regardless of optimization level:

	$ valgrind --leak-check=full --show-leak-kinds=all ./git --config-env=foo.flag= config --bool foo.flag
        [...]
	==6540== 13 bytes in 1 blocks are still reachable in loss record 3 of 17
	==6540==    at 0x48437B4: malloc (vg_replace_malloc.c:381)
	==6540==    by 0x40278B: do_xmalloc (wrapper.c:51)
	==6540==    by 0x402884: do_xmallocz (wrapper.c:85)
	==6540==    by 0x4028C0: xmallocz (wrapper.c:93)
	==6540==    by 0x4028FD: xmemdupz (wrapper.c:109)
	==6540==    by 0x402962: xstrndup (wrapper.c:115)
	==6540==    by 0x342F53: strip_path_suffix (path.c:1300)
	==6540==    by 0x2B4C89: system_prefix (exec-cmd.c:50)
	==6540==    by 0x2B5057: system_path (exec-cmd.c:268)
	==6540==    by 0x2C5E17: git_setup_gettext (gettext.c:109)
	==6540==    by 0x21DC57: main (common-main.c:44)

I think I elaborated on some of this in
https://lore.kernel.org/git/220218.861r00ib86.gmgdl@xxxxxxxxxxxxxxxxxxx/

The tl;dr is that I think you use "leak" in the sense that valgrind
talks about "still reachable" leaks, which is conceptually where
-fsanitize=leak draws the line. I.e. it's not a malloc()/free() tracker,
but tries to see what's "in scope".

This is getting a bit academic, but I don't see how you can both say
that the "compilers are allowed to modify the program as long as the
observable behavior of that abstract machine is unchanged" *and* claim
that e.g. the git_config_push_env() case isn't a real leak.

Because surely the thing that makes it a "leak" by your definition (and
what LSAN strives for) is that it's attributed to a variable that's "in
scope", but the compiler is free to re-arrange all of that.

Anyway, one reason I wanted to punt on that "git --config-env" issue is
because we can entirely avoid the malloc()/free() there. See the "-- >8
--" below, but if we just malloc() after we do our assertions we can
un-confuse clang.

And that seems like a good idea in general, and re whether the "leak" is
gone, at that point valgrind will stop reporting it, so we're really not
leaking at all, not just in the "still reachable" sense.

The reason I mention all of that is to try to define the problem here. I
haven't seen cases where the compilers get it wrong about there being a
leak, it's just that they're mis-categorizing them as "still reachable"
or not, re your "abstract machine" summary.

> This has caused real false positives in the past, like:
>
>   - https://lore.kernel.org/git/patch-v3-5.6-9a44204c4c9-20211022T175227Z-avarab@xxxxxxxxx/
>
>   - https://lore.kernel.org/git/Yy4eo6500C0ijhk+@xxxxxxxxxxxxxxxxxxxxxxx/
>
>   - https://lore.kernel.org/git/Y07yeEQu+C7AH7oN@nand.local/

Now, the other cases in t1300 are from git.c using exit() instead of
retur-ing to our main().

Among numerous other leak fixes I have queued up I have a fix for that,
which fixes the other t1300 cases that have been reported now:
https://github.com/avar/git/tree/avar/git-c-do-not-exit

So as fuzzy as the "abstract machine" can be sometimes, I've found this
useful.

-- >8 --
Subject: config.c: do sanity checks before xmemdupz()

Adjust the code added in 1ff21c05ba9 (config: store "git -c" variables
using more robust format, 2021-01-12) to avoid allocating a "key"
until after we've done the sanity checks on it.

There was no reason to allocate it this early in the function, except
because we were using the "env_name" pointer we were about to
increment, let's save a copy of it instead.

As a result of this early allocation the "clang" at higher
optimization levels would report that we had a leak here. E.g. with
Debian clang 14.0.6-2 with CFLAGS="-O3 -g":

	$ ./git --config-env=foo.flag= config --bool foo.flag
	fatal: missing environment variable name for configuration 'foo.flag'

	=================================================================
	==18018==ERROR: LeakSanitizer: detected memory leaks

	Direct leak of 9 byte(s) in 1 object(s) allocated from:
	    #0 0x55e40aad7cd2 in __interceptor_malloc (/home/avar/g/git/git+0x78cd2) (BuildId: b89fa8f797ccb02ef1148beed300071a5f9b9ab1)
	    #1 0x55e40ad41071 in do_xmalloc /home/avar/g/git/wrapper.c:51:8
	    #2 0x55e40ad41071 in do_xmallocz /home/avar/g/git/wrapper.c:85:8
	    #3 0x55e40ad41071 in xmallocz /home/avar/g/git/wrapper.c:93:9
	    #4 0x55e40ad41071 in xmemdupz /home/avar/g/git/wrapper.c:109:16
	    #5 0x55e40abec960 in git_config_push_env /home/avar/g/git/config.c:521:8
	    #6 0x55e40aadb8b9 in handle_options /home/avar/g/git/git.c:268:4
	    #7 0x55e40aada68d in cmd_main /home/avar/g/git/git.c:896:2
	    #8 0x55e40abae219 in main /home/avar/g/git/common-main.c:57:11
	    #9 0x7fbb5287e209 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
	    #10 0x7fbb5287e2bb in __libc_start_main csu/../csu/libc-start.c:389:3
	    #11 0x55e40aaac130 in _start (/home/avar/g/git/git+0x4d130) (BuildId: b89fa8f797ccb02ef1148beed300071a5f9b9ab1)

	SUMMARY: LeakSanitizer: 9 byte(s) leaked in 1 allocation(s).
	Aborted

This has come up before, with an earlier proposed fix of mine[1] to
sweep this under the rug.

The actual meaningful fix here is that we don't need to do this
allocation at all. The only reason it's needed is because there's no
variant of "sq_quote_buf()" in quote.c that takes a "len"
parameter (but I have one locally).

If we had that we could just pass a "key" and "key_len" to
git_config_push_split_parameter() instead and avoid the allocation
altogether. But in lieu of that better fix (which other API users of
quote.c would benefit from) let's defer the allocation, which happens
to fix the leak reporting on

1. https://lore.kernel.org/git/patch-1.1-fb2f0a660c0-20220908T153252Z-avarab@xxxxxxxxx/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
---
 config.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index cbb5a3bab74..e49fd87bbd3 100644
--- a/config.c
+++ b/config.c
@@ -512,13 +512,14 @@ void git_config_push_parameter(const char *text)
 void git_config_push_env(const char *spec)
 {
 	char *key;
+	const char *p;
 	const char *env_name;
 	const char *env_value;
 
 	env_name = strrchr(spec, '=');
 	if (!env_name)
 		die(_("invalid config format: %s"), spec);
-	key = xmemdupz(spec, env_name - spec);
+	p = env_name;
 	env_name++;
 	if (!*env_name)
 		die(_("missing environment variable name for configuration '%.*s'"),
@@ -529,6 +530,7 @@ void git_config_push_env(const char *spec)
 		die(_("missing environment variable '%s' for configuration '%.*s'"),
 		    env_name, (int)(env_name - spec - 1), spec);
 
+	key = xmemdupz(spec, p - spec);
 	git_config_push_split_parameter(key, env_value);
 	free(key);
 }
-- 
2.38.0.1093.gcd4a685f0b1





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux