On Tue, Nov 08 2022, Phillip Wood wrote:

> Hi Ævar
>
> On 07/11/2022 19:35, Phillip Wood wrote:
>>>> @@ -5116,7 +5121,7 @@ static int single_pick(struct repository *r,
>>>>  			TODO_PICK : TODO_REVERT;
>>>>  	item.commit = cmit;
>>>> -	setenv(GIT_REFLOG_ACTION, action_name(opts), 0);
>>>> +	opts->reflog_message = sequencer_reflog_action(opts);
>>>>  	return do_pick_commit(r, &item, opts, 0, &check_todo);
>>>
>>> Here you're adding a new memory leak, which you can see if you run
>>> e.g. the 1st test of ./t1013-read-tree-submodule.sh before & after this
>>> change.
>
> What's a read-tree test using rebase for? I find the submodule tests
> completely incomprehensible. It is calling
> test_submodule_switch_recursing_with_args() which does not call rebase
> directly but who knows what is going on in all the helper functions.

I don't know, I just worked my way backwards from the leak logs, so...

> Have you got a simple example of a test which shows a new leak?

...yes, e.g. (after make SANITIZE=leak):

	./t3425-rebase-topology-merges.sh -vixd

On "master" that will emit:

	Direct leak of 1408 byte(s) in 1 object(s) allocated from:
	    #0 0x7ff891b5f545 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
	    #1 0x6c45e8 in do_xmalloc wrapper.c:51
	    #2 0x6c4670 in xmalloc wrapper.c:72
	    #3 0x6037e2 in parse_options_concat parse-options-cb.c:188
	    #4 0x4c547c in run_sequencer builtin/revert.c:140
	    #5 0x4c5a4c in cmd_revert builtin/revert.c:247
	    #6 0x407a32 in run_builtin git.c:466
	    #7 0x407e0a in handle_builtin git.c:721
	    #8 0x40803d in run_argv git.c:788
	    #9 0x40850f in cmd_main git.c:923
	    #10 0x4eed6f in main common-main.c:57
	    #11 0x7ff8918b9209 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
	    #12 0x7ff8918b92bb in __libc_start_main_impl ../csu/libc-start.c:389
	    #13 0x405fd0 in _start (git+0x405fd0)

	Direct leak of 4 byte(s) in 1 object(s) allocated from:
	    #0 0x7ff891b5f545 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
	    #1 0x7ff891929caa in __GI___strdup string/strdup.c:42
	    #2 0x6c4591 in xstrdup wrapper.c:39
	    #3 0x4c58f8 in run_sequencer builtin/revert.c:223
	    #4 0x4c5a4c in cmd_revert builtin/revert.c:247
	    #5 0x407a32 in run_builtin git.c:466
	    #6 0x407e0a in handle_builtin git.c:721
	    #7 0x40803d in run_argv git.c:788
	    #8 0x40850f in cmd_main git.c:923
	    #9 0x4eed6f in main common-main.c:57
	    #10 0x7ff8918b9209 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
	    #11 0x7ff8918b92bb in __libc_start_main_impl ../csu/libc-start.c:389
	    #12 0x405fd0 in _start (git+0x405fd0)

After this change we still have the first leak (which is unrelated) and
the second, but we've added this one:

	Direct leak of 7 byte(s) in 1 object(s) allocated from:
	    #0 0x7f7cc51e5545 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
	    #1 0x7f7cc4fafcaa in __GI___strdup string/strdup.c:42
	    #2 0x6c460b in xstrdup wrapper.c:39
	    #3 0x66df91 in sequencer_reflog_action sequencer.c:3685
	    #4 0x6725ad in single_pick sequencer.c:5124
	    #5 0x6728dd in sequencer_pick_revisions sequencer.c:5178
	    #6 0x4c5a17 in run_sequencer builtin/revert.c:237
	    #7 0x4c5aa9 in cmd_revert builtin/revert.c:247
	    #8 0x407a32 in run_builtin git.c:466
	    #9 0x407e0a in handle_builtin git.c:721
	    #10 0x40803d in run_argv git.c:788
	    #11 0x40850f in cmd_main git.c:923
	    #12 0x4eedcc in main common-main.c:57
	    #13 0x7f7cc4f3f209 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
	    #14 0x7f7cc4f3f2bb in __libc_start_main_impl ../csu/libc-start.c:389
	    #15 0x405fd0 in _start (git+0x405fd0)

But more to the point, if you run the test suite with e.g.:

	GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true

you can find these raw reports in:

	grep -r sequencer test-results/*.leak

Or use this nice script/alias from my github.com/avar/git.git to
summarize them (I haven't upstreamed this yet):

	$ git help scan-leaks-top
	'scan-leaks-top' is aliased to '!f() { cd t && git cat-file blob avar/add-new-sanitize-leak-test-modes-follow-up:t/aggregate-leaks.perl | perl - | less -S; }; f'

>> I'm not sure how, opts->reflog_message will be a copy of
>> opts->reflog_action which is freed at the end of the rebase. I'll
>> have a proper look tomorrow to see if I'm missing something.
>
> So it is possible this is showing up because I think we only free the
> heap allocated members of opts in sequencer_remove_state() and that is
> not called when we stop for a conflict resolution, a break command, a
> failed exec or a rescheduled pick/reset etc. The way to fix that would
> be to refactor sequencer_remove_state() to create
> replay_opts_release() and call that from builtin/{revert,rebase}.c

Yes, I think that's probably the root cause. I have a leak-fixing topic
as a follow-up to my current one, which among other things tried to
address this:

https://github.com/avar/git/commit/7a150d1b7e2

I'd just forgotten about it. That link currently says it was committed
<24hrs ago, but that's only because I was just rebasing the topic for
something unrelated; I hacked this up in mid-August.

> As that is unrelated to removing the setenv() calls which is the focus
> of this series I will not be doing that in this series.

I'm fine with us leaving this for now, and saying that it's OK that
we're adding some new leaks if we're addressing the setenv/getenv issue,
and that we can fix the root cause of the current leaks later.

But let's be clear: it's not unrelated to your refactoring in this
topic. We didn't have this leak before, and now we have it. These two
patches are the cause of some new leaks we didn't have before.

And if we run this on my topic, which narrowly attempted to fix these
leaks, then e.g. that "t3425-rebase-topology-merges.sh" test will have
just 1 leak in that failing test, vs. 3 leaks with this topic (the "4
byte(s) in 1 object(s)" above).

It's just a nice coincidence that our memory leaks are currently in such
a sorry state overall that this isn't failing e.g.
the linux-leaks CI, because the new leaks are being masked by tests that
are already failing due to other pre-existing leaks.

But all that being said, I think the right move is for this topic to
proceed, perhaps with an updated commit message noting some of this.
It's really just running into the existing problem of
replay_opts_release(). If that destructor isn't reliable (which it
isn't) we can still make new use of it, and then fix how we call it
later for all its callers.

I've just tested that: I cherry-picked that 7a150d1b7e2 and the few
preceding commits it needs (dcc104aef89..7a150d1b7e2), and then applied
this on top:

	diff --git a/builtin/revert.c b/builtin/revert.c
	index ee32c714a76..0abd805beed 100644
	--- a/builtin/revert.c
	+++ b/builtin/revert.c
	@@ -250,6 +250,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
	 	if (opts.revs)
	 		release_revisions(opts.revs);
	 	free(opts.revs);
	+	replay_opts_release(&opts);
	 	return res;
	 }

With that, the leaks above are down to just the unrelated
parse_options_concat() leak. I.e. this really is just a case of us
missing the destructor due to a more general issue.
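A minimal, self-contained sketch of the caller-owned destructor pattern discussed above (a single release function for every heap-owned member of the options struct, called by the caller that owns it on every exit path), assuming a trimmed-down hypothetical struct demo_opts in place of git's struct replay_opts. The names demo_opts_release(), demo_pick(), demo_run(), tracked_strdup() and the live_allocs counter (which stands in for LeakSanitizer) are all illustrative, not git's actual API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocation counter standing in for LeakSanitizer:
 * it is 0 when every tracked allocation has been freed. */
static int live_allocs;

/* Hypothetical xstrdup()-like helper: returns an owned heap copy. */
static char *tracked_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (!p)
		abort();
	memcpy(p, s, n);
	live_allocs++;
	return p;
}

static void tracked_free(char *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Hypothetical, trimmed-down stand-in for struct replay_opts. */
struct demo_opts {
	char *reflog_action;  /* owned copy */
	char *reflog_message; /* owned copy, set per pick */
};

/* One destructor for all heap-owned members; resetting the pointers
 * makes a second call harmless. */
static void demo_opts_release(struct demo_opts *opts)
{
	tracked_free(opts->reflog_action);
	tracked_free(opts->reflog_message);
	opts->reflog_action = NULL;
	opts->reflog_message = NULL;
}

/* Hypothetical pick step: stores a fresh copy of the action as the
 * message (freeing any previous one), then returns 1 to simulate
 * stopping early, e.g. for a conflict, or 0 on success. */
static int demo_pick(struct demo_opts *opts, int stop_early)
{
	tracked_free(opts->reflog_message);
	opts->reflog_message = tracked_strdup(opts->reflog_action);
	return stop_early ? 1 : 0;
}

/* The caller owns opts and releases it whether the pick finished or
 * stopped early, so neither path leaks. */
static int demo_run(int stop_early)
{
	struct demo_opts opts = { NULL, NULL };
	int res;

	opts.reflog_action = tracked_strdup("rebase");
	res = demo_pick(&opts, stop_early);
	demo_opts_release(&opts);
	assert(!opts.reflog_action && !opts.reflog_message);
	return res;
}
```

The design point is that cleanup is tied to whoever owns the struct rather than to one success-only path such as sequencer_remove_state(), so stopping early (simulated here by demo_pick() returning 1) no longer skips the free(); in this sketch live_allocs is back to 0 after demo_run(0) and after demo_run(1).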