This series fixes various leaking users of the revisions API. Changes since v2: * I think the rewrite here of fixing the leak should address Jeff King's concerns. I was somewhat mentally stuck on trying to change the code to make the leak fix easier, and then having the leak fix. But I think as the new 3/6 shows fixing the leak first is much easier, and more straightforward to explain. * A new 4/6 then follows-up and rewrites the variable clobbering that was needed for the pre-image of 3/6. * Don't reference the "XXX" comment in 6/6. * A typo fix, pointed out by Eric Sunshine. Thanks all for the review on the v2! Ævar Arnfjörð Bjarmason (6): bisect.c: add missing "goto" for release_revisions() test-fast-rebase helper: use release_revisions() (again) log: fix a memory leak in "git show <revision>..." log: refactor "rev.pending" code in cmd_show() bisect.c: partially fix bisect_rev_setup() memory leak revisions API: don't leak memory on argv elements that need free()-ing bisect.c | 28 ++++++++++------- builtin/log.c | 31 +++++++++++++------ builtin/submodule--helper.c | 5 ++- remote.c | 5 ++- revision.c | 2 ++ revision.h | 3 +- t/helper/test-fast-rebase.c | 2 -- t/t0203-gettext-setlocale-sanity.sh | 1 + t/t1020-subdirectory.sh | 1 + t/t2020-checkout-detach.sh | 1 + t/t3307-notes-man.sh | 1 + t/t3920-crlf-messages.sh | 2 ++ t/t4069-remerge-diff.sh | 1 + t/t7007-show.sh | 1 + ...3-pre-commit-and-pre-merge-commit-hooks.sh | 1 + t/t9122-git-svn-author.sh | 1 - t/t9162-git-svn-dcommit-interactive.sh | 1 - 17 files changed, 59 insertions(+), 28 deletions(-) Range-diff against v2: 1: 12a4a20c59f = 1: 12a4a20c59f bisect.c: add missing "goto" for release_revisions() 2: bbd3a7e5ecc ! 2: a04fade9d9d test-fast-rebase helper: use release_revisions() (again) @@ Commit message ... when != \( I \| &I \) That rule should arguably use only &I, not I (as we might be passed a - pointer). He distinction would matter if anyone cared about the + pointer). The distinction would matter if anyone cared about the side-effects of a memset() followed by release() of a pointer to a variable passed into the function. 4: 54b632c1124 ! 3: 83fc1835fe5 log: fix common "rev.pending" memory leak in "git show" @@ Metadata Author: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> ## Commit message ## - log: fix common "rev.pending" memory leak in "git show" - - Fix a very common memory leak introduced in 5d7eeee2ac6 (git-show: grok blobs, - trees and tags, too, 2006-12-14). - - When "git show" displays commits it needs to temporarily clobbers the - "rev.pending" array, but by doing so we'll fail to - release_revisions(), as we have for most other uses of "struct - rev_info" since 2da81d1efb0 (Merge branch 'ab/plug-leak-in-revisions', - 2022-06-07). - - In the preceding commit this code was made to use a more extendable - pattern, which we can now complete. Once we've clobbered our - "rev.pending" and invoked "cmd_log_walk_no_free()" we need to - "object_array_clear()" our old "rev.pending". - - So in addition to the now-populated &pending array, to avoid leaking - the memory related to the one member array we've created. The - &rev.pending was already being object_array_clear()'d by the - release_revisions() added in f6bfea0ad01 (revisions API users: use - release_revisions() in builtin/log.c, 2022-04-13), now we'll also - correctly free the previous data (f6bfea0ad01 noted this memory leak - as an outstanding TODO). - - This way of doing it is making assumptions about the internals of - "struct rev_info", in particular that the "pending" member is - stand-alone, and that no other state is referring to it. But we've - been making those assumptions ever since 5d7eeee2ac6 (git-show: grok - blobs, trees and tags, too, 2006-12-14). + log: fix a memory leak in "git show <revision>..." + + Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs, + trees and tags, too, 2006-12-14). As we iterate over a "<revision>..." + command-line and encounter ad OBJ_COMMIT we want to use our "struct + rev_info", but with a "pending" array of one element: the one commit + we're showing in the loop. + + To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and + rev.pending.nr for its iteration. We'd then clobber those (and alloc) + when we needed to show an OBJ_COMMIT. + + We'd therefore leak the "rev.pending" we started out with, and only + free the new "rev.pending" in the "OBJ_COMMIT" case arm as + prepare_revision_walk() would draw it down. + + Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we + save away the "rev.pending" before clearing it. We then add a single + commit to it, which our indirect invocation of prepare_revision_walk() + will remove. After that we restore the "rev.pending". + + Our "rev.pending" will then get free'd by the release_revisions() + added in f6bfea0ad01 (revisions API users: use release_revisions() in + builtin/log.c, 2022-04-13) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> ## builtin/log.c ## -@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix) - return cmd_log_deinit(cmd_log_walk(&rev), &rev); - - memcpy(&pending, &rev.pending, sizeof(rev.pending)); -+ memcpy(&rev.pending, &blank, sizeof(rev.pending)); - rev.diffopt.no_free = 1; - for (i = 0; i < pending.nr && !ret; i++) { - struct object *o = pending.objects[i].item; @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix) rev.shown_one = 1; break; case OBJ_COMMIT: -- memcpy(&rev.pending, &blank, sizeof(rev.pending)); ++ { ++ struct object_array old; ++ ++ memcpy(&old, &rev.pending, sizeof(old)); + rev.pending.nr = rev.pending.alloc = 0; + rev.pending.objects = NULL; add_object_array(o, name, &rev.pending); ret = cmd_log_walk_no_free(&rev); ++ memcpy(&rev.pending, &old, sizeof(rev.pending)); break; -@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix) ++ } + default: ret = error(_("unknown type: %d"), o->type); } - } -+ object_array_clear(&pending); - - rev.diffopt.no_free = 0; - diff_free(&rev.diffopt); ## t/t0203-gettext-setlocale-sanity.sh ## @@ 3: 1629299f883 ! 4: fd474666e7c log: make the intent of cmd_show()'s "rev.pending" juggling clearer @@ Metadata Author: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> ## Commit message ## - log: make the intent of cmd_show()'s "rev.pending" juggling clearer + log: refactor "rev.pending" code in cmd_show() - Adjust code added in 5d7eeee2ac6 (git-show: grok blobs, trees and - tags, too, 2006-12-14) to use the "memcpy a &blank" idiom introduced - in 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT - macro, 2021-07-01). + Refactor the juggling of "rev.pending" and our replacement for it + amended in the preceding commit so that: - Now the types in play are guaranteed to correspond, i.e. we used "int" - here for the "count" before, but the corresponding "nr" is an - "unsigned int". By using a "blank" object we almost entirely bypass - that, we'll only need to declare our own "unsigned int i". + * We use an "unsigned int" instead of an "int" for "i", this matches + the types of "struct rev_info" itself. - There are no functional changes here aside from potential overflow - guard rails, the structure only has these three members ("nr", "alloc" - and "objects"), but now we're obviously future-proof against assuming - that. + * We don't need the "count" and "objects" variables introduced in + 5d7eeee2ac6 (git-show: grok blobs, trees and tags, too, 2006-12-14). + + They were originally added since we'd clobber rev.pending in the + loop without restoring it. Since the preceding commit we are + restoring it when we handle OBJ_COMMIT, so the main for-loop can + refer to "rev.pending" didrectly. + + * We use the "memcpy a &blank" idiom introduced in + 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT + macro, 2021-07-01). + + This is more obvious than relying on us enumerating all of the + relevant members of the "struct object_array" that we need to + clear. + + * We comment on why we don't need an object_array_clear() here, see + the analysis in [1]. + + 1. https://lore.kernel.org/git/YuQtJ2DxNKX%2Fy70N@xxxxxxxxxxxxxxxxxxxxxxx/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> + Helped-by: Jeff King <peff@xxxxxxxx> ## builtin/log.c ## @@ builtin/log.c: static void show_setup_revisions_tweak(struct rev_info *rev, @@ builtin/log.c: static void show_setup_revisions_tweak(struct rev_info *rev, { struct rev_info rev; - struct object_array_entry *objects; -+ struct object_array blank = OBJECT_ARRAY_INIT; -+ struct object_array pending = OBJECT_ARRAY_INIT; + unsigned int i; struct setup_revision_opt opt; struct pathspec match_all; @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix) - count = rev.pending.nr; - objects = rev.pending.objects; -+ memcpy(&pending, &rev.pending, sizeof(rev.pending)); rev.diffopt.no_free = 1; - for (i = 0; i < count && !ret; i++) { - struct object *o = objects[i].item; - const char *name = objects[i].name; -+ for (i = 0; i < pending.nr && !ret; i++) { -+ struct object *o = pending.objects[i].item; -+ const char *name = pending.objects[i].name; ++ for (i = 0; i < rev.pending.nr && !ret; i++) { ++ struct object *o = rev.pending.objects[i].item; ++ const char *name = rev.pending.objects[i].name; switch (o->type) { case OBJ_BLOB: ret = show_blob_object(&o->oid, &rev, name); @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix) ret = error(_("could not read object %s"), oid_to_hex(oid)); - objects[i].item = o; -+ pending.objects[i].item = o; ++ rev.pending.objects[i].item = o; i--; break; } @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix) - rev.shown_one = 1; - break; case OBJ_COMMIT: + { + struct object_array old; ++ struct object_array blank = OBJECT_ARRAY_INIT; + + memcpy(&old, &rev.pending, sizeof(old)); - rev.pending.nr = rev.pending.alloc = 0; - rev.pending.objects = NULL; + memcpy(&rev.pending, &blank, sizeof(rev.pending)); ++ add_object_array(o, name, &rev.pending); ret = cmd_log_walk_no_free(&rev); ++ ++ /* ++ * No need for ++ * object_array_clear(&pending). It was ++ * cleared already in prepare_revision_walk() ++ */ + memcpy(&rev.pending, &old, sizeof(rev.pending)); break; + } 5: 2af2bce2d17 = 5: dae5872e722 bisect.c: partially fix bisect_rev_setup() memory leak 6: 3c57e126554 ! 6: 5e637f55ff1 revisions API: don't leak memory on argv elements that need free()-ing @@ Commit message revisions API: don't leak memory on argv elements that need free()-ing Add a "free_removed_argv_elements" member to "struct - setup_revision_opt", and use it to fix several memory leaks, e.g. the - one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array - API, 2011-09-13). + setup_revision_opt", and use it to fix several memory leaks. We have various memory leaks in APIs that take and munge "const char **argv", e.g. parse_options(). Sometimes these APIs are given the -- 2.37.1.1233.ge8b09efaedc