On 3/25/2022 10:25 AM, Ævar Arnfjörð Bjarmason wrote: > In the preceding [1] (pack-objects: move revs out of > get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved > to cmd_pack_objects() so that it unconditionally took place for all > invocations of "git pack-objects". > > We'd thus start leaking memory, which is easily reproduced in > e.g. git.git by feeding e83c5163316 (Initial revision of "git", the > information manager from hell, 2005-04-07) to "git pack-objects"; > > $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial > [...] > ==19130==ERROR: LeakSanitizer: detected memory leaks > > Direct leak of 7120 byte(s) in 1 object(s) allocated from: > #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308) > #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8 > #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9 > #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2 > #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2 > #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2 > #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2 > #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11 > #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3 > #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4 > #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19 > #11 0x562259 in main /home/avar/g/git/common-main.c:56:11 > #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16 > #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9) > > SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s). > Aborted > > Narrowly fixing that commit would have been easy, just add call > repo_init_revisions() right before get_object_list(), which is > effectively what was done before that commit. > > But an unstated constraint when setting it up early is that it was > needed for the subsequent [2] (pack-objects: parse --filter directly > into revs.filter, 2022-03-22), i.e. we might have a --filter > command-line option, and need to either have the "struct rev_info" > setup when we encounter that option, or later. > > Let's just change the control flow so that we'll instead set up the > "struct rev_info" only when we need it. Doing so leads to a bit more > verbosity, but it's a lot clearer what we're doing and why. This makes sense. > We could furthermore combine the two get_object_list() invocations > here by having repo_init_revisions() invoked on &pfd.revs, but I think > clearly separating the two makes the flow clearer. Likewise > redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to > "have_revs" early in cmd_pack_objects(). I disagree, especially when you later want to make sure we free the data from revs using your release_revisions(). > This does add the future constraint to opt_parse_list_objects_filter() > that we'll need to adjust this wrapper code if it looks at any other > value of the "struct option" than the "value" member. So we are coupling ourselves to the implementation of this method. > But that regression should be relatively easy to spot. I'm > intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so > that various memory sanity checkers would spot that, we just > initialize the "value" in po_filter_cb(). By doing this e.g. we'll die > on e.g. this test if we were to use another member of "opt" in > opt_parse_list_objects_filter()> So you are using uninitialized memory as a way to discover any necessary changes to that coupling. I'm not sure this is super-safe because we don't necessarily run memory checkers during CI builds. I'd rather have a consistently initialized chunk of data that would behave predictably (and hopefully we'd discover it is behaving incorrectly with that predictable behavior). > While we're at it add parentheses around the arguments to the OPT_* > macros in in list-objects-filter-options.h, as we need to change those > lines anyway. It doesn't matter in this case, but is good general > practice. > > 1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@xxxxxxxxx > 2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@xxxxxxxxx/ > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> > --- > > This is on top of ds/partial-bundle-more: I thought the fix for this > new leak was involved enough to propose it as a commit-on-top rather > than a fixup for a re-roll, especially since aside from the newly > leaked memory I don't think ds/partial-bundle-more is breaking > anything by doing that. > > Except that is, interacting badly with my release_revisions() series > in "seen", which currently causes the "linux-leaks" job to fail there: > https://lore.kernel.org/git/cover-v2-00.27-00000000000-20220323T203149Z-avarab@xxxxxxxxx/ > > This is proper fix for the issue the interaction with my topic > revealed (not caused, we just started testing for this leak there), > i.e. it obsoletes the suggestion of adding an UNLEAK() there. > > builtin/pack-objects.c | 30 +++++++++++++++++++++++++----- > list-objects-filter-options.h | 8 +++++--- > 2 files changed, 30 insertions(+), 8 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index d39f668ad56..735080a4a95 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -3857,6 +3857,23 @@ static int option_parse_unpack_unreachable(const struct option *opt, > return 0; > } > > +struct po_filter_data { > + unsigned have_revs:1; > + struct rev_info revs; > +}; > + > +static int po_filter_cb(const struct option *opt, const char *arg, int unset) > +{ > + struct po_filter_data *data = opt->value; > + struct option wrap; /* don't initialize! */ > + > + repo_init_revisions(the_repository, &data->revs, NULL); > + wrap.value = &data->revs.filter; > + data->have_revs = 1; > + > + return opt_parse_list_objects_filter(&wrap, arg, unset); > +} The coupling here is unfortunate, but unavoidable. The future-proof way to do it would be to modify opt->value and pass the rest of its members as-is, but it's marked const so that's not an option. One way to help make this coupling more obvious would be to move this method into list-filter-options.c so we can have their implementations adjacent and even refer to them. Here is a potential version that looks like that: --- >8 --- diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index f02d8df142..55c7131814 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -281,6 +281,10 @@ void parse_list_objects_filter( die("%s", errbuf.buf); } +/* + * If you change this to use any member in 'opt' other than 'value', + * then be sure to update opt_parse_list_objects_filter_in_revs(). + */ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset) { @@ -293,6 +297,18 @@ int opt_parse_list_objects_filter(const struct option *opt, return 0; } +int opt_parse_list_objects_filter_in_revs(const struct option *opt, + const char *arg, int unset) +{ + struct rev_info_maybe_empty *ri = opt->value; + struct option wrap = { .value = &ri->revs.filter }; + + repo_init_revisions(the_repository, &ri->revs, NULL); + ri->has_revs = 1; + + return opt_parse_list_objects_filter(&wrap, arg, unset); +} + const char *list_objects_filter_spec(struct list_objects_filter_options *filter) { if (!filter->filter_spec.nr) diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index 90e4bc9625..cf8b710a76 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -106,6 +106,8 @@ void parse_list_objects_filter( int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset); +int opt_parse_list_objects_filter_in_revs(const struct option *opt, + const char *arg, int unset); #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \ OPT_CALLBACK(0, "filter", fo, N_("args"), \ diff --git a/revision.h b/revision.h index 5bc59c7bfe..95aa3ee891 100644 --- a/revision.h +++ b/revision.h @@ -329,6 +329,11 @@ struct rev_info { struct tmp_objdir *remerge_objdir; }; +struct rev_info_maybe_empty { + int has_revs; + struct rev_info revs; +}; + int ref_excluded(struct string_list *, const char *path); void clear_ref_exclusion(struct string_list **); void add_ref_exclusion(struct string_list **, const char *exclude); --- >8 --- > + } else if (pfd.have_revs) { > + get_object_list(&pfd.revs, rp.nr, rp.v); > } else { > + struct rev_info revs; > + > + repo_init_revisions(the_repository, &revs, NULL); > get_object_list(&revs, rp.nr, rp.v); > } Here, I think it would be better to have else { if (!pfd.have_revs) { repo_init_revisions(the_repository, &pfd.revs, NULL); pfd.have_revs = 1; } get_object_list(&pfd.revs, rp.nr, rp.v); } and then later you can add if (pfd.have_revs) release_revisions(&pfd.revs); to clear the memory in exactly one place. Thanks, -Stolee