On Tue, Dec 13 2022, René Scharfe wrote: > Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason: >> >> On Tue, Dec 13 2022, René Scharfe wrote: >> >>> apply_parse_options() passes the array of argument strings to >>> parse_options(), which removes recognized options. The removed strings >>> are not freed, though. >>> >>> Make a copy of the strvec to pass to the function to retain the pointers >>> of its strings, so we release them all at the end. >>> >>> Signed-off-by: René Scharfe <l.s.r@xxxxxx> >>> --- >>> builtin/am.c | 12 +++++++++++- >>> 1 file changed, 11 insertions(+), 1 deletion(-) >>> >>> diff --git a/builtin/am.c b/builtin/am.c >>> index 30c9b3a9cd..dddf1b9af0 100644 >>> --- a/builtin/am.c >>> +++ b/builtin/am.c >>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >>> int res, opts_left; >>> int force_apply = 0; >>> int options = 0; >>> + const char **apply_argv; >>> >>> if (init_apply_state(&apply_state, the_repository, NULL)) >>> BUG("init_apply_state() failed"); >>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file) >>> strvec_push(&apply_opts, "apply"); >>> strvec_pushv(&apply_opts, state->git_apply_opts.v); >>> >>> - opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, >>> + /* >>> + * Build a copy that apply_parse_options() can rearrange. >>> + * apply_opts.v keeps referencing the allocated strings for >>> + * strvec_clear() to release. >>> + */ >>> + ALLOC_ARRAY(apply_argv, apply_opts.nr); >>> + COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr); >>> + >>> + opts_left = apply_parse_options(apply_opts.nr, apply_argv, >>> &apply_state, &force_apply, &options, >>> NULL); >>> >>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >>> strvec_clear(&apply_paths); >>> strvec_clear(&apply_opts); >>> clear_apply_state(&apply_state); >>> + free(apply_argv); >>> >>> if (res) >>> return res; >> >> I don't mind this going in, but it really feels like a bit of a dirty >> hack. >> >> We have widespread leaks all over the place due to this >> idiom. I.e. parse_options() and a couple of other APIs expect that they >> can munge the "argv", which is fine if it arrives via main(), but not if >> we're editing our own strvecs. > > Where? A quick "git grep 'parse_options.*nr'" turns up only this place > as one that passes a strvec to parse_options. Sorry about the vagueness, this was from memory and I didn't have a full list handy. First, that grep isn't going to give you what you want, since it only catches cases where we're passing the "strvec" to the "parse_options()" directly. But if you look at e.g. how cmd_stash() invokes push_stash() or cmd_describe() invoking cmd_name_rev() you'll see callers where we're potentially losing the strvec due to parse_options(). "Potentially" because in both those cases we call that API, but in the latter case we've got a missing strvec_clear(), so maybe that's all that's needed (it doesn't always munge the argv). >> I think less of a hack is to teach the eventual parse_options() that >> when it munges it it should free() it. I did that for the revisions API >> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that >> need free()-ing, 2022-08-02). >> >> What do you think? > > Generating string lists and then parsing them is weird. When calls have > to cross a process boundary then we have no choice, but in-process we > shouldn't have to lower our request to an intermediate text format. git > am does it anyway because it writes its options to a file and reads them > back when it resumes with --continue, IIUC. I agree, but making all those use a nicer API would be a much bigger change. > I hope that is and will be the only place that uses parse_options() with > a strvec -- and then we don't have to change that function. > > If this pattern is used more widely then we could package the copying > done by this patch somehow, e.g. by adding a strvec_parse_options() > that wraps the real thing. The reason I'm encouraging you to look at some of the other strvec leaks is to see if you can think of a pattern that fits these various leaky callers. So e.g. a strvec_parse_options() won't work for the ones noted above where we've lost track of it being a "struct strvec" in the first place. I have a local version somewhere where I did (pseudocode): struct strvec vec = STRVEC_INIT; struct string_list to_free = STRING_LIST_INIT_DUP; // populate vec... for (size_t i = 0; i < vec.nr; i++) string_list_append_nodup(&to_free, v[i]): // call parse_options(), or otherwise munge "vec"... free(strvec_detach(&vec)); string_list_clear(&to_free, 0); I.e. you snapshot the members of the "vec" before it's munged, and free() those via a "struct string_list". Then you don't strvec_clear(), but instead free() the container, its members being free'd by the string_list. Here's a (not yet in-tree) patch that uses that: https://lore.kernel.org/git/patch-10.10-81f138e460c-20221017T115544Z-avarab@xxxxxxxxx/ I think that's more reliable & general than the pattern you're adding here. In your version the only reason we're not segfaulting is because the parse_options() consumes all the arguments, but that's not something you can generally rely on with parse_options(). Also, and maybe I'm missing something, but wouldn't this be functionally the same as your implementation: diff --git a/builtin/am.c b/builtin/am.c index 30c9b3a9cd7..e65262cbb21 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1476,11 +1476,13 @@ static int run_apply(const struct am_state *state, const char *index_file) int res, opts_left; int force_apply = 0; int options = 0; + char *to_free; if (init_apply_state(&apply_state, the_repository, NULL)) BUG("init_apply_state() failed"); strvec_push(&apply_opts, "apply"); + to_free = (char *)apply_opts.v[0]; strvec_pushv(&apply_opts, state->git_apply_opts.v); opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, @@ -1489,6 +1491,7 @@ static int run_apply(const struct am_state *state, const char *index_file) if (opts_left != 0) die("unknown option passed through to git apply"); + free(to_free); if (index_file) { apply_state.index_file = index_file; I.e. we leak the strvec_push() of the "apply" literal, but for the rest we append the "state->git_apply_opts.v", which we already free() elsewhere. So actually, aren't we really just fumbling our way towards the "nodup" interface that the "struct string_list" has, and we should add the same to the "struct strvec". This seems to also fix all the leaks you fixed, but without any of the copying: diff --git a/builtin/am.c b/builtin/am.c index 30c9b3a9cd7..691ec1d152d 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail) static int run_apply(const struct am_state *state, const char *index_file) { struct strvec apply_paths = STRVEC_INIT; - struct strvec apply_opts = STRVEC_INIT; + struct strvec apply_opts = STRVEC_INIT_NODUP; struct apply_state apply_state; int res, opts_left; int force_apply = 0; diff --git a/strvec.c b/strvec.c index 61a76ce6cb9..efdc47a5854 100644 --- a/strvec.c +++ b/strvec.c @@ -22,7 +22,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value) const char *strvec_push(struct strvec *array, const char *value) { - strvec_push_nodup(array, xstrdup(value)); + const char *to_push = array->nodup_strings ? value : xstrdup(value); + + strvec_push_nodup(array, to_push); return array->v[array->nr - 1]; } @@ -31,6 +33,9 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...) va_list ap; struct strbuf v = STRBUF_INIT; + if (array->nodup_strings) + BUG("cannot strvec_pushf() on a 'nodup' strvec"); + va_start(ap, fmt); strbuf_vaddf(&v, fmt, ap); va_end(ap); @@ -67,6 +72,9 @@ void strvec_pop(struct strvec *array) void strvec_split(struct strvec *array, const char *to_split) { + if (array->nodup_strings) + BUG("cannot strvec_split() on a 'nodup' strvec"); + while (isspace(*to_split)) to_split++; for (;;) { @@ -90,7 +98,7 @@ void strvec_clear(struct strvec *array) if (array->v != empty_strvec) { int i; for (i = 0; i < array->nr; i++) - free((char *)array->v[i]); + free(array->nodup_strings ? NULL : (char *)array->v[i]); free(array->v); } strvec_init(array); diff --git a/strvec.h b/strvec.h index 9f55c8766ba..ac6e2c04cea 100644 --- a/strvec.h +++ b/strvec.h @@ -31,12 +31,18 @@ struct strvec { const char **v; size_t nr; size_t alloc; + unsigned int nodup_strings:1; }; #define STRVEC_INIT { \ .v = empty_strvec, \ } +#define STRVEC_INIT_NODUP { \ + .v = empty_strvec, \ + .nodup_strings = 1, \ +} + /** * Initialize an array. This is no different than assigning from * `STRVEC_INIT`. In any case, for this change (and leak fixes in general), could you please also mark the now-passing leak-free tests. This fails after your patch, but passes before. The failure is a "good" one, as they're now leak-free, but should be marked as such: GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true \ make SANITIZE=leak test T="t4256-am-format-flowed.sh t0023-crlf-am.sh t4254-am-corrupt.sh t4257-am-interactive.sh t5403-post-checkout-hook.sh t4152-am-subjects.sh" > If we have to change parse_options() at all then I'd prefer it to not > free() anything (to keep it usable with main()'s parameters),... I'm suggesting passing a flag to it to have it free() things it NULL's. Maybe that's a bad idea, but I don't see the incompatibility with main(), for those we just wouldn't pass the flag. It does have the inherent limitation that you couldn't mix strvec and non-strvec parameters, i.e. if some of your elements are dup'd but others aren't you'd need to arrange to have them all dup'd. But I don't think that's an issue in practice, either we pass the fully dup'd version, or main() parameters. > but to > reorder in a non-destructive way. That would mean keeping the NULL > sentinel where it is, and making sure all callers use only the returned > argc to determine which arguments parse_options() didn't recognize. I think this would work if parse_options() wasn't clobbering those elements in some cases, and replacing them with new ones, but doesn't it do that e.g. in parse_options_step()? It also seems like a much more fragile change to try to ensure that nothing uses the "argv" again, but only looks at the updated "argc". Both that & adding a "const" to it would be a huge change across the codebase, so I'd like to avoid them, but at least the "const" method would be helped along by the compiler.