On Sun, Apr 03 2022, Phillip Wood wrote:

> Hi Ævar
>
> On 02/04/2022 11:49, Ævar Arnfjörð Bjarmason wrote:
>> Extend the release_revisions() function so that it frees the
>> "mailmap" in the "struct rev_info".
>>
>> The log family of functions now calls the clear_mailmap() function
>> added in fa8afd18e5a (revisions API: provide and use a
>> release_revisions(), 2021-09-19), allowing us to whitelist some tests
>> with "TEST_PASSES_SANITIZE_LEAK=true".
>>
>> Unfortunately having a pointer to a mailmap in "struct rev_info"
>> instead of an embedded member that we "own" gets a bit messy, as can be
>> seen in the change to builtin/commit.c.
>>
>> When we free() this data we won't be able to tell apart a pointer to
>> a "mailmap" on the heap from one on the stack. As seen in
>> ea57bc0d41b (log: add --use-mailmap option, 2013-01-05) the "log"
>> family allocates it on the heap, but in the find_author_by_nickname()
>> code added in ea16794e430 (commit: search author pattern against
>> mailmap, 2013-08-23) we allocated it on the stack instead.
>>
>> Ideally we'd simply change that member to a "struct string_list
>> mailmap" and never free() the "mailmap" itself, but that would be a
>> much larger change to the revisions API.
>
> I agree it makes sense to leave that for now
>
>> We have code that needs to hand an existing "mailmap" to a "struct
>> rev_info", while we could change all of that, let's not go there
>> now.
>>
>> The complexity isn't in the ownership of the "mailmap" per-se, but
>> that various things assume a "rev_info.mailmap == NULL" means "doesn't
>> want mailmap", if we changed that to an init'd "struct string_list"
>> we'd need to carefully refactor things to change those assumptions.
>>
>> Let's instead always free() it, and simply declare that if you add
>> such a "mailmap" it must be allocated on the heap. Any modern libc
>> will correctly panic if we free() a stack variable, so this should be
>> safe going forward.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> ---
>>  builtin/commit.c                   | 5 ++---
>>  revision.c                         | 9 +++++++++
>>  t/t0056-git-C.sh                   | 1 +
>>  t/t3302-notes-index-expensive.sh   | 1 +
>>  t/t4055-diff-context.sh            | 1 +
>>  t/t4066-diff-emit-delay.sh         | 1 +
>>  t/t7008-filter-branch-null-sha1.sh | 1 +
>>  7 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/builtin/commit.c b/builtin/commit.c
>> index c7eda9bbb72..cd6cebcf8c8 100644
>> --- a/builtin/commit.c
>> +++ b/builtin/commit.c
>> @@ -1100,7 +1100,6 @@ static const char *find_author_by_nickname(const char *name)
>>  	struct rev_info revs;
>>  	struct commit *commit;
>>  	struct strbuf buf = STRBUF_INIT;
>> -	struct string_list mailmap = STRING_LIST_INIT_NODUP;
>>  	const char *av[20];
>>  	int ac = 0;
>> @@ -1111,7 +1110,8 @@ static const char *find_author_by_nickname(const char *name)
>>  	av[++ac] = buf.buf;
>>  	av[++ac] = NULL;
>>  	setup_revisions(ac, av, &revs, NULL);
>> -	revs.mailmap = &mailmap;
>> +	revs.mailmap = xmalloc(sizeof(struct string_list));
>> +	string_list_init_nodup(revs.mailmap);
>
> This is a common pattern in one of the previous patches, is it worth
> adding helpers to allocate and initialize a struct string_list? Maybe
> string_list_new_nodup() and string_list_new_dup().

Maybe, but generally in the git codebase things malloc() and then
init(). If we're going to add something like this *_new(), that would
be a change for a lot more APIs than just mailmap. And if it's just for
mailmap I don't see how the inconsistency with other code would be
worth it.

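To make the trade-off concrete, here is a rough sketch of what such a
helper might look like next to the existing malloc-then-init pattern.
The struct is a cut-down stand-in and not git's real "struct
string_list", plain malloc() stands in for xmalloc(), and
string_list_new_nodup() is only the hypothetical name from the
suggestion above; it does not exist in git.

```c
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for git's "struct string_list" (illustrative only). */
struct string_list {
	char **items;
	size_t nr, alloc;
	unsigned int strdup_strings : 1;
};

/* Existing style: initialize storage that the caller already owns. */
static void string_list_init_nodup(struct string_list *list)
{
	memset(list, 0, sizeof(*list));
	list->strdup_strings = 0;
}

/*
 * Hypothetical helper from the review comment: allocate on the heap
 * and initialize in one call. Not part of git's API.
 */
static struct string_list *string_list_new_nodup(void)
{
	struct string_list *list = malloc(sizeof(*list));

	if (!list)
		abort(); /* git itself would use xmalloc(), which dies */
	string_list_init_nodup(list);
	return list;
}
```

With that, the two added lines in find_author_by_nickname() would
collapse into a single "revs.mailmap = string_list_new_nodup();", at
the cost of introducing a *_new() convention that the other init-style
APIs don't have.
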
>>  	read_mailmap(revs.mailmap);
>>  	if (prepare_revision_walk(&revs))
>> @@ -1122,7 +1122,6 @@ static const char *find_author_by_nickname(const char *name)
>>  	ctx.date_mode.type = DATE_NORMAL;
>>  	strbuf_release(&buf);
>>  	format_commit_message(commit, "%aN <%aE>", &buf, &ctx);
>> -	clear_mailmap(&mailmap);
>>  	release_revisions(&revs);
>>  	return strbuf_detach(&buf, NULL);
>>  }
>> diff --git a/revision.c b/revision.c
>> index 553f7de8250..622f0faecc4 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -2926,10 +2926,19 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
>>  	return left;
>>  }
>> +static void release_revisions_mailmap(struct string_list *mailmap)
>> +{
>> +	if (!mailmap)
>> +		return;
>> +	clear_mailmap(mailmap);
>> +	free(mailmap);
>> +}
>
> It's not a big issue but if there are no other users of this then it
> could just go inside release_revisions, my impression is that this
> series builds a collection of very small functions whose only caller
> is release_revisions()

Yes, these are just trivial static helpers, so that each line in
release_revisions() corresponds to one member of the struct, without
loops or extra indentation for "don't free this" cases, etc. For the
generated machine code it makes no difference at higher optimization
levels, since the compiler can inline them.
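
As a rough illustration of that layout (not the actual revision.c
code), the shape is something like the sketch below. The structs are
minimal stand-ins for the real git types, clear_mailmap() is reduced to
a toy version, and the "example_buf" member with its helper is made up
purely to show a second line in release_revisions().

```c
#include <stdlib.h>

/* Minimal stand-ins for the real git types (illustrative only). */
struct string_list {
	char **items;
	size_t nr;
};

struct rev_info {
	struct string_list *mailmap; /* NULL means "doesn't want mailmap" */
	char *example_buf;           /* hypothetical second member */
};

/* Toy version: releases the list's contents, not the list itself. */
static void clear_mailmap(struct string_list *map)
{
	free(map->items);
	map->items = NULL;
	map->nr = 0;
}

/* The NULL check lives in the helper, not in release_revisions(). */
static void release_revisions_mailmap(struct string_list *mailmap)
{
	if (!mailmap)
		return;
	clear_mailmap(mailmap);
	free(mailmap); /* must be heap-allocated, as the patch declares */
}

/* Hypothetical helper for the made-up member above. */
static void release_revisions_example_buf(char *buf)
{
	free(buf);
}

/* One line per struct member, with no conditionals at this level. */
static void release_revisions(struct rev_info *revs)
{
	release_revisions_mailmap(revs->mailmap);
	release_revisions_example_buf(revs->example_buf);
}
```

The "if (!mailmap)" special case stays inside the helper, so
release_revisions() itself reads as a flat list of members.
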