Am 17.05.22 um 12:15 schrieb Ævar Arnfjörð Bjarmason: > > On Fri, Apr 22 2022, René Scharfe wrote: > >> Am 21.04.22 um 19:55 schrieb René Scharfe: >>> Am 21.04.22 um 04:11 schrieb Elijah Newren: >>> >>>> Reverting 2d53975488 fixes the problem. >>> >>> That's a good band-aid. >> Or perhaps it's all we need. I can't replicate the original reduction >> of peak memory usage for the Chromium repo anymore. In fact, the very >> next commit, 079f970971 (name-rev: sort tip names before applying, >> 2020-02-05), reduced the number of times free(3) is called there from >> 44245 to 5, and 3656f84278 (name-rev: prefer shorter names over >> following merges, 2021-12-04) brought that number down to zero. >> >> I can't reproduce the issue with the hardenedBSD repo, by the way, but >> e.g. with 'git name-rev 58b82150da' in the Linux repo. >> >> --- >8 --- >> Subject: [PATCH] Revert "name-rev: release unused name strings" >> >> This reverts commit 2d53975488df195e1431c3f90bfb5b60018d5bf6. >> >> 3656f84278 (name-rev: prefer shorter names over following merges, >> 2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused >> name strings, 2020-02-04) that a better name for a child is a better >> name for all of its ancestors as well, because it added a penalty for >> generation > 0. This leads to strings being free(3)'d that are still >> needed. >> >> 079f970971 (name-rev: sort tip names before applying, 2020-02-05) >> already reduced the number of free(3) calls for the use case that >> motivated the original patch (name-rev --all in the Chromium repository) >> from ca. 44000 to 5, and 3656f84278 eliminated even those few. So this >> revert won't affect name-rev's performance on that particular repo. >> >> Reported-by: Thomas Hurst <tom@xxxxxx> >> Helped-by: Elijah Newren <newren@xxxxxxxxx> >> Signed-off-by: René Scharfe <l.s.r@xxxxxx> >> --- >> builtin/name-rev.c | 21 +++++---------------- >> 1 file changed, 5 insertions(+), 16 deletions(-) >> >> diff --git a/builtin/name-rev.c b/builtin/name-rev.c >> index c59b5699fe..02ea9d1633 100644 >> --- a/builtin/name-rev.c >> +++ b/builtin/name-rev.c >> @@ -18,7 +18,7 @@ >> #define CUTOFF_DATE_SLOP 86400 >> >> struct rev_name { >> - char *tip_name; >> + const char *tip_name; >> timestamp_t taggerdate; >> int generation; >> int distance; >> @@ -84,7 +84,7 @@ static int commit_is_before_cutoff(struct commit *commit) >> >> static int is_valid_rev_name(const struct rev_name *name) >> { >> - return name && (name->generation || name->tip_name); >> + return name && name->tip_name; >> } >> >> static struct rev_name *get_commit_rev_name(const struct commit *commit) >> @@ -146,20 +146,9 @@ static struct rev_name *create_or_update_name(struct commit *commit, >> { >> struct rev_name *name = commit_rev_name_at(&rev_names, commit); >> >> - if (is_valid_rev_name(name)) { >> - if (!is_better_name(name, taggerdate, generation, distance, from_tag)) >> - return NULL; >> - >> - /* >> - * This string might still be shared with ancestors >> - * (generation > 0). We can release it here regardless, >> - * because the new name that has just won will be better >> - * for them as well, so name_rev() will replace these >> - * stale pointers when it processes the parents. >> - */ >> - if (!name->generation) >> - free(name->tip_name); >> - } >> + if (is_valid_rev_name(name) && >> + !is_better_name(name, taggerdate, generation, distance, from_tag)) >> + return NULL; >> >> name->taggerdate = taggerdate; >> name->generation = generation; > > I haven't dug into whether it's a false positive, but with this change > GCC's -fanalyzer has started complaining about a potential NULL > dereference: > > builtin/name-rev.c:230:50: error: dereference of NULL ‘name’ [CWE-476] [-Werror=analyzer-null-dereference] > 230 | generation = name->generation + 1; > > This "fixes" it, and passes all tests, but presumably a better fix > involves the callers of get_commit_rev_name() (or that function itself) > deciding if they're OK with NULL here earlier?: > > diff --git a/builtin/name-rev.c b/builtin/name-rev.c > index 02ea9d16330..1d3a620ac72 100644 > --- a/builtin/name-rev.c > +++ b/builtin/name-rev.c > @@ -209,6 +209,7 @@ static void name_rev(struct commit *start_commit, > struct rev_name *name = get_commit_rev_name(commit); > struct commit_list *parents; > int parent_number = 1; > + assert(name); > > parents_to_queue_nr = 0; > > It's a false positive AFAICS because we set the name of the first commit to go into the queue with create_or_update_name(), then go through its parents and set their name as well before putting them into the queue to traverse ancestors recursively. I.e. everything we pull from the queue must have a non-NULL name, right? This begs the question why a free(3) call would make a difference to the analyzer -- it doesn't affect NULL pointers after all. Another good question is whether we should remove the validity check and use commit_rev_name_peek() directly instead of get_commit_rev_name() in the loop since we know it must be valid. This might save a few cycles and perhaps calm down the analyzer. René