Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes: > When running 'git branch --contains', the in_merge_bases_many() > method calls paint_down_to_common() to discover if a specific > commit is reachable from a set of branches. Commits with lower > generation number are not needed to correctly answer the > containment query of in_merge_bases_many(). > > Add a new parameter, min_generation, to paint_down_to_common() that > prevents walking commits with generation number strictly less than > min_generation. If 0 is given, then there is no functional change. This is thanks to the fact that generation numbers start at zero (as special case, though, with _ZERO), and we use strict inequality to avoid handling _ZERO etc. in a special way. As you wrote in response in previous version of this series, because paint_down_to_common() is file-local, there is no need to come up with symbolic name for GENERATION_NO_CUTOFF case. All right. > > For in_merge_bases_many(), we can pass commit->generation as the > cutoff, and this saves time during 'git branch --contains' queries > that would otherwise walk "around" the commit we are inspecting. All right, and when using paint_down_to_common() to actually find merge bases, and not only check for containment, we cannot use cutoff. Therefore at least one call site needs to run it without functional change... which we can do. Good. > > For a copy of the Linux repository, where HEAD is checked out at > v4.13~100, we get the following performance improvement for > 'git branch --contains' over the previous commit: > > Before: 0.21s > After: 0.13s > Rel %: -38% Nice. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > commit.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) Let me reorder chunks a bit to make it easier to review. > > diff --git a/commit.c b/commit.c > index 7bb007f56a..e2e16ea1a7 100644 > --- a/commit.c > +++ b/commit.c > @@ -1070,7 +1080,7 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * > if (commit->generation > min_generation) > return ret; > > - bases = paint_down_to_common(commit, nr_reference, reference); > + bases = paint_down_to_common(commit, nr_reference, reference, commit->generation); > if (commit->object.flags & PARENT2) > ret = 1; > clear_commit_marks(commit, all_flags); > @@ -808,11 +808,14 @@ static int queue_has_nonstale(struct prio_queue *queue) > } > > /* all input commits in one and twos[] must have been parsed! */ > -static struct commit_list *paint_down_to_common(struct commit *one, int n, struct commit **twos) > +static struct commit_list *paint_down_to_common(struct commit *one, int n, > + struct commit **twos, > + int min_generation) > { > struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; > struct commit_list *result = NULL; > int i; > + uint32_t last_gen = GENERATION_NUMBER_INFINITY; > > one->object.flags |= PARENT1; > if (!n) { > @@ -831,6 +834,13 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc > struct commit_list *parents; > int flags; > > + if (commit->generation > last_gen) > + BUG("bad generation skip"); > + last_gen = commit->generation; Shouldn't we provide more information about where the problem is to the user, to make it easier to debug the repository / commit-graph data? Good to have this sanity check here. > + > + if (commit->generation < min_generation) > + break; So the reasoning for this, as far as I understand, is the following. Please correct me if I am wrong. The callsite with non-zero min_generation, in_merge_bases_many(), tries to find out if "commit" is an ancestor of one of the "references". At least one of "references" is above "commit", so in_merge_bases_many() uses paint_down_to_common() - but is interested only if "commit" was painted as reachable from one of "references". Thus we can interrupt the walk if we know that none of [considered] commits in the queue can reach "commit"/"one" - as if they were all STALE. The search is done using priority queue (a bit like in Dijkstra algorithm), with newer commits - with larger generation numbers - considered first. Thus if current commit has generation number less than min_generation cutoff, i.e. if it is below "commit", then all remaining commits in the queue are below cutoff. Good. > + > flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); > if (flags == (PARENT1 | PARENT2)) { > if (!(commit->object.flags & RESULT)) { > @@ -879,7 +889,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co > return NULL; > } > > - list = paint_down_to_common(one, n, twos); > + list = paint_down_to_common(one, n, twos, 0); When calculating merge bases there is no such possibility of an early return due to generation number cutoff. All right then. > > while (list) { > struct commit *commit = pop_commit(&list); > @@ -946,7 +956,7 @@ static int remove_redundant(struct commit **array, int cnt) > filled_index[filled] = j; > work[filled++] = array[j]; > } > - common = paint_down_to_common(array[i], filled, work); > + common = paint_down_to_common(array[i], filled, work, 0); Here we are interested not only if "one"/array[i] is reachable from "twos"/work, but also if "twos" is reachable from "one". Simple cutoff only works in one way, though I wonder if we couldn't use cutoff being minimum generation number of "one" and "twos" together. But that may be left for a separate commit (after checking that the above is correct). Not as simple and obvious as paint_down_to_common() used in in_merge_bases_any(), so it is all right. > if (array[i]->object.flags & PARENT2) > redundant[i] = 1; > for (j = 0; j < filled; j++)