On 10/31/2018 2:04 AM, Elijah Newren wrote: > On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> >> As reported earlier [1], the add_missing_tags() method in remote.c has >> quadratic performance. Some of that performance is curbed due to the >> generation-number cutoff in in_merge_bases_many(). However, that fix doesn't >> help users without a commit-graph, and it can still be painful if that >> cutoff is sufficiently low compared to the tags we are using for >> reachability testing. >> >> Add a new method in commit-reach.c called get_reachable_subset() which does >> a many-to-many reachability test. Starting at the 'from' commits, walk until >> the generation is below the smallest generation in the 'to' commits, or all >> 'to' commits have been discovered. This performs only one commit walk for >> the entire add_missing_tags() method, giving linear performance in the worst >> case. >> >> Tests are added in t6600-test-reach.sh to ensure get_reachable_subset() >> works independently of its application in add_missing_tags(). > > On the original repo where the topic was brought up, with commit-graph > NOT turned on and using origin/master, I see: > > $ time git push --dry-run --follow-tags /home/newren/repo-mirror > To /home/newren/repo-mirror > * [new branch] test5 -> test5 > > real 1m20.081s > user 1m19.688s > sys 0m0.292s > > Merging this series in, I now get: > > $ time git push --dry-run --follow-tags /home/newren/repo-mirror > To /home/newren/repo-mirror > * [new branch] test5 -> test5 > > real 0m2.857s > user 0m2.580s > sys 0m0.328s > > which provides a very nice speedup. > > Oddly enough, if I _also_ do the following: > $ git config core.commitgraph true > $ git config gc.writecommitgraph true > $ git gc > > then my timing actually slows down just slightly: > $ time git push --follow-tags --dry-run /home/newren/repo-mirror > To /home/newren/repo-mirror > * [new branch] test5 -> test5 > > real 0m3.027s > user 0m2.696s > sys 0m0.400s So you say that the commit-graph is off in the 2.8s case, but not here in the 3.1s case? I would expect _at minimum_ that the cost of parsing commits would have a speedup in the commit-graph case. There may be something else going on here, since you are timing a `push` event that is doing more than the current walk. > (run-to-run variation seems pretty consistent, < .1s variation, so > this difference is just enough to notice.) I wouldn't be that > surprised if that means there's some really old tags with very small > generation numbers, meaning it's not gaining anything in this special > case from the commit-graph, but it does pay the cost of loading the > commit-graph. While you have this test environment, do you mind applying the diff below and re-running the tests? It will output a count for how many commits are walked by the algorithm. This should help us determine if this is another case where generation numbers are worse than commit-date, or if there is something else going on. Thanks! -->8-- >From 2115e7dcafb2770455b7b4793f90edc2254bad97 Mon Sep 17 00:00:00 2001 From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> Date: Wed, 31 Oct 2018 11:40:50 +0000 Subject: [PATCH] DO-NOT-MERGE: count commits in get_reachable_subset Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> --- commit-reach.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/commit-reach.c b/commit-reach.c index a98532ecc8..b512461cf7 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -700,6 +700,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit **from_last = from + nr_from; uint32_t min_generation = GENERATION_NUMBER_INFINITY; int num_to_find = 0; + int count = 0; struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; @@ -729,6 +730,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, while (num_to_find && (current = prio_queue_get(&queue)) != NULL) { struct commit_list *parents; + count++; + if (current->object.flags & PARENT1) { current->object.flags &= ~PARENT1; current->object.flags |= reachable_flag; @@ -755,6 +758,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, clear_commit_marks_many(nr_to, to, PARENT1); clear_commit_marks_many(nr_from, from, PARENT2); + fprintf(stderr, "count: %d\n", count); + return found_commits; } -- 2.19.1.542.gc4df23f792