On Wed, Oct 31, 2018 at 5:05 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 10/31/2018 2:04 AM, Elijah Newren wrote: > > On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget > > <gitgitgadget@xxxxxxxxx> wrote: > >> > >> As reported earlier [1], the add_missing_tags() method in remote.c has > >> quadratic performance. Some of that performance is curbed due to the > >> generation-number cutoff in in_merge_bases_many(). However, that fix doesn't > >> help users without a commit-graph, and it can still be painful if that > >> cutoff is sufficiently low compared to the tags we are using for > >> reachability testing. > >> > >> Add a new method in commit-reach.c called get_reachable_subset() which does > >> a many-to-many reachability test. Starting at the 'from' commits, walk until > >> the generation is below the smallest generation in the 'to' commits, or all > >> 'to' commits have been discovered. This performs only one commit walk for > >> the entire add_missing_tags() method, giving linear performance in the worst > >> case. > >> > >> Tests are added in t6600-test-reach.sh to ensure get_reachable_subset() > >> works independently of its application in add_missing_tags(). > > > > On the original repo where the topic was brought up, with commit-graph > > NOT turned on and using origin/master, I see: > > > > $ time git push --dry-run --follow-tags /home/newren/repo-mirror > > To /home/newren/repo-mirror > > * [new branch] test5 -> test5 > > > > real 1m20.081s > > user 1m19.688s > > sys 0m0.292s > > > > Merging this series in, I now get: > > > > $ time git push --dry-run --follow-tags /home/newren/repo-mirror > > To /home/newren/repo-mirror > > * [new branch] test5 -> test5 > > > > real 0m2.857s > > user 0m2.580s > > sys 0m0.328s > > > > which provides a very nice speedup. > > > > Oddly enough, if I _also_ do the following: > > $ git config core.commitgraph true > > $ git config gc.writecommitgraph true > > $ git gc > > > > then my timing actually slows down just slightly: > > $ time git push --follow-tags --dry-run /home/newren/repo-mirror > > To /home/newren/repo-mirror > > * [new branch] test5 -> test5 > > > > real 0m3.027s > > user 0m2.696s > > sys 0m0.400s > > So you say that the commit-graph is off in the 2.8s case, but not here > in the 3.1s case? I would expect _at minimum_ that the cost of parsing > commits would have a speedup in the commit-graph case. There may be > something else going on here, since you are timing a `push` event that > is doing more than the current walk. > > > (run-to-run variation seems pretty consistent, < .1s variation, so > > this difference is just enough to notice.) I wouldn't be that > > surprised if that means there's some really old tags with very small > > generation numbers, meaning it's not gaining anything in this special > > case from the commit-graph, but it does pay the cost of loading the > > commit-graph. > > While you have this test environment, do you mind applying the diff > below and re-running the tests? It will output a count for how many > commits are walked by the algorithm. This should help us determine if > this is another case where generation numbers are worse than commit-date, > or if there is something else going on. Thanks! I can do that, but wouldn't you want a similar patch for the old get_merge_bases_many() in order to compare? Does an absolute number help by itself? It's going to have to be tomorrow, though; not enough time tonight.