On Thu, Nov 1, 2018 at 5:32 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: > On 11/1/2018 2:52 AM, Elijah Newren wrote: > > On Wed, Oct 31, 2018 at 5:05 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: > >> On 10/31/2018 2:04 AM, Elijah Newren wrote: > >>> > >>> On the original repo where the topic was brought up, with commit-graph > >>> NOT turned on and using origin/master, I see: > >>> > >>> $ time git push --dry-run --follow-tags /home/newren/repo-mirror > >>> To /home/newren/repo-mirror > >>> * [new branch] test5 -> test5 > >>> > >>> real 1m20.081s > >>> user 1m19.688s > >>> sys 0m0.292s > >>> > >>> Merging this series in, I now get: > >>> > >>> $ time git push --dry-run --follow-tags /home/newren/repo-mirror > >>> To /home/newren/repo-mirror > >>> * [new branch] test5 -> test5 > >>> > >>> real 0m2.857s > >>> user 0m2.580s > >>> sys 0m0.328s > >>> > >>> which provides a very nice speedup. > >>> > >>> Oddly enough, if I _also_ do the following: > >>> $ git config core.commitgraph true > >>> $ git config gc.writecommitgraph true > >>> $ git gc > >>> > >>> then my timing actually slows down just slightly: > >>> $ time git push --follow-tags --dry-run /home/newren/repo-mirror > >>> To /home/newren/repo-mirror > >>> * [new branch] test5 -> test5 > >>> > >>> real 0m3.027s > >>> user 0m2.696s > >>> sys 0m0.400s > >> So you say that the commit-graph is off in the 2.8s case, but not here > >> in the 3.1s case? I would expect _at minimum_ that the cost of parsing > >> commits would have a speedup in the commit-graph case. There may be > >> something else going on here, since you are timing a `push` event that > >> is doing more than the current walk. > >> > >>> (run-to-run variation seems pretty consistent, < .1s variation, so > >>> this difference is just enough to notice.) I wouldn't be that > >>> surprised if that means there's some really old tags with very small > >>> generation numbers, meaning it's not gaining anything in this special > >>> case from the commit-graph, but it does pay the cost of loading the > >>> commit-graph. > >> While you have this test environment, do you mind applying the diff > >> below and re-running the tests? It will output a count for how many > >> commits are walked by the algorithm. This should help us determine if > >> this is another case where generation numbers are worse than commit-date, > >> or if there is something else going on. Thanks! > > I can do that, but wouldn't you want a similar patch for the old > > get_merge_bases_many() in order to compare? Does an absolute number > > help by itself? > > It's going to have to be tomorrow, though; not enough time tonight. > > No rush. I'd just like to understand how removing the commit-graph file > can make the new algorithm faster. Putting a similar count in the old > algorithm would involve giving a count for every call to > in_merge_bases_many(), which would be very noisy. $ time git push --dry-run --follow-tags /home/newren/repo-mirror count: 92912 To /home/newren/repo-mirror * [new branch] test5 -> test5 real 0m3.024s user 0m2.752s sys 0m0.320s Also: $ git rev-list --count HEAD 55764 $ git rev-list --count --all 91820 Seems a little odd to me that count is greater than `git rev-list --count --all`. However, the fact that they are close in magnitude isn't surprising since I went digging for the commit with smallest generation number not found in the upstream repo, and found: $ git ls-remote /home/newren/repo-mirror/ | grep refs/tags/v0.2.0; echo $? 1 $ git rev-list --count refs/tags/v0.2.0 4 $ git rev-list --count refs/tags/v0.2.0 ^HEAD 4 So, the commit-graph can only help us avoid parsing 3 or so commits, but we have to parse the 5M .git/objects/info/commit-graph file, and then for every parse_commit() call we need to bsearch_graph() for the commit. My theory is that parsing the commit-graph file and binary searching it for commits is relatively fast, but that the time is just big enough to measure and notice, while avoiding parsing 3 more commits is a negligible time savings. To me, I'm thinking this is one of those bizarre corner cases where the commit-graph is almost imperceptibly slower than without the commit-graph. (And it is a very weird repo; someone repeatedly filter-branched lots of small independent repos into a monorepo, but didn't always push everything and didn't clean out all old stuff.) But if you still see weird stuff you want to dig into further (maybe the 92912 > 91820 bit?), I'm happy to try out other stuff.