Stephan Beyer <s-beyer@xxxxxxx> writes:

> The idea is to reverse the DAG and perform a modified breadth-first
> search on it, starting at all sources of the reversed DAG.
> Before each visit of a commit, its weight is induced.
> This only works for non-merge commits, so the BFS stops prematurely
> on merge commits (which are collected in a list).
> Merge commits from that collection are considered for further visits
> as soon as all their parents have been visited.
> Their weights are computed using compute_weight().
> Each BFS walk ends when the computed weight is falling or halfway.
>
> Signed-off-by: Stephan Beyer <s-beyer@xxxxxxx>
> ---
>
> In other words: walk the history up, compute the weight of each
> commit and stop when you know a better commit cannot follow.
>
> This way, the expensive "compute_weight()" function is not called
> on merge commits that will not help to find the maximum distance.

Interesting.  So you walk from the bottom commits, incrementing the
weight while walking on a part of the graph that is a single strand
of pearls, or doing the "count the reachable ones the hard way"
computation when you hit a merge [*1*], but either way, once you know
the commit you are looking at is above the midway point, i.e. reaches
more than half of the graph, you stop walking, because its children
will never be better than that commit?  (A toy sketch of such a walk
appears after the footnotes.)

This is a doubly clever idea [*2*].  It cuts the search space in half
for one thing, and more importantly, because the "count the reachable
ones the hard way" counting has to traverse the graph from a merge
down to the bottom boundary, the computation for a merge commit in
the more recent half of the history is more expensive than the
computation for a merge commit in the older half, and what you are
avoiding is wasting the cycles on exactly that more expensive half.
I like it.

I haven't studied the code carefully, but your mention of BFS makes
me wonder if you can "narrow" down the search space even more.  In an
earlier step of the series, you introduced a work queue to replace
the recursion.  The recursion forces the algorithm to work along the
topology of the history, but with a work queue you have more latitude
in selecting the commit to dig further.  I wonder if you can sort the
elements on the work queue based on their distance (i.e. by using a
priority queue).  You know the total upfront, so you may find a
commit whose weight is exactly half of it before traversing all of
the bottom half of the history, and it may turn out to be even more
efficient (a second sketch after the first one illustrates this
variant).  After all, you are looking for just one such commit, not
trying to find all the commits with the best weight.

[Footnote]

*1* A merge between a commit that reaches N commits and another that
reaches M commits may have a weight smaller than the sum of N and M,
because any commits the two parents reach in common are counted only
once.  That is why I left the original "truly stupid algorithm" Linus
wrote as-is when I did the obvious optimization for the linear parts
of the history.

*2* Whenever I revisited the bisection in my head, I thought about
ways to improve that "truly stupid" counting, but never about an
approach that simply reduces the number of times the "truly stupid"
counting has to be done.
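
To make the above concrete, here is a toy sketch of the walk as
described: plain C with a hardcoded seven-commit history, where
compute_weight() is a bare DFS counter standing in for the real one,
and all names and numbers are made up for illustration.  It is a
sketch of the idea, not of Stephan's actual code:

	#include <stdio.h>

	#define NR 7
	#define MAX_PARENTS 2

	/*
	 * A hardcoded toy DAG; parents[i] lists the parents of
	 * commit i, -1 meaning "none".  Commit 4 is a merge:
	 *
	 *             2
	 *            / \
	 *  0 -- 1 --+   4 -- 5 -- 6
	 *            \ /
	 *             3
	 */
	static int parents[NR][MAX_PARENTS] = {
		{ -1, -1 }, { 0, -1 }, { 1, -1 }, { 1, -1 },
		{ 2, 3 }, { 4, -1 }, { 5, -1 },
	};
	static int weight[NR];

	/* min(weight, total - weight): what bisection maximizes */
	static int distance(int w, int total)
	{
		return w < total - w ? w : total - w;
	}

	/* "count the reachable ones the hard way": a plain DFS */
	static void mark(int c, int *seen)
	{
		int i;

		if (c < 0 || seen[c])
			return;
		seen[c] = 1;
		for (i = 0; i < MAX_PARENTS; i++)
			mark(parents[c][i], seen);
	}

	static int compute_weight(int c)
	{
		int seen[NR] = { 0 };
		int i, n = 0;

		mark(c, seen);
		for (i = 0; i < NR; i++)
			n += seen[i];
		return n;
	}

	int main(void)
	{
		int children[NR][NR], nr_children[NR] = { 0 };
		int nr_parents[NR] = { 0 }, missing[NR];
		int queue[NR], q_head = 0, q_tail = 0;
		int i, j, best = -1, total = NR;

		/* reverse the DAG */
		for (i = 0; i < NR; i++)
			for (j = 0; j < MAX_PARENTS; j++) {
				int p = parents[i][j];

				if (p < 0)
					continue;
				children[p][nr_children[p]++] = i;
				nr_parents[i]++;
			}

		/* sources of the reversed DAG are the bottom commits */
		for (i = 0; i < NR; i++) {
			missing[i] = nr_parents[i];
			if (!missing[i]) {
				weight[i] = 1;
				queue[q_tail++] = i;
			}
		}

		while (q_head < q_tail) {
			int c = queue[q_head++];

			if (best < 0 || distance(weight[c], total) >
					distance(weight[best], total))
				best = c;
			/*
			 * Halfway: c already reaches at least half of
			 * the graph, so the commits above c can only
			 * be worse; end this walk and let the other
			 * queued walks go on.
			 */
			if (2 * weight[c] >= total)
				continue;
			for (i = 0; i < nr_children[c]; i++) {
				int k = children[c][i];

				if (--missing[k])
					continue; /* merge still waiting */
				weight[k] = nr_parents[k] > 1
					? compute_weight(k) /* the hard way */
					: weight[c] + 1; /* strand of pearls */
				queue[q_tail++] = k;
			}
		}
		printf("best bisection point: commit %d (weight %d of %d)\n",
		       best, weight[best], total);
		return 0;
	}

On this history the walk visits 0, 1, 2 and 3 with nothing but an
increment, calls compute_weight() once for the merge (which reaches 5
commits, fewer than the 3 + 3 its parents reach, because the shared
ancestry is counted once, which is footnote *1* in action), and never
looks at the top two commits at all; it reports commit 2, whose
distance min(3, 4) = 3 is the best this graph offers.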
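
And here is a similarly hypothetical sketch of the priority-queue
variant: a hand-rolled max-heap ordered by distance, fed with made-up
(commit, weight) pairs standing for values the walk would have
computed by the time they are pushed.  The early exit relies only on
the fact that total / 2 is the best distance any commit can possibly
have:

	#include <stdio.h>

	#define TOTAL 11	/* made-up total number of commits */

	struct entry {
		int commit;
		int weight;
	};

	static struct entry heap[64];
	static int heap_nr;

	static int distance(int weight)
	{
		return weight < TOTAL - weight ? weight : TOTAL - weight;
	}

	static void push(int commit, int weight)
	{
		int i = heap_nr++;

		heap[i].commit = commit;
		heap[i].weight = weight;
		/* sift up: keep the largest distance at the root */
		while (i && distance(heap[(i - 1) / 2].weight) <
			    distance(heap[i].weight)) {
			struct entry t = heap[i];

			heap[i] = heap[(i - 1) / 2];
			heap[(i - 1) / 2] = t;
			i = (i - 1) / 2;
		}
	}

	static struct entry pop(void)
	{
		struct entry top = heap[0];
		int i = 0;

		heap[0] = heap[--heap_nr];
		/* sift down to restore the heap property */
		for (;;) {
			int l = 2 * i + 1, r = l + 1, big = i;
			struct entry t;

			if (l < heap_nr && distance(heap[l].weight) >
					   distance(heap[big].weight))
				big = l;
			if (r < heap_nr && distance(heap[r].weight) >
					   distance(heap[big].weight))
				big = r;
			if (big == i)
				break;
			t = heap[i];
			heap[i] = heap[big];
			heap[big] = t;
			i = big;
		}
		return top;
	}

	int main(void)
	{
		/* weights the walk might have computed so far */
		push(2, 3);
		push(9, 8);
		push(4, 5);
		push(7, 6);

		while (heap_nr) {
			struct entry e = pop();

			/*
			 * TOTAL / 2 is the best distance any commit
			 * can ever have, so an exact hit ends the
			 * search on the spot ...
			 */
			if (distance(e.weight) == TOTAL / 2) {
				printf("found commit %d (weight %d of %d)\n",
				       e.commit, e.weight, TOTAL);
				return 0;
			}
			/*
			 * ... otherwise a real walk would expand the
			 * children of e.commit here and push them.
			 */
		}
		return 0;
	}

Whether the heap bookkeeping actually beats the plain FIFO would of
course need measuring on real histories; the sketch only shows that
an exact-half hit lets the search end before the rest of the bottom
half is traversed.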