Stephan Beyer <s-beyer@xxxxxxx> writes:

> The idea is to reverse the DAG and perform a modified breadth-first
> search on it, starting at all sources of the reversed DAG.
> Before each visit of a commit, its weight is induced.
> This only works for non-merge commits, so the BFS stops prematurely
> on merge commits (which are collected in a list).
> Merge commits from that collection are considered for further visits
> as soon as all their parents have been visited.
> Their weights are computed using compute_weight().
> Each BFS walk ends when the computed weight is falling or halfway.
>
> Signed-off-by: Stephan Beyer <s-beyer@xxxxxxx>
> ---
>
> In other words: walk the history up, compute the weight of each
> commit and stop when you know a better commit cannot follow.
>
> This way, the expensive "compute_weight()" function is not called
> on merge commits that will not help to find the maximum distance.

Interesting.  So you walk from the bottom commits, incrementing the
weight while walking on a part of the graph that is a single strand
of pearls, or doing the "count the reachable ones the hard way"
computation when you hit a merge [*1*], but either way, once you know
the commit you are looking at is above the midway point, i.e. reaches
more than half of the graph, you stop walking, because its children
will never be better than that commit?  (A toy sketch of such a walk
appears after the footnotes.)

This is a doubly clever idea [*2*].  It cuts the search space in half
for one thing, and more importantly, because the "count the reachable
ones the hard way" counting has to traverse the graph from a merge
down to the bottom boundary, the computation for a merge commit in
the more recent half of the history is more expensive than the
computation for a merge commit in the older half, and what you are
avoiding is wasting the cycles on exactly that more expensive half.
I like it.

I haven't studied the code carefully, but your mention of BFS makes
me wonder if you can "narrow" down the search space even more.  In an
earlier step of the series, you introduced a work queue to replace
the recursion.  The recursion forces the algorithm to work along the
topology of the history, but with a work queue you have more latitude
in selecting the commit to dig further.  I wonder if you can sort the
elements on the work queue based on their distance (i.e. by using a
priority queue).  You know the total upfront, so you may find a
commit whose weight is exactly half of it before traversing all of
the bottom half of the history, and it may turn out to be even more
efficient (a second sketch after the first one illustrates this
variant).  After all, you are looking for just one such commit, not
trying to find all the commits with the best weight.

[Footnote]

*1* A merge between a commit that reaches N commits and another that
reaches M commits may have a weight smaller than the sum of N and M,
because any commits the two parents reach in common are counted only
once.  That is why I left the original "truly stupid algorithm" Linus
wrote as-is when I did the obvious optimization for the linear parts
of the history.

*2* Whenever I revisited the bisection in my head, I thought about
ways to improve that "truly stupid" counting, but never about an
approach that simply reduces the number of times the "truly stupid"
counting has to be done.
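
To make the above concrete, here is a toy sketch of the walk as
described: plain C with a hardcoded seven-commit history, where
compute_weight() is a bare DFS counter standing in for the real one,
and all names and numbers are made up for illustration.  It is a
sketch of the idea, not of Stephan's actual code:

	#include <stdio.h>

	#define NR 7
	#define MAX_PARENTS 2

	/*
	 * A hardcoded toy DAG; parents[i] lists the parents of
	 * commit i, -1 meaning "none".  Commit 4 is a merge:
	 *
	 *             2
	 *            / \
	 *  0 -- 1 --+   4 -- 5 -- 6
	 *            \ /
	 *             3
	 */
	static int parents[NR][MAX_PARENTS] = {
		{ -1, -1 }, { 0, -1 }, { 1, -1 }, { 1, -1 },
		{ 2, 3 }, { 4, -1 }, { 5, -1 },
	};
	static int weight[NR];

	/* min(weight, total - weight): what bisection maximizes */
	static int distance(int w, int total)
	{
		return w < total - w ? w : total - w;
	}

	/* "count the reachable ones the hard way": a plain DFS */
	static void mark(int c, int *seen)
	{
		int i;

		if (c < 0 || seen[c])
			return;
		seen[c] = 1;
		for (i = 0; i < MAX_PARENTS; i++)
			mark(parents[c][i], seen);
	}

	static int compute_weight(int c)
	{
		int seen[NR] = { 0 };
		int i, n = 0;

		mark(c, seen);
		for (i = 0; i < NR; i++)
			n += seen[i];
		return n;
	}

	int main(void)
	{
		int children[NR][NR], nr_children[NR] = { 0 };
		int nr_parents[NR] = { 0 }, missing[NR];
		int queue[NR], q_head = 0, q_tail = 0;
		int i, j, best = -1, total = NR;

		/* reverse the DAG */
		for (i = 0; i < NR; i++)
			for (j = 0; j < MAX_PARENTS; j++) {
				int p = parents[i][j];

				if (p < 0)
					continue;
				children[p][nr_children[p]++] = i;
				nr_parents[i]++;
			}

		/* sources of the reversed DAG are the bottom commits */
		for (i = 0; i < NR; i++) {
			missing[i] = nr_parents[i];
			if (!missing[i]) {
				weight[i] = 1;
				queue[q_tail++] = i;
			}
		}

		while (q_head < q_tail) {
			int c = queue[q_head++];

			if (best < 0 || distance(weight[c], total) >
					distance(weight[best], total))
				best = c;
			/*
			 * Halfway: c already reaches at least half of
			 * the graph, so the commits above c can only
			 * be worse; end this walk and let the other
			 * queued walks go on.
			 */
			if (2 * weight[c] >= total)
				continue;
			for (i = 0; i < nr_children[c]; i++) {
				int k = children[c][i];

				if (--missing[k])
					continue; /* merge still waiting */
				weight[k] = nr_parents[k] > 1
					? compute_weight(k) /* the hard way */
					: weight[c] + 1; /* strand of pearls */
				queue[q_tail++] = k;
			}
		}
		printf("best bisection point: commit %d (weight %d of %d)\n",
		       best, weight[best], total);
		return 0;
	}

On this history the walk visits 0, 1, 2 and 3 with nothing but an
increment, calls compute_weight() once for the merge (which reaches 5
commits, fewer than the 3 + 3 its parents reach, because the shared
ancestry is counted once, which is footnote *1* in action), and never
looks at the top two commits at all; it reports commit 2, whose
distance min(3, 4) = 3 is the best this graph offers.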
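
And here is a similarly hypothetical sketch of the priority-queue
variant: a hand-rolled max-heap ordered by distance, fed with made-up
(commit, weight) pairs standing for values the walk would have
computed by the time they are pushed.  The early exit relies only on
the fact that total / 2 is the best distance any commit can possibly
have:

	#include <stdio.h>

	#define TOTAL 11	/* made-up total number of commits */

	struct entry {
		int commit;
		int weight;
	};

	static struct entry heap[64];
	static int heap_nr;

	static int distance(int weight)
	{
		return weight < TOTAL - weight ? weight : TOTAL - weight;
	}

	static void push(int commit, int weight)
	{
		int i = heap_nr++;

		heap[i].commit = commit;
		heap[i].weight = weight;
		/* sift up: keep the largest distance at the root */
		while (i && distance(heap[(i - 1) / 2].weight) <
			    distance(heap[i].weight)) {
			struct entry t = heap[i];

			heap[i] = heap[(i - 1) / 2];
			heap[(i - 1) / 2] = t;
			i = (i - 1) / 2;
		}
	}

	static struct entry pop(void)
	{
		struct entry top = heap[0];
		int i = 0;

		heap[0] = heap[--heap_nr];
		/* sift down to restore the heap property */
		for (;;) {
			int l = 2 * i + 1, r = l + 1, big = i;
			struct entry t;

			if (l < heap_nr && distance(heap[l].weight) >
					   distance(heap[big].weight))
				big = l;
			if (r < heap_nr && distance(heap[r].weight) >
					   distance(heap[big].weight))
				big = r;
			if (big == i)
				break;
			t = heap[i];
			heap[i] = heap[big];
			heap[big] = t;
			i = big;
		}
		return top;
	}

	int main(void)
	{
		/* weights the walk might have computed so far */
		push(2, 3);
		push(9, 8);
		push(4, 5);
		push(7, 6);

		while (heap_nr) {
			struct entry e = pop();

			/*
			 * TOTAL / 2 is the best distance any commit
			 * can ever have, so an exact hit ends the
			 * search on the spot ...
			 */
			if (distance(e.weight) == TOTAL / 2) {
				printf("found commit %d (weight %d of %d)\n",
				       e.commit, e.weight, TOTAL);
				return 0;
			}
			/*
			 * ... otherwise a real walk would expand the
			 * children of e.commit here and push them.
			 */
		}
		return 0;
	}

Whether the heap bookkeeping actually beats the plain FIFO would of
course need measuring on real histories; the sketch only shows that
an exact-half hit lets the search end before the rest of the bottom
half is traversed.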