Jeff King <peff@xxxxxxxx> writes: > Both of those are trading a bit of accuracy in finding the exact > midpoint in the early steps. It's perhaps another possible option for > git-bisect itself: if we see a very large number of commits, we could > try to approximate rather than finding the exact answer. Another thing the user may (but "bisect" itself cannot) try is to use a path-limited bisection (that is, if you know your breakage is inside one subsytem, you only check commits that touch the area). > In most > histories I'd expect that taking the midpoint of a linearized topo-order > would get you a pretty reasonable outcome. E.g.: > > total=$(git rev-list --count v3.0..v6.13-rc7) > git rev-list --topo-order v3.0..v6.13-rc7 | > tail -n +$((total / 2)) | head -n 1 > > runs in about 2s on my machine. The commit it finds, ed194d136769, > is pretty close to the middle: > > $ git rev-list --count v3.0..ed194d136769 > 526863 > $ git rev-list --count ed194d136769..v6.13-rc7 > 543312 Interesting thought. When I did the "single strand of pearls" optimization, I recall I punted and said "we need to count the weight for all merges the honest way". One thing we may want to try is *not* to do the count_distance() for all merges. For example, if we have 1000 commits in the range, first you pick a merge M among them and count how many commits in the range it can reach. Let's say it reachs 400 commits. We are trying to find a commit that can reach as close to 500 commits, and we know any ancestor of M would reach fewer than 400 commits, so we know the score they will get would be worse than M without running count_distance() on them. We should be able to exploit this to optimize, shouldn't we? In order to count the number for M, count_distance() must have traversed all the ancestor commits of M before coming up with its answer, so by the time we see M's score (1000/2 - 400) and realize that it is the best one we have seen so far that we aim to improve, we know that the score for all the commits we have seen during that traversal cannot be better than M's, no? If M can reach 700 commits (instead of 400), the argument goes the other way---anything that can reach M can reach even more, so they cannot be any closer to the middle than M. Knowing what can reach M, however, needs something like reachability bitmap, though.