Re: first bisection step takes quite a while

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 21 Feb 2025 09:47:54 -0800

Christian Couder <christian.couder@xxxxxxxxx> writes:

> Yeah, it seems to me that in practice this is a bit like bisecting on
> the first parents first. It would be nice if we had added an option to
> bisect on the first parents first, so that we could compare your
> improvement and that option.

Unless you are talking about something entirely different, I am
afraid you are confused.  We added first-parent bisection in mid
2020.

And the first-parent bisection does make things easy, by making it
totally unnecessary to call the "truly stupid" count_distance() at
all.  We can pretend as if we have a single-strand-of-pearls, give
the "good" end of the history "1" as its weight, its direct
descendant (and there is only one direct descendant when we are
doing first-parent bisection, since there always is only one active
"bad" end of the range in our bisection session) "2" as its weight,
and so on.  The commit that gets N/2 weight is the midway and we
need O(N) computation.

Unfortunatly Uwe's original problem description was not about
first-parent bisection being slow.

>>  * The "this is good enough" logic currently allows us to be within
>>    0.1% of the real halfway point.  Until the candidate set becomes
>>    small enough, we could loosen the criteria to allow larger, say
>>    3%, slack.  This code is written but not enabled (with "0 &&").
>
> If we want to do this, I think we could loosen the criteria even if
> the candidate set is small. Weights are integers so when the number of
> candidates is around 33 or less, a 3% criteria will mean an exact
> match. Then the last 5 steps or so (as 2^5 = 32) would still be
> performed in the same way (with an exact match).

The above follows the same reasoning why we chose "division by 1024"
in the first place.  The illustration patch postulates that we could
be way more aggressive than 0.1% while the set is large by dividing
64, without wanting to loosen the criteria near the end of the
bisection session when the remaining set is reasonably small like
1000 commits.  So we cannot rely on integer division truncating.

>>  * After computing the weight for a merge in "honest and stupid"
>>    way, we know what other commits in the set it can reach.  If the
>>    weight we computed is way smaller than the half the number of
>>    commits in the set, that means these commits we can reach from
>>    the merge we are looking at would score even lower.  We could
>>    mark them as not-viable before clearing the list to check next
>>    merge with "honest and stupid" way.  Again, this code is written
>>    but not enabled.
>> ...

> About #2, I think it could be worth implementing as an option if it is
> effective in some cases, but the criteria should be loosened even if
> the candidate set is small. The amount of code to implement it is very
> small and it's possible that, for some users, having to sometimes
> perform one more step of testing is not a big deal, compared to
> bisecting speed.

The reason why I think #2 would not be effective is quite different,
but I'd not go into that, as your conclusion is the same ;-)

> About #3, I think that implementing an option to bisect on the first
> parents first is likely more useful than implementing it.

Sorry, but is very much orthogonal to the issue at hand.  We already
have the first-parent bisection.

The last optimization is all about how to reduce needless calls to
count_distance() by rejecting merge commits that can never be an
ancestor that a single-strand-of-pearls history from a non-merge
commit with the best "score" could reach.  And it is relevant only
if we want to improve performance of non-first-parent bisection.

Thanks.