On Fri, Feb 21, 2025 at 6:47 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > Yeah, it seems to me that in practice this is a bit like bisecting on
> > the first parents first. It would be nice if we had added an option to
> > bisect on the first parents first, so that we could compare your
> > improvement and that option.
>
> Unless you are talking about something entirely different, I am
> afraid you are confused. We added first-parent bisection in mid
> 2020.

Yeah, I know that. But I don't think there is a mode that performs
first-parent bisection first and then continues bisecting normally
(so not only on the first parents). That's why I called it an option
that does "first parents first" and not just "first parent".

> And the first-parent bisection does make things easy, by making it
> totally unnecessary to call the "truly stupid" count_distance() at
> all. We can pretend as if we have a single-strand-of-pearls, give
> the "good" end of the history "1" as its weight, its direct
> descendant (and there is only one direct descendant when we are
> doing first-parent bisection, since there always is only one active
> "bad" end of the range in our bisection session) "2" as its weight,
> and so on. The commit that gets N/2 weight is the midway and we
> need O(N) computation.

Yeah, that's why a mode that does first-parent bisection and then
continues to bisect normally would likely perform well. When the
first-parent bisection is done, the set of commits left to bisect has
hopefully been reduced enough that the normal bisection that follows
is fast.

> Unfortunately Uwe's original problem description was not about
> first-parent bisection being slow.
>
> >>  * The "this is good enough" logic currently allows us to be within
> >>    0.1% of the real halfway point. Until the candidate set becomes
> >>    small enough, we could loosen the criteria to allow larger, say
> >>    3%, slack. This code is written but not enabled (with "0 &&").
> >
> > If we want to do this, I think we could loosen the criteria even if
> > the candidate set is small. Weights are integers, so when the number
> > of candidates is around 33 or less, a 3% criterion will mean an
> > exact match. Then the last 5 steps or so (as 2^5 = 32) would still
> > be performed in the same way (with an exact match).
>
> The above follows the same reasoning why we chose "division by 1024"
> in the first place. The illustration patch postulates that we could
> be way more aggressive than 0.1% while the set is large by dividing
> by 64, without wanting to loosen the criteria near the end of the
> bisection session when the remaining set is reasonably small like
> 1000 commits. So we cannot rely on integer division truncating.

The code you posted above uses 10000 as the threshold, not 1000:

    (10000 < nr && abs(diff) < nr / 64) || abs(diff) < nr / 1024

10000 is between 2^13 and 2^14, which means that the last 13 or 14
bisection steps likely don't benefit from the "way more aggressive"
criterion of 3% vs 0.1%. I know that the last steps are the fastest,
as there are fewer commits to take into account, but it still seems
to me that we could make all the steps benefit, except the last 5 or
so, where the looser criterion likely doesn't change anything anyway.

You say that we cannot rely on integer division truncating, but when
comparing integers we can be careful enough that there is no
difference between comparing against a truncated number and comparing
against the same number not truncated.
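For instance, here is a minimal standalone sketch of that argument
(not the actual bisect code: the "good_enough_*" helpers, the
hard-coded 64 and the loop bounds are all made up for the
illustration). With "<=", testing an integer against the truncated
quotient never disagrees with testing it against the untruncated one,
and for small sets the loose criterion degenerates into an exact
match anyway:

    #include <stdio.h>
    #include <stdlib.h>

    /* nr: number of candidate commits,
     * diff: distance from the exact halfway weight,
     * criteria: the allowed slack is nr / criteria. */
    static int good_enough_truncated(int diff, int nr, int criteria)
    {
            return abs(diff) <= nr / criteria;         /* integer division */
    }

    static int good_enough_exact(int diff, int nr, int criteria)
    {
            return abs(diff) <= (double)nr / criteria; /* no truncation */
    }

    int main(void)
    {
            int nr, diff;

            /* For an integer a and a real x >= 0, a <= floor(x) holds
             * if and only if a <= x, so the two tests never disagree
             * and nothing is printed by this loop. */
            for (nr = 1; nr <= 100000; nr++)
                    for (diff = 0; diff <= 100; diff++)
                            if (good_enough_truncated(diff, nr, 64) !=
                                good_enough_exact(diff, nr, 64))
                                    printf("mismatch: nr=%d diff=%d\n",
                                           nr, diff);

            /* For nr < 64, nr / 64 is 0, so only diff == 0 passes: the
             * loose 3% criterion is already an exact match for small
             * sets. Prints "nr=33: diff=0 -> 1, diff=1 -> 0". */
            printf("nr=33: diff=0 -> %d, diff=1 -> %d\n",
                   good_enough_truncated(0, 33, 64),
                   good_enough_truncated(1, 33, 64));
            return 0;
    }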
So I think we should be able to rely on integer division. So maybe
just something like:

    int criteria = user_priority_is_bisect_speed ? 64 : 1024;

    if (abs(diff) <= nr / criteria)
            return 1;

(Note the "<=": with it, the truncation in "nr / criteria" makes no
difference, which would not be true with "<".)

Thanks for working on this.