Re: [PATCH] bisect: loosen halfway() check for a large number of commits

Christian Couder <christian.couder@xxxxxxxxx> · Sat, 24 Oct 2020 09:41:27 +0200

On Thu, Oct 22, 2020 at 8:20 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> SZEDER Gábor <szeder.dev@xxxxxxxxx> writes:
>
> > However, when we have thousands of commits it's not all that important
> > to find the _exact_ halfway point, a few commits more or less doesn't
> > make any real difference for the bisection.
>
> Cute idea.

I like the idea too.

> > So I ran some tests to see how often that happens: picked random good
> > and bad starting revisions at least 50k commits apart and a random
> > first bad commit in between in git.git, and used 'git bisect run git
> > merge-base --is-ancestor HEAD $first_bad_commit' to check the number
> > of necessary bisection steps.  After repeating all this 1000 times
> > both with and without this patch I found that:
> >
> >   - 146 cases needed one more bisection step than before, 149 cases
> >     needed one less step, while in the remaining 705 cases the number
> >     of steps didn't change.  So the number of bisection steps does
> >     indeed change in a non-negligible number of cases, but it seems
> >     that the average number of steps doesn't change in the long run.
>
> It somehow is a bit surprising that there are cases that need fewer
> steps, but I guess that is how rounding-error cuts both ways?

When the span between the initial good and bad commits is 50k commits,
I don't expect to see any statistically significant result from only
1k tries. My guess is that you might start seeing something
significant only when the number of tries is a multiple of the span
between the initial good and bad commits.

There is some cost on average, even if it's small (and gets smaller
as the span increases), to not using the best halfway commit. So the
overall gain depends on how long it takes (and possibly how much it
costs) to run the test script (or maybe to test manually).
Unfortunately, without any hint from the user or without recording how
long the test script takes (which doesn't cover manual testing), we
cannot know this cost of testing, which could vary a lot between use
cases.
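
To get a rough feel for that average cost, here is a minimal sketch
under simplifying assumptions: a linear history, every candidate
equally likely to be the first bad commit, and a hypothetical `slack`
parameter giving the fraction of the remaining candidates by which
each pick misses the exact midpoint. It does not model what
halfway() actually does on a real commit graph; it only computes the
expected number of test runs for such a skewed split.

```python
from functools import lru_cache


def expected_steps(n, slack=0.0):
    """Average number of tests needed to find the first bad commit
    among n equally likely candidates on a linear history.

    `slack` is a hypothetical knob (not a git option): each test
    splits the m remaining candidates int(slack * m) commits away
    from the exact midpoint. slack=0.0 is the exact halfway pick.
    """

    @lru_cache(maxsize=None)
    def e(m):
        if m <= 1:
            return 0.0  # only one candidate left, nothing to test
        # Size of the smaller side after the (possibly skewed) split.
        k = max(1, m // 2 - int(slack * m))
        # One test now, then recurse into whichever side holds the
        # first bad commit, weighted by how likely that side is.
        return 1.0 + (k * e(k) + (m - k) * e(m - k)) / m

    return e(n)


if __name__ == "__main__":
    for span in (10_000, 50_000):
        exact = expected_steps(span)
        loose = expected_steps(span, slack=0.01)
        print(span, exact, loose, loose - exact)
```

Multiplying the difference in steps by the time one test run takes
gives the average price of the looser pick in this toy model, which
can then be weighed against whatever time is saved by finding the
commit to test faster.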

> Mildly (only because such a bisection session over a long span is
> rarer) excited to see this RFC completed ;-)

In projects like the Linux kernel, where there are around 10k commits
between two feature releases, such bisections over a long span might
actually happen quite often.