Re: [PATCH] rev-list: estimate number of bisection step left

Christian Couder <chriscool@xxxxxxxxxxxxx> · Thu, 19 Feb 2009 06:16:18 +0100

Le mardi 17 février 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@xxxxxxxxxxxxx> writes:
> > +static int estimate_bisect_steps(int all)
> > +{
> > +	int log2 = 0;
> > +	int left = (all >> 1) - 1;
> > +
> > +	if (left <= 0)
> > +		return 0;
> > +
> > +	do {
> > +		left = left >> 1;
> > +		log2++;
> > +	} while (left);
> > +
> > +	return log2;
> > +}
> > ...
> > diff --git a/git-bisect.sh b/git-bisect.sh
> > index 85db4ba..6b23439 100755
> > --- a/git-bisect.sh
> > +++ b/git-bisect.sh
> > @@ -500,7 +500,7 @@ bisect_next() {
> >  	# commit is also a "skip" commit (see above).
> >  	exit_if_skipped_commits "$bisect_rev"
> >
> > -	bisect_checkout "$bisect_rev" "$bisect_nr revisions left to test
> > after this" +	bisect_checkout "$bisect_rev" "$bisect_nr revisions left
> > to test after this (roughtly $bisect_steps steps)"
>
> "roughly".

Yes, thanks.

> all	left
> 0	0
> 1	0
> 2	0
> 3	0
> 4	1
> 5	1
> 6	2
> 7	2
> 8	2
> 9	2
>
> It seems that at the very low end the estimate is a bit too optimistic.
> How about showing this number from the Porcelain only when $bisect_steps
> is more than 2 (or all is more than 9)?

I think it's more consistent to always show it.

Now for the algorithm, first please note that we are looking for an estimate 
of the number of bisect steps left _after the current one_, and that git 
bisect currently only displays an estimate of the number of revisions left 
to test _after the current one_.

Here is a table to help analyse what should be the best estimate for
the number of bisect steps left.

N : linear case                    --> probabilities --> best | v1
------------------------------------------------------------------
1 : G-B                            --> 0             --> 0    | 0
2 : G-U1-B                         --> 0             --> 0    | 0
3 : G-U1-U2-B                      --> 0(1/3) 1(2/3) --> 1    | 0
4 : G-U1-U2-U3-B                   --> 1             --> 1    | 1
5 : G-U1-U2-U3-U4-B                --> 1(3/5) 2(2/5) --> 1    | 1
6 : G-U1-U2-U3-U4-U5-B             --> 1(2/6) 2(4/6) --> 2    | 2
7 : G-U1-U2-U3-U4-U5-U6-B          --> 1(1/7) 2(6/7) --> 2    | 2
8 : G-U1-U2-U3-U4-U5-U6-U7-B       --> 2             --> 2    | 2
9 : G-U1-U2-U3-U4-U5-U6-U7-U8-B    --> 2(7/9) 3(2/9) --> 2    | 2
10: G-U1-U2-U3-U4-U5-U6-U7-U8-U9-B --> 2(6/10)3(4/10)--> 2    | 3

In the column "N", there is the number of revisions that could _now_
be the first bad commit we are looking for.

The "linear case" column describes the linear history corresponding to
the number in column N. G means good, B means bad, and Ux means
unknown. Note that the first bad revision we are looking for can be
any Ux or B.

In the "probabilities" column, there are the different outcomes in
number of steps with the odds of each outcome in parenthesis
corresponding to the linear case.

The "best" column gives the most accurate estimate among the different
outcomes in the "probabilities" column.

The "v1" column gives the estimates according my first patch.

Now looking at the table, we have the following:

best(2^n) == n - 1

and for any x between 0 included and 2^n excluded, the probability for
n - 1 steps left looks like:

P(2^n + x) == (2^n - x) / (2^n + x) 

If P(2^n + x) < 0.5 we should return n and otherwise n - 1.

But P(2^n + x) < 0.5 means:

2 * (2^n - x) < (2^n + x)

that is: 2^n < 3x

So the improved algorithm could be something like:

static int estimate_bisect_steps(int all)
{
	int n, x, e;
	float p;

	if (all < 3)
		return 0;

	n = log2(all);
	e = exp2(n);
	x = all - e;

	return (e < 3 * x) ? n : n - 1 ;
}

But on Linux, log2 and exp2 are defined in "math.h" and available with:

_XOPEN_SOURCE >= 600 || _ISOC99_SOURCE; or cc -std=c99

and we must link with -lm, but I don't know about the other platforms.

So I don't know what to do about them. Please advise.

Thanks in advance,
Christian.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html