On Thu, May 28, 2009 at 4:11 AM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
> Again, given a bisection, the information gain by "bisecting" at point x
> where 0 < x < 1 is:
>
>	-(x log2 x) - ((1-x) log2 (1-x))
>
> At x = 0.5 this gives the optimal 1 bit, but the curve is rather flat
> near the top.  You don't drop to 1/2 bit of information until
> x = 0.11 or 0.89, and it doesn't drop to 1/4 bit of information until
> x = 0.04 or 0.96.
>
> Thus, the lack of optimality in searching away from a skip point is much
> smaller than the potential cost of having to skip multiple nearby
> points.

I understand that. I didn't mean to imply that there was anything wrong
with your proposal; indeed, it makes sense for git-bisect. What I am
interested in is how to extend bisection to the case of intermittent
bugs, where a test which observes the fault proves that the fault cannot
have been introduced in a subsequent commit, but a test which does not
observe the fault cannot guarantee that it was introduced in a
subsequent commit. The simplest way to deal with this is to try to
reduce it to the deterministic case by repeating the test some number of
times; it turns out that this is rather inefficient.

In bbchop, the search algorithm does not assume that the test is
deterministic. It therefore has to calculate the probabilities in order
to know when it has accumulated enough evidence to accuse a particular
commit. It turns out that it is not much more expensive to also
calculate which commit we can expect to gain the most information from
by testing it next.

How can I incorporate your skipping feature into this model? The problem
is that (just thinking about the linear case for the moment) while there
is a fixed boundary at one end - where we actually saw a fault - on the
other side there are a bunch of fuzzy probabilities, ultimately bounded
by wherever we decided the limit of the search was. So when we get a
skip we could hop halfway toward the limit. That would be reasonable
toward the beginning of the search, but toward the end, when most of the
probability is concentrated in a small number of commits, it would make
no sense.

It would fit a lot better into this algorithm to have some model of the
probability that a commit will cause a skip. It doesn't actually have to
be a very good model: if it is poor it will only make the search
slightly less efficient; it will not affect the reliability of the final
result.

Ealdwulf
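
P.S. As a quick sanity check on the figures you quote: the gain is just
the binary entropy function, and a few lines of Python (illustrative
only, not from any real tool) reproduce them:

from math import log2

def info_gain(x):
    # Expected bits learned from a test whose outcome splits the
    # candidates x / (1-x): the binary entropy
    # -(x log2 x) - ((1-x) log2 (1-x)).
    return -(x * log2(x)) - ((1 - x) * log2(1 - x))

for x in (0.5, 0.11, 0.04):
    print("x = %.2f: %.3f bits" % (x, info_gain(x)))
# x = 0.50: 1.000 bits
# x = 0.11: 0.500 bits
# x = 0.04: 0.242 bits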
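
P.P.S. To make the proposal in my last paragraph concrete, here is a toy
sketch of the kind of thing I have in mind. It is not bbchop's actual
code: the uniform prior, the single miss_prob, and the per-commit
skip_prob are all made-up illustrative assumptions. It scores each
commit in a linear history by the expected drop in entropy of the
posterior over "which commit introduced the fault", discounted by the
modelled chance that the commit cannot be tested at all:

from math import log2

def entropy(dist):
    # Shannon entropy (bits) of a discrete distribution.
    return -sum(p * log2(p) for p in dist if p > 0)

def update(prior, tested, saw_fault, miss_prob):
    # Bayes update of P(fault introduced at commit i); commits are
    # ordered oldest (0) to newest. Seeing the fault at `tested` rules
    # out all later commits; a clean run only down-weights commits at
    # or before `tested` by the chance the test missed the fault.
    post = []
    for i, p in enumerate(prior):
        if saw_fault:
            like = 1.0 if i <= tested else 0.0
        else:
            like = miss_prob if i <= tested else 1.0
        post.append(p * like)
    total = sum(post)
    return [p / total for p in post]

def pick_next(prior, miss_prob, skip_prob):
    # Choose the commit whose test gives the largest expected entropy
    # reduction, discounted by the (modelled) probability that the
    # commit is untestable and the attempt yields no information.
    h0 = entropy(prior)
    best, best_score = None, -1.0
    for i in range(len(prior)):
        p_see = sum(prior[:i + 1]) * (1 - miss_prob)
        gain = 0.0
        for saw, p_out in ((True, p_see), (False, 1.0 - p_see)):
            if p_out > 0:
                gain += p_out * (h0 - entropy(update(prior, i, saw, miss_prob)))
        score = (1 - skip_prob[i]) * gain
        if score > best_score:
            best, best_score = i, score
    return best

# e.g. 16 suspect commits, a flaky test that misses the fault 30% of
# the time, and a guess that commits 5..8 probably won't build:
n = 16
skips = [0.8 if 5 <= i <= 8 else 0.05 for i in range(n)]
print(pick_next([1.0 / n] * n, miss_prob=0.3, skip_prob=skips))

With skip_prob all zero this degenerates to plain "test the most
informative commit"; raising skip_prob[i] simply steers the search away
from commit i, without having to hard-code where to hop to instead.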