On Thu, May 28, 2009 at 4:11 AM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
> Again, given a bisection, the information gain by "bisecting" at point x
> where 0 < x < 1 is:
>
>	-(x log2 x) - ((1-x) log2 (1-x))
>
> At x = 0.5 this gives the optimal 1 bit, but the curve is rather flat
> near the top.  You don't drop to 1/2 bit of information until
> x = 0.11 or 0.89, and it doesn't drop to 1/4 bit of information until
> x = 0.04 or 0.96.
>
> Thus, the lack of optimality in searching away from a skip point is much
> smaller than the potential cost of having to skip multiple nearby
> points.

I understand that. I didn't mean to imply that there was anything wrong
with your proposal; indeed, it makes sense for git-bisect. What I am
interested in is how to extend bisection to the case of intermittent
bugs, where a test which observes the fault proves that the fault cannot
have been introduced in a subsequent commit, but a test which does not
observe the fault cannot guarantee that it was introduced in a
subsequent commit. The simplest way to deal with this is to try to
reduce it to the deterministic case by repeating the test some number of
times; it turns out that this is rather inefficient.

In bbchop, the search algorithm does not assume that the test is
deterministic. It therefore has to calculate the probabilities in order
to know when it has accumulated enough evidence to accuse a particular
commit. It turns out that it is not much more expensive to also
calculate which commit we can expect to gain the most information from
by testing it next.

How can I incorporate your skipping feature into this model? The problem
is that (just thinking about the linear case for the moment) while there
is a fixed boundary at one end - where we actually saw a fault - on the
other side there are a bunch of fuzzy probabilities, ultimately bounded
by wherever we decided the limit of the search was. So when we get a
skip we could hop halfway toward the limit. That would be reasonable
toward the beginning of the search, but toward the end, when most of the
probability is concentrated in a small number of commits, it would make
no sense.

It would fit a lot better into this algorithm to have some model of the
probability that a commit will cause a skip. It doesn't actually have to
be a very good model: if it is poor it will only make the search
slightly less efficient; it will not affect the reliability of the final
result.

Ealdwulf
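
P.S. As a quick sanity check on the figures you quote: the gain is just
the binary entropy function, and a few lines of Python (illustrative
only, not from any real tool) reproduce them:

from math import log2

def info_gain(x):
    # Expected bits learned from a test whose outcome splits the
    # candidates x / (1-x): the binary entropy
    # -(x log2 x) - ((1-x) log2 (1-x)).
    return -(x * log2(x)) - ((1 - x) * log2(1 - x))

for x in (0.5, 0.11, 0.04):
    print("x = %.2f: %.3f bits" % (x, info_gain(x)))
# x = 0.50: 1.000 bits
# x = 0.11: 0.500 bits
# x = 0.04: 0.242 bits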
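
P.P.S. To make the proposal in my last paragraph concrete, here is a toy
sketch of the kind of thing I have in mind. It is not bbchop's actual
code: the uniform prior, the single miss_prob, and the per-commit
skip_prob are all made-up illustrative assumptions. It scores each
commit in a linear history by the expected drop in entropy of the
posterior over "which commit introduced the fault", discounted by the
modelled chance that the commit cannot be tested at all:

from math import log2

def entropy(dist):
    # Shannon entropy (bits) of a discrete distribution.
    return -sum(p * log2(p) for p in dist if p > 0)

def update(prior, tested, saw_fault, miss_prob):
    # Bayes update of P(fault introduced at commit i); commits are
    # ordered oldest (0) to newest. Seeing the fault at `tested` rules
    # out all later commits; a clean run only down-weights commits at
    # or before `tested` by the chance the test missed the fault.
    post = []
    for i, p in enumerate(prior):
        if saw_fault:
            like = 1.0 if i <= tested else 0.0
        else:
            like = miss_prob if i <= tested else 1.0
        post.append(p * like)
    total = sum(post)
    return [p / total for p in post]

def pick_next(prior, miss_prob, skip_prob):
    # Choose the commit whose test gives the largest expected entropy
    # reduction, discounted by the (modelled) probability that the
    # commit is untestable and the attempt yields no information.
    h0 = entropy(prior)
    best, best_score = None, -1.0
    for i in range(len(prior)):
        p_see = sum(prior[:i + 1]) * (1 - miss_prob)
        gain = 0.0
        for saw, p_out in ((True, p_see), (False, 1.0 - p_see)):
            if p_out > 0:
                gain += p_out * (h0 - entropy(update(prior, i, saw, miss_prob)))
        score = (1 - skip_prob[i]) * gain
        if score > best_score:
            best, best_score = i, score
    return best

# e.g. 16 suspect commits, a flaky test that misses the fault 30% of
# the time, and a guess that commits 5..8 probably won't build:
n = 16
skips = [0.8 if 5 <= i <= 8 else 0.05 for i in range(n)]
print(pick_next([1.0 / n] * n, miss_prob=0.3, skip_prob=skips))

With skip_prob all zero this degenerates to plain "test the most
informative commit"; raising skip_prob[i] simply steers the search away
from commit i, without having to hard-code where to hop to instead.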