On Tue, 9 June 2009, H. Peter Anvin wrote: > Jakub Narebski wrote: > > > > By the way, I have asked question about best algorithm for "bisect skip" > > on StackOverflow[1], but didn't get (yet) any good responses... > > > > [1]: http://stackoverflow.com/questions/959324/ > > > > I don't think there is a "best" algorithm, but I concur with the poster > that said broken commits tend to cluster. Well, I guess that there might be, at least if we had some reasonable assumption on probability distribution of bad commits. Note: the idea sketched below is just handwaving currently... Let us assume that we are currently at some untestable commit. Let us also assume that we have some halfway reasonable model of probability that a given commit is untestable, given it distance from known untestable commit. "git rev-list --bisect-all" (or its inner equivalent) would give us list of commits in the searched range, sorted in descending order by distance from edges (endpoints) of range: commit "goodness" -------------------- c21d2e5* (dist=60) 94d6d14 (dist=59) ccb06f4 (dist=59) d1a1610 (dist=58) d4bf4b4 (dist=58) 16c5646 (dist=57) Let us assume that "c21d2e5" is untestable, and that we can easily calculate distance from it, substituting 0/0 (undef) if a commit is not in straight line from "c21d2e5". commit "goodness" d ------------------------- c21d2e5* (dist=60) 0 94d6d14 (dist=59) 1 ccb06f4 (dist=59) -- d1a1610 (dist=58) -- d4bf4b4 (dist=58) 2 16c5646 (dist=57) 3 Let us also assume that we have some model of probability that a commit is untestable. In the example below numbers are ad hoc, and unrealistic. commit "goodness" d P(untestable) ---------------------------------------- c21d2e5* (dist=60) 0 100% 94d6d14 (dist=59) 1 75% ccb06f4 (dist=59) -- 0% d1a1610 (dist=58) -- 0% d4bf4b4 (dist=58) 2 33% 16c5646 (dist=57) 3 25% We can now calculate average number of bits of information would bring (IIRC it was HPA and Christian that was writing about 'average information gain' and 'bits of information at each step'; I don't quote know how it is to be calculated) commit "goodness" d P(untestable) avg. gain ---------------------------------------- c21d2e5* (dist=60) 0 100% 0.0001 94d6d14 (dist=59) 1 75% 0.45 ccb06f4 (dist=59) -- 0% 0.98 d1a1610 (dist=58) -- 0% 0.95 d4bf4b4 (dist=58) 2 33% 0.65 16c5646 (dist=57) 3 25% 0.66 Here 'avg. gain' numbers are totally handwaving... but the idea is to pick up as next test point the commit with mist average information gain. What do you think of this algorithm (after of course it is made into proper algorithm :-))? -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html