Re: [PATCH v3 0/3] automatically skip away from broken commits

Jakub Narebski <jnareb@xxxxxxxxx> · Tue, 9 Jun 2009 23:55:49 +0200

On Tue, 9 June 2009, H. Peter Anvin wrote:
> Jakub Narebski wrote:
> > 
> > By the way, I have asked question about best algorithm for "bisect skip"
> > on StackOverflow[1], but didn't get (yet) any good responses...
> > 
> > [1]: http://stackoverflow.com/questions/959324/
> > 
> 
> I don't think there is a "best" algorithm, but I concur with the poster
> that said broken commits tend to cluster.

Well, I guess that there might be, at least if we had some reasonable
assumption on probability distribution of bad commits.

Note: the idea sketched below is just handwaving currently...

Let us assume that we are currently at some untestable commit. Let us
also assume that we have some halfway reasonable model of probability
that a given commit is untestable, given it distance from known
untestable commit. "git rev-list --bisect-all" (or its inner equivalent)
would give us list of commits in the searched range, sorted in 
descending order by distance from edges (endpoints) of range:

  commit    "goodness" 
  --------------------
  c21d2e5*  (dist=60)
  94d6d14   (dist=59)
  ccb06f4   (dist=59)
  d1a1610   (dist=58)
  d4bf4b4   (dist=58)
  16c5646   (dist=57)

Let us assume that "c21d2e5" is untestable, and that we can easily 
calculate distance from it, substituting 0/0 (undef) if a commit
is not in straight line from "c21d2e5".

  commit    "goodness"  d
  -------------------------
  c21d2e5*  (dist=60)   0
  94d6d14   (dist=59)   1
  ccb06f4   (dist=59)   --
  d1a1610   (dist=58)   --
  d4bf4b4   (dist=58)   2
  16c5646   (dist=57)   3

Let us also assume that we have some model of probability that a commit
is untestable. In the example below numbers are ad hoc, and unrealistic.

  commit    "goodness"  d   P(untestable)
  ----------------------------------------
  c21d2e5*  (dist=60)   0   100%
  94d6d14   (dist=59)   1    75%
  ccb06f4   (dist=59)   --    0%
  d1a1610   (dist=58)   --    0%
  d4bf4b4   (dist=58)   2    33%
  16c5646   (dist=57)   3    25%

We can now calculate average number of bits of information would bring
(IIRC it was HPA and Christian that was writing about 'average information
gain' and 'bits of information at each step'; I don't quote know how it
is to be calculated)

  commit    "goodness"  d   P(untestable)  avg. gain
  ----------------------------------------
  c21d2e5*  (dist=60)   0   100%           0.0001
  94d6d14   (dist=59)   1    75%           0.45
  ccb06f4   (dist=59)   --    0%           0.98
  d1a1610   (dist=58)   --    0%           0.95
  d4bf4b4   (dist=58)   2    33%           0.65
  16c5646   (dist=57)   3    25%           0.66

Here 'avg. gain' numbers are totally handwaving... but the idea is to
pick up as next test point the commit with mist average information
gain.

What do you think of this algorithm (after of course it is made into
proper algorithm :-))?
-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html