Re: Stochastic bisection support

Jan Kara <jack@xxxxxxx> · Mon, 22 Nov 2021 14:31:23 +0100

Hi!

On Mon 22-11-21 13:55:33, Christian Couder wrote:
> On Thu, Nov 18, 2021 at 9:33 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > Hello,
> >
> > In some cases regressions (or generally changes) we are trying to bisect have
> > probabilistic nature. This can for example happen for hard to trigger race
> > condition where it is difficult to distinguish working state from just not
> > hitting the race or it can happen for performance regressions where it is
> > sometimes difficult to distinguish random workload fluctuations from the
> > regression we are looking for. With standard bisection the only option we have
> > is to repeatedly test suggested bisection point until we are sure enough which
> > way to go. This leads to rather long bisection times and still a single wrong
> > decision whether a commit is good to bad renders the whole bisection useless.
> >
> > Stochastic bisection tries to address these problems. When deciding whether a
> > commit is good or bad, you can also specify your confidence in the decision.
> > For performance tests you can usually directly infer this confidence from the
> > distance of your current result from good/bad values, for hard to reproduce
> > races you are usually 100% confident for bad commits, for good commits you need
> > to somehow estimate your confidence based on past experience with reproducing
> > the issue. The stochastic bisection algorithm then uses these test results
> > and confidences to suggest next commit to try, tracking for each commit the
> > probability the commit is the bad one given current test results. Once some
> > commit reaches high enough probability (set when starting bisection) of being
> > the bad one, we stop bisecting and annouce this commit.
> 
> The following project is based on Bayesian Search Theory and might be
> interesting if you haven't looked at it:
> 
> https://github.com/Ealdwulf/BBChop

Thanks for the link. I already know about that project and I had a look
into it when doing some initial research. But the biggest limitation of
that project is that it works only for linear history. I need to generally
bisect Linux kernel repository which has enough merges that the limitation
of linear history makes the use of the above tool impractical.

Furthermore direct integration of stochastic bisection into git makes this
easier to integrate into our performance testing framework.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR