Re: Stochastic bisection support

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 19 Nov 2021 17:39:35 +0100 (CET)

Hi Jan,

On Thu, 18 Nov 2021, Jan Kara wrote:

> In some cases regressions (or generally changes) we are trying to bisect have a
> probabilistic nature. This can for example happen for a hard-to-trigger race
> condition where it is difficult to distinguish working state from just not
> hitting the race or it can happen for performance regressions where it is
> sometimes difficult to distinguish random workload fluctuations from the
> regression we are looking for. With standard bisection the only option we have
> is to repeatedly test suggested bisection point until we are sure enough which
> way to go. This leads to rather long bisection times and still a single wrong
> decision whether a commit is good or bad renders the whole bisection useless.
>
> Stochastic bisection tries to address these problems. When deciding whether a
> commit is good or bad, you can also specify your confidence in the decision.
> For performance tests you can usually directly infer this confidence from the
> distance of your current result from good/bad values, for hard to reproduce
> races you are usually 100% confident for bad commits, for good commits you need
> to somehow estimate your confidence based on past experience with reproducing
> the issue. The stochastic bisection algorithm then uses these test results
> and confidences to suggest the next commit to try, tracking for each commit the
> probability the commit is the bad one given current test results. Once some
> commit reaches high enough probability (set when starting bisection) of being
> the bad one, we stop bisecting and announce this commit.
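
To make the description above concrete, here is a rough sketch of how such
a probability update might look on a linear history. It is not taken from
the patch series. It assumes exactly one first bad commit, a uniform
prior, and a simple Bayes update in which a verdict with confidence c has
likelihood c when it matches reality and 1 - c when it does not. All
function names are hypothetical.

```python
def update(probs, tested, verdict, confidence):
    """Bayes update; probs[i] is P(commit i is the first bad commit).

    Assumption: commits are ordered oldest to newest, so the tested
    commit is really bad iff the first bad commit has index <= tested.
    """
    new = []
    for i, p in enumerate(probs):
        tested_is_bad = i <= tested
        agrees = tested_is_bad == (verdict == "bad")
        likelihood = confidence if agrees else 1.0 - confidence
        new.append(p * likelihood)
    total = sum(new)
    if total == 0:
        raise ValueError("test results contradict each other")
    return [x / total for x in new]


def next_commit(probs):
    """Suggest the commit that splits the probability mass most evenly."""
    best, best_gap, cumulative = 0, float("inf"), 0.0
    for i, p in enumerate(probs):
        cumulative += p
        gap = abs(cumulative - 0.5)
        if gap < best_gap:
            best, best_gap = i, gap
    return best


def culprit(probs, threshold):
    """Return the most likely commit once it reaches the threshold, else None."""
    i = max(range(len(probs)), key=probs.__getitem__)
    return i if probs[i] >= threshold else None
```

A bisection loop under these assumptions would call next_commit(), run the
test, feed the verdict and confidence into update(), and repeat until
culprit() returns a commit.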

An interesting problem, for sure!

It is slightly related to a scenario that has been described to me
recently: in a gigantic project whose full test suite is too large to run
with every Pull Request, where tests are more likely to become flaky
rather than simply break, a stochastic CI regime was introduced where a
semi-random subset of the test suite is run with every CI build. That team
also came up with the concept of attaching confidences as you describe.

I only had time to look at the first patch closely so far. I hope to find
more time next week to review further.

Ciao,
Dscho