Hi Jan,

On Thu, 18 Nov 2021, Jan Kara wrote:

> In some cases regressions (or generally changes) we are trying to bisect have
> probabilistic nature. This can for example happen for hard to trigger race
> condition where it is difficult to distinguish working state from just not
> hitting the race or it can happen for performance regressions where it is
> sometimes difficult to distinguish random workload fluctuations from the
> regression we are looking for. With standard bisection the only option we have
> is to repeatedly test suggested bisection point until we are sure enough which
> way to go. This leads to rather long bisection times and still a single wrong
> decision whether a commit is good or bad renders the whole bisection useless.
>
> Stochastic bisection tries to address these problems. When deciding whether a
> commit is good or bad, you can also specify your confidence in the decision.
> For performance tests you can usually directly infer this confidence from the
> distance of your current result from good/bad values, for hard to reproduce
> races you are usually 100% confident for bad commits, for good commits you need
> to somehow estimate your confidence based on past experience with reproducing
> the issue. The stochastic bisection algorithm then uses these test results
> and confidences to suggest next commit to try, tracking for each commit the
> probability the commit is the bad one given current test results. Once some
> commit reaches high enough probability (set when starting bisection) of being
> the bad one, we stop bisecting and announce this commit.

An interesting problem, for sure!

It is slightly related to a scenario that was described to me recently: in
a gigantic project whose full test suite is too large to run with every
Pull Request, and where tests are more likely to become flaky than to
simply break, a stochastic CI regime was introduced in which a semi-random
subset of the test suite is run with every CI build.
That team also came up with the concept of attaching confidences, as you
describe.

I only had time to look at the first patch closely so far. I hope to find
more time next week to review further.

Ciao,
Dscho
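The quoted description says the algorithm tracks, for each commit, the probability that it is the bad one, but it does not spell out the model. As a rough illustration only, here is a minimal Python sketch of one way such tracking could work. It assumes a simple Bayesian model that is not taken from Jan's patches: candidate commits are indexed oldest to newest, a result reported with confidence c is assumed to match the truth with probability c, the next commit to test is the one whose "bad" probability is closest to 0.5, and bisection stops once one commit reaches a chosen threshold. The function names `update`, `next_commit` and `culprit` are hypothetical.

```python
from typing import List, Optional


def update(probs: List[float], tested: int, reported_bad: bool,
           confidence: float) -> List[float]:
    """Bayesian update of P(commit k is the first bad commit).

    Assumed (hypothetical) model: if commit k is the first bad one, then
    commit `tested` is truly bad iff k <= tested, and a result reported
    with confidence c matches the truth with probability c.
    """
    weighted = []
    for k, p in enumerate(probs):
        truly_bad = k <= tested
        likelihood = confidence if truly_bad == reported_bad else 1.0 - confidence
        weighted.append(p * likelihood)
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("results given with 100% confidence contradict each other")
    return [w / total for w in weighted]


def next_commit(probs: List[float]) -> int:
    """Pick the commit whose P(bad) is closest to 0.5, i.e. whose outcome is most uncertain."""
    best, best_gap, cumulative = 0, float("inf"), 0.0
    for i, p in enumerate(probs):
        cumulative += p  # P(first bad commit <= i), i.e. P(commit i is bad)
        gap = abs(cumulative - 0.5)
        if gap < best_gap:
            best, best_gap = i, gap
    return best


def culprit(probs: List[float], threshold: float) -> Optional[int]:
    """Return the most likely first bad commit if it reached `threshold`, else None."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


if __name__ == "__main__":
    probs = [1 / 8] * 8                   # uniform prior over 8 candidate commits
    probs = update(probs, 3, True, 1.0)   # race reproduced on commit 3: surely bad
    probs = update(probs, 1, False, 0.9)  # commit 1 looked good, 90% confident
    probs = update(probs, 2, False, 0.9)  # commit 2 looked good, 90% confident
    print(next_commit(probs), culprit(probs, 0.95))  # prints: 2 None
```

In this toy run commit 3 ends up at roughly 88% probability, below the 95% threshold, so the sketch suggests re-testing commit 2 rather than declaring a culprit. That mirrors the point of the quoted proposal: a single "good" verdict given with less than full confidence does not irrevocably discard a part of the range.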