On Fri, Mar 07, 2025 at 08:31:26AM -0500, Theodore Ts'o wrote:
> On Fri, Mar 07, 2025 at 06:51:23AM -0500, Kent Overstreet wrote:
> >
> > Better bisection algorithm? Standard bisect does really badly when
> > fed noisy data, but it wouldn't be hard to fix that: after N
> > successive passes or fails, which is unlikely because bisect tests
> > are coinflips, backtrack and gather more data in the part of the
> > commit history where you don't have much.
>
> My general approach when handling some test failure is to try running
> the reproducer 5-10 times on the original commit where the failure was
> detected, to see if the reproducer is reliable.  Once it's been
> established whether the failure reproduces 100% of the time, or some
> fraction of the time, say 25% of the time, then we can establish how
> many times we should try running the reproducer before we can conclude
> that a particular commit is "good" --- and the first time we detect a
> failure, we can declare the commit is "bad", even if it happens on the
> 2nd out of the 25 tries that we might need to run a test if it is
> particularly flaky.

That does sound like a nice trick. I think we'd probably want both
approaches, though: I've seen cases where a test starts out failing
perhaps 5% of the time and then jumps up to 40% later on - some other
behavioural change makes your race or what have you easier to hit.

Really what we're trying to do is determine the shape of an unknown
function by sampling; we hope it's just a single stepwise change, but
if not we need to keep gathering more data until we get a clear enough
picture (and we need a way to present that data, too).

> > Maybe this is something Syzbot could implement?

Wouldn't it be better to have it in 'git bisect'?

> And if someone is familiar with the Go language, patches to implement
> this in gce-xfstests's ltm server would be great!  It's something I've
> wanted to do, but I haven't gotten around to implementing it yet so it
> can be fully automated.
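To put rough numbers on the "how many times" question above: if a bad
commit fails each run independently with probability p, then N clean
runs leave a (1 - p)^N chance of mislabeling it "good". A quick sketch
(hypothetical helper, not anything that exists in gce-xfstests today):

```python
import math

def runs_needed(fail_rate: float, confidence: float = 0.95) -> int:
    """Consecutive clean runs needed before declaring a commit "good",
    assuming a bad commit fails each run independently with fail_rate.

    P(N clean runs on a bad commit) = (1 - fail_rate) ** N, so we want
    the smallest N with (1 - fail_rate) ** N <= 1 - confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - fail_rate))
```

At a 25% failure rate that works out to 11 runs for 95% confidence, and
25 runs gets you past 99.9%.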
> Right now, ltm's git branch watcher reruns any failing test 5 times,
> so I get an idea of whether a failure is flaky or not.  I'll then
> manually run a potentially flaky test 30 times, and based on how
> reliable or flaky the test failure happens to be, I then tell
> gce-xfstests to do a bisect running each test N times, without having
> it stop once the test fails.  It wastes a bit of test resources, but
> since it doesn't block my personal time (results land in my inbox
> when the bisect completes), it hasn't risen to the top of my todo
> list.

If only we had interns and grad students for this sort of thing :)
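For what it's worth, the "run each commit N times, bad on first
failure, good after N clean runs" bisect is only a few lines; a sketch
(illustrative only - `commits` and `test` are stand-ins, not a real git
interface):

```python
def noisy_bisect(commits, test, runs_per_commit):
    """Bisect a linear history where test(commit) is flaky.

    A commit is declared bad on the first observed failure (test()
    returning False), and good only after runs_per_commit clean runs,
    with runs_per_commit sized from the estimated failure rate.
    Assumes the last commit is bad; returns the index of the first
    bad commit.
    """
    lo, hi = 0, len(commits) - 1      # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if all(test(commits[mid]) for _ in range(runs_per_commit)):
            lo = mid + 1              # looked good: first bad is later
        else:
            hi = mid                  # saw a failure: mid is bad
    return lo
```

The step-function assumption is the weak point, as above: if the
failure rate itself changes partway through the range, no fixed
runs_per_commit is right for the whole history.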