On Fri, Feb 08, 2019 at 07:23:19PM +0100, SZEDER Gábor wrote:

> > Picking an <N> is tough. Too low and you get a false negative, too high
> > and you can wait forever, especially if the script is long. But I don't
> > think there's any real way to auto-scale it, except by seeing a few of
> > the failing cases and watching how long they take.
>
> So far I've chosen <N> like this: run the test script with --stress
> 3-5 times to trigger the failure, take the highest repetition count
> that was necessary for the failure, multiply it by 4-6 to get a round
> number, and that's a good ballpark for <N>. And once bisect came up
> with the suspect commit, I double checked it by letting the test
> script run with --stress on its parent commit for at least 5-10x <N>
> repetitions.

Heh. That's exactly my process, too. :)

> Anyway, I doubt that auto-scaling <N> is worth the effort.

Yeah, especially because as a concept it exists outside of the script
itself (i.e., you have to checkout a failing version and then run the
script a bunch of times; that's not something that test-lib.sh should
even know about).

So let's go with this for now. It's already a much nicer tool than we
had yesterday, so we can take some time to get used to it.

-Peff
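
For anyone who wants the rule of thumb quoted above written down, here is a
minimal sketch of the arithmetic in Python. It is only an illustration: the
function names `pick_stress_limit` and `parent_check_repetitions`, the default
factors (5, from the middle of the quoted 4-6 and the low end of the 5-10x
ranges), and the "round up to one significant digit" step are all assumptions,
and none of this is part of test-lib.sh.

```python
def pick_stress_limit(failure_counts, factor=5):
    """Estimate <N> from the repetition counts at which earlier
    --stress runs hit the failure (hypothetical helper)."""
    if not failure_counts or min(failure_counts) < 1:
        raise ValueError("need at least one positive repetition count")
    # Take the highest repetition count needed and scale it by 4-6.
    raw = max(failure_counts) * factor
    # Assumed reading of "a round number": round up to one significant digit.
    magnitude = 10 ** (len(str(raw)) - 1)
    return -(-raw // magnitude) * magnitude  # integer ceiling division


def parent_check_repetitions(n, multiplier=5):
    """Repetitions to run on the suspect commit's parent (at least 5-10x <N>)."""
    return n * multiplier


if __name__ == "__main__":
    observed = [12, 37, 20]  # made-up counts from three --stress runs
    n = pick_stress_limit(observed)
    print("ballpark <N>:", n)
    print("parent check repetitions:", parent_check_repetitions(n))
```

The inputs still have to come from watching a few failing runs by hand, which
is why this stays outside the test script itself.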