On Thu, May 19, 2022 at 05:06:07PM +0100, Matthew Wilcox wrote:
> On Thu, May 19, 2022 at 11:44:19PM +0800, Zorro Lang wrote:
> > On Thu, May 19, 2022 at 03:58:31PM +0100, Matthew Wilcox wrote:
> > > On Thu, May 19, 2022 at 07:24:50PM +0800, Zorro Lang wrote:
> > > > Yes, we talked about this, but if I don't remember wrong, I
> > > > recommended that each downstream tester maintain their own
> > > > "testing data/config", like an exclude list, failure ratios,
> > > > known failures, etc. I don't think those are suitable to be
> > > > fixed in the mainline fstests.
> > >
> > > This assumes a certain level of expertise, which is a barrier to
> > > entry.
> > >
> > > For someone who wants to check "Did my patch to filesystem Y that
> > > I have never touched before break anything?", having
> > > non-deterministic tests run by default is bad.
> > >
> > > As an example, run xfstests against jfs. Hundreds of failures,
> > > including some very scary-looking assertion failures from the
> > > page allocator. They're (mostly) harmless in fact, just being a
> > > memory leak, but it makes xfstests useless for this scenario.
> > >
> > > Even for well-maintained filesystems like xfs which is regularly
> > > tested, I expect generic/270 and a few others to fail. They just
> > > do, and they're not an indication that *I* broke anything.
> > >
> > > By all means, we want to keep tests around which have failures,
> > > but they need to be restricted to people who have a level of
> > > expertise and interest in fixing long-standing problems, not
> > > people who are looking for regressions.
> >
> > It's hard to be sure a failure is a regression if someone only runs
> > the test once. The tester needs some experience, or at least some
> > historical test data.
> >
> > If a tester finds a case that fails maybe 10% of the time on his
> > system, then to decide whether it's a regression, without any
> > historical test data, he at least needs to run the same test more
> > times on an old kernel version on his system. If it never fails on
> > the old kernel but can fail on the new kernel, then we suspect it's
> > a regression.
> >
> > Even if the tester isn't an expert on the fs he's testing, he can
> > report the issue to the experts of that fs to get more checking.
> > For a downstream kernel, he has to report to the downstream
> > maintainers, or check it himself. If a case passes on upstream but
> > fails on downstream, it might mean there's an upstream patchset
> > that can be backported.
> >
> > So, anyway, testers need their own "experience" (including testing
> > history data, known issues, etc.) to judge whether a failure is a
> > suspected regression or a known downstream issue which hasn't been
> > fixed (by backport) yet.
> >
> > That's my personal perspective :)
>
> Right, but that's the personal perspective of an expert tester. I
> don't particularly want to build that expertise myself; I want to
> write patches which touch dozens of filesystems, and I want to be
> able to smoke-test those patches. Maybe xfstests or kdevops doesn't
> want to solve that problem, but that would seem like a waste of other
> people's time.

I think it's hard to judge which cases count as smoke-test cases for
everyone, especially if you expect them all to pass whenever there are
no real bugs. If it has to cover "all filesystems", I can only
recommend some simple fsx and fsstress cases...

Even if we add a group named 'smoke' and mark all stable and simple
enough test cases with it, I still can't be sure './check -g smoke'
will pass for your all-filesystems testing in a random system
environment :)
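Something like below is roughly what I'd expect such a 'smoke' run to
look like for a downstream tester anyway. The 'smoke' group and the
local.exclude file here are hypothetical, and this assumes your copy of
the check script supports the -g (run group) and -E (exclude file)
options:

  # a locally maintained list of tests this lab already knows to fail,
  # e.g. the generic/270 failure mentioned above; this lives with the
  # tester, not in mainline fstests
  echo "generic/270" > local.exclude

  # run only the (hypothetical) 'smoke' group, minus the local excludes
  ./check -g smoke -E local.exclude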
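And just to show the level of "simple fsx and fsstress cases" I have in
mind, it's roughly the sketch below. The device, mount point and run
lengths are only examples, and it assumes an xfstests checkout that has
been built so the helper binaries exist under ltp/:

  # set up a scratch fs of whatever type you want to smoke-test
  mkfs.xfs -f /dev/vdc
  mount /dev/vdc /mnt/scratch

  # a short fsstress run (-d working dir, -p processes, -n operations)
  ./ltp/fsstress -d /mnt/scratch/stress -p 4 -n 10000

  # a short fsx run (-N total number of operations) on one test file
  ./ltp/fsx -N 10000 /mnt/scratch/fsx_file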
Thanks,
Zorro