On Mon, Jul 04, 2022 at 10:58:22AM +0300, Amir Goldstein wrote:
> I had the humble notion that we should make running fstests to
> passing-by developers as easy as possible, because I have had the
> chance to get feedback from some developers on their first time
> attempt to run fstests and it wasn't pleasant, but nevermind.
> -g auto -x soak is fine.

I really don't think using some kind of group exclusion is the right way to go.

First of all, the definition of "determinism" keeps shifting around.  The most recent way you've used it, it's for "passing-by developers" to be able to tell whether their patches may have caused a regression.  Is that right?

Secondly, a group-based exclusion list is problematic because groups are fixed with respect to a kernel version (and therefore will very easily go out of date), and they are fixed with respect to a file system type, while which tests will either fail or be flaky will vary, wildly, by file system type.

As an example of the "fixed in time" problem, I had a global exclude for the following tests: generic/471, generic/484, and generic/554.  That's because they were failing for me all the time, for various reasons that were either test bugs or kernel bugs.  But unless you regularly check whether a test which is on a particular exclusion list, or in some magic "soak" group (and "soak" is a ***massive*** misnomer --- the group as you've proposed it should really be named "might-fail-perhaps-on-some-fs-type-or-config"), still needs to be there, tests can remain on the list long after the test bug or the kernel bug has been addressed.  So as of commit 7fd7c21547a1 ("test-appliance: add kernel version conditionals using cpp to exclude files") in xfstests-bld, generic/471 and generic/484 are only excluded when testing LTS kernels older than 5.10, and generic/554 is only excluded when testing LTS kernels older than 5.4.

The point is (a) you can't easily do version-specific exclusions with xfstests group declarations, and (b) someone needs to periodically sweep through the tests to see whether they should still be in the "soak" or "might-fail-perhaps-on-some-fs-type-or-config" group.  (There is a sketch of what such a version-conditional exclude file roughly looks like after the test results below.)

As an example of why you're going to want to do the exclusions on a per-file system basis, consider the tests that would have to be added to the "might-fail-perhaps-on-some-fs-type-or-config" group.  If the goal is to let a drive-by developer know whether their patches have caused a regression, especially someone like Willy who might be modifying a large number of file systems, then you would need to add at *least* 135 tests to the "soak" or "might-fail-perhaps-on-some-fs-type-or-config" group (these are current test failures that I have observed using v5.19-rc4):
btrfs/default: 1176 tests, 7 failures, 244 skipped, 8937 seconds
  Failures: btrfs/012 btrfs/219 btrfs/235 generic/041 generic/297
    generic/298 shared/298

exfat/default: 1222 tests, 23 failures, 552 skipped, 1561 seconds
  Failures: generic/013 generic/309 generic/310 generic/394 generic/409
    generic/410 generic/411 generic/428 generic/430 generic/431
    generic/432 generic/433 generic/438 generic/443 generic/465
    generic/490 generic/519 generic/563 generic/565 generic/591
    generic/633 generic/639 generic/676

ext2/default: 1188 tests, 4 failures, 472 skipped, 2803 seconds
  Failures: generic/347 generic/607 generic/614 generic/631

f2fs/default: 877 tests, 5 failures, 217 skipped, 4249 seconds
  Failures: generic/050 generic/064 generic/252 generic/506 generic/563

jfs/default: 1074 tests, 62 failures, 404 skipped, 3695 seconds
  Failures: generic/015 generic/034 generic/039 generic/040 generic/041
    generic/056 generic/057 generic/065 generic/066 generic/073
    generic/079 generic/083 generic/090 generic/101 generic/102
    generic/104 generic/106 generic/107 generic/204 generic/226
    generic/258 generic/260 generic/269 generic/288 generic/321
    generic/322 generic/325 generic/335 generic/336 generic/341
    generic/342 generic/343 generic/348 generic/376 generic/405
    generic/416 generic/424 generic/427 generic/467 generic/475
    generic/479 generic/480 generic/481 generic/489 generic/498
    generic/502 generic/510 generic/520 generic/526 generic/527
    generic/534 generic/535 generic/537 generic/547 generic/552
    generic/557 generic/563 generic/607 generic/614 generic/629
    generic/640 generic/690

nfs/loopback: 818 tests, 2 failures, 345 skipped, 9365 seconds
  Failures: generic/426 generic/551

reiserfs/default: 1076 tests, 25 failures, 413 skipped, 3368 seconds
  Failures: generic/102 generic/232 generic/235 generic/258 generic/321
    generic/355 generic/381 generic/382 generic/383 generic/385
    generic/386 generic/394 generic/418 generic/520 generic/533
    generic/535 generic/563 generic/566 generic/594 generic/603
    generic/614 generic/620 generic/634 generic/643 generic/691

vfat/default: 1197 tests, 42 failures, 528 skipped, 4616 seconds
  Failures: generic/003 generic/130 generic/192 generic/213 generic/221
    generic/258 generic/309 generic/310 generic/313 generic/394
    generic/409 generic/410 generic/411 generic/428 generic/430
    generic/431 generic/432 generic/433 generic/438 generic/443
    generic/465 generic/467 generic/477 generic/495 generic/519
    generic/563 generic/565 generic/568 generic/569 generic/589
    generic/632 generic/633 generic/637 generic/638 generic/639
    generic/644 generic/645 generic/656 generic/676 generic/683
    generic/688 generic/689

This is not *all* of the tests.  There are a number of file system types that are causing the VM to crash, and I haven't had time this weekend to figure out what tests need to be added to the exclude group for udf, ubifs, overlayfs, etc.  So there might be even *more* tests that would need to be added to the "might-fail-perhaps-on-some-fs-type-or-config" group.

It *could* be fewer, if we want to exclude reiserfs and jfs on the theory that they might be deprecated soon.  But there are still some very commonly used file systems, such as vfat, exfat, etc., that have a *huge* number of failing tests, which is going to make life unpleasant for the drive-by developer/tester.  And there are other file systems which will cause a kernel crash or lockup on v5.19-rc4, which will certainly give the drive-by tester trouble.
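For concreteness, the per-file system exclude files in the xfstests-bld test appliance now look roughly like this with the cpp conditionals.  (This is a sketch from memory rather than the literal file contents, and the exact macro names the appliance scripts define for cpp may differ.)

    #if LINUX_VERSION_CODE < KERNEL_VERSION(5, 10, 0)
    /* only excluded when testing LTS kernels older than 5.10 */
    generic/471
    generic/484
    #endif
    #if LINUX_VERSION_CODE < KERNEL_VERSION(5, 4, 0)
    /* only excluded when testing LTS kernels older than 5.4 */
    generic/554
    #endif

That kind of conditional is easy to express in a per-file system exclude file that gets run through cpp, and essentially impossible to express with a static group declaration inside xfstests itself.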
Which is why I argue that using a group, whether it's called "soak" or something else, to exclude all of the tests that might fail and thus confuse the passing-by fs testers is *not* the best way to go.

> The reason I suggested that *we* change our habits is because
> we want to give passing-by fs testers an easier experience.

Realistically, if we want to give passing-by fs testers an easier experience, we need to give these testers a more turn-key experience.  This was one of my original goals when I created kvm-xfstests and gce-xfstests.  This is why I upload pre-created VM images for kvm-xfstests and gce-xfstests --- so people don't have to build xfstests and all of its dependencies, or figure out how to set it up and configure it.  For example, it means that I can tell ext4 developers to just run "kvm-xfstests smoke" as a bare minimum before sending me a patch for review.

There is more that we clearly need to do if we want to make something which is completely turn-key for a drive-by tester, especially if they need to test more than just ext4 and xfs.  I have some ideas, and this is something that I'm hoping to do more work on in the next few months.  If someone is interested in contributing some time and energy to this project, please give me a ring.  Many hands make light work, and all that.

Cheers,

						- Ted
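P.S.  For anyone who wants to try the turn-key path described above, the drive-by workflow is roughly the following.  This is a sketch from memory, not a substitute for the xfstests-bld quick-start documentation, so double-check the command names there:

    cd ~/linux                     # kernel tree with the patch applied
    kvm-xfstests install-kconfig   # enable the config options the test VM needs
    make -j$(nproc)
    kvm-xfstests smoke             # quick ext4 sanity run using the just-built kernel

gce-xfstests accepts essentially the same commands if you would rather run the tests in a cloud VM than locally under KVM.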