On Tue, Nov 06, 2018 at 05:50:28PM -0600, Jayashree Mohan wrote:
> On Tue, Nov 6, 2018 at 5:40 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> > On Tue, Nov 06, 2018 at 06:15:36PM -0500, Theodore Y. Ts'o wrote:
> > > On Mon, Nov 05, 2018 at 02:16:57PM -0600, Jayashree Mohan wrote:
> > > >
> > > > I believe that to _scratch_mkfs, I must first _cleanup dm_flakey. If I
> > > > replace the above snippet by
> > > > _cleanup
> > > > _scratch_mkfs
> > > > _init_flakey
> > > >
> > > > The time taken for the test goes up by around 10 seconds (due to mkfs
> > > > maybe). So I thought it was sufficient to remove the working directory.
> > >
> > > Can you try adding _check_scratch_fs after each test case?  Yes, it
> >
> > _check_scratch_fs now runs xfs_scrub on XFS as well as xfs_repair,
> > so it's actually quite expensive.
> >
> > The whole point of aggregating all these tests into one fstest is to
> > avoid the overhead of running _check_scratch_fs after every single
> > test, when the tests are /extremely unlikely/ to fail on existing
> > filesystems.
>
> Filipe and Eryu suggest that we run _check_scratch_fs after each subtest.
> Quoting Filipe,

These tests are highly unlikely to fail on existing filesystems. If
they do fail, then the developer can narrow it down by modifying the
test to run only the subtest that fails and adding _check_scratch_fs
where necessary.

Run time matters. Expensive, fine-grained testing is only useful if
there's a high probability of the test finding ongoing bugs. These
tests found bugs when they were first written, but the ongoing
probability of finding more bugs is extremely low. Adding a huge
amount of testing overhead for what appear to be very marginal
returns is not a good tradeoff.

> > Plus this test creates a very small fs, it's not like fsck will take a
> > significant time to run.
> >
> > So for all these reasons I would unmount and fsck after each test.
>
> For this reason, we currently _check_scratch_fs after each subtest in the
> _check_consistency method in my patch.
> +    _unmount_flakey
> +    _check_scratch_fs $FLAKEY_DEV
> +    [ $? -ne 0 ] && _fatal "fsck failed"
>
> Running on a 200MB partition, addition of this check added only around 3-4
> seconds of delay in total for this patch consisting of 37 tests. Currently
> this patch takes about 12-15 seconds to run to completion on my 200MB
> partition.

What filesystem, and what about 20GB scratch partitions (which are
common)? i.e. the checking cost differs across filesystems, across
device capacities and even across userspace versions of the same
filesystem utilities. It is most definitely not free, and in some
cases it can be prohibitively expensive. (One way to at least make
that cost opt-in is sketched further down.)

----

I suspect we've lost sight of the fact that fstests was /primarily/ a
filesystem developer test suite, not a distro regression test suite.
If the test suite becomes too cumbersome and slow for developers to
use effectively, then it will get used less during development, and
that's a *really, really bad outcome*.

e.g. I don't use the auto group in my development workflow any more -
it's too slow to get decent coverage of changes these days. I only
use the auto group for integration testing now.

A few years ago it would take about an hour to run the auto group on
a couple of spindles. These days it takes closer to 5 hours on the
same setup. Even on my really fast pmem setup it takes a couple of
hours to run the entire auto group. As I have 15 different
configurations I need to run through integration testing and limited
resources, runtime certainly matters.
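If the per-subtest check really has to stay, one compromise would be
to make it opt-in, so the default run stays fast and anyone chasing a
failure can turn full checking back on. Something along these lines
in the test's _check_consistency helper would do it - a rough sketch
only, and CHECK_EACH_SUBTEST is just an illustrative knob, not an
existing fstests variable:

	# Sketch: gate the expensive per-subtest fsck behind an
	# opt-in knob. CHECK_EACH_SUBTEST is illustrative only, not
	# an existing fstests variable.
	CHECK_EACH_SUBTEST=${CHECK_EACH_SUBTEST:-0}

	_check_consistency()
	{
		_unmount_flakey

		# Only pay the fsck cost per subtest when explicitly
		# requested, e.g. while narrowing down a failure. The
		# normal run would do a single _check_scratch_fs at
		# the end of the test instead.
		if [ "$CHECK_EACH_SUBTEST" -eq 1 ]; then
			_check_scratch_fs $FLAKEY_DEV || \
				_fatal "fsck failed"
		fi

		_mount_flakey
	}

That way an integration run can export CHECK_EACH_SUBTEST=1 and still
get the fine-grained checking, while the developer test/dev cycle
only pays for one filesystem check per test.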
It even takes half an hour to run the quick group on my fast machine,
and that group really isn't very quick anymore because of the sheer
number of tests in it. Half an hour is too slow for effective change
feedback - feedback within 5 minutes is necessary, otherwise the
developer will context switch to something else while waiting and
lose all focus on what they were doing. This leads to highly
inefficient developers.

The quick group is now full of short, fine-grained, targeted
regression tests. These are useful for distro release-to-release
testing, but the chances that they find new bugs are extremely low.
They really aren't that useful for developers who "fix the bug and
move on" and will never see that test fail ever again. If one of
these tests has a huge overhead, or the sheer number of those tests
creates substantial overhead, then that is not a particularly good
use of the developer's time and resources.

IOWs, there's a cost to running each test, and the really important
use case for fstests (i.e. the developers' test/dev cycle) cannot
really afford to pay that cost. i.e. developers need wide test
coverage /at speed/, not deep, fine-grained, time-consuming test
coverage.

To put it in other words, developers need tests focussed on finding
bugs quickly, not regression tests that provide the core requirements
of integration and release testing. The development testing phase is
all about finding bugs fast and efficiently.

There's been little done in recent times to make fstests faster and
more efficient at finding new bugs for developers - the vast majority
of new tests have been for specific bugs that have already been
fixed. Even these crashmonkey tests are not "find new bugs" tests.
The bugs they uncovered have been found and fixed, and so they fall
into this same fine-grained integration regression test category.

The only tests that I've seen discover new bugs recently are those
that run fsx, fsstress or some other semi-randomised workload
combined with some other operation. These tests find the bugs that
fine-grained, targeted regression tests will never uncover, and so in
many cases running most of these integration/regression tests doesn't
provide any value to the developer.

The faster the tests find the bugs, the faster the developer fixes
them. The more dev/test cycles a developer can do in a day, the more
bugs they find and fix. So if we want fstests to remain useful to
developers, we have to pay more attention to runtime and how
efficient the tests are at finding new bugs. More is not better.

Perhaps we need to recategorise the tests into new groups. Perhaps we
need to start working on reducing the huge amount of technical debt
we now have in fstests. Perhaps we need to scale the fstests
infrastructure to support thousands of tests efficiently. Perhaps we
need to focus on tests that uncover new bugs (looking forwards)
rather than regression tests (looking backwards).

But, really, if we don't stop and think about what we are doing here,
fstests will slowly drop out of the day-to-day developer workflow
because it is becoming less useful for finding new filesystem bugs
quickly...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx