Hi folks, I pulled on a string a couple of days ago, and it got out of control. It all started when I went to kill a test with ctrl-c and it, once again, left background processes running that I had to hunt down and kill manually. I then started looking a why this keeps happening, and realised that the way we clean up on test completion is messy, inconsistent and frequently buggy. So I started cleaning it all up, starting with the tests/xfs directory because I saw a lot of low hanging fruit there. Essentially, we use _cleanup() functions as a way of overriding the default trap handler we install in _begin_fstest(). Rather than register a new handler, we just redefine the common cleanup function and re-implement it (poorly) in every test that does an override. Often these overrides are completely unnecessary - I think I reduced the total number of overrides in tests/xfs by ~30% (~190 -> ~125), and I reudced the number of *unique overrides by a lot more than that. The method for overriding changes to be "stacked cleanups" rather than "duplicated cleanups". That is, tests no longer open code: cd / rm -rf $tmp.* THis is what common/preamble::_cleanup() does. We should call that function to do this. Hence if we have a local cleanup that we need to do, it becomes: local_cleanup() { rm -f $testfile _cleanup } _register_cleanup local_cleanup While this looks more verbose, it means we can actually reuse the same cleanup function across lots of tests. A large number of xfsdump tests were all using the same override cleanup function to call _cleanup_dump. These are all changed to: . ./common/dump _register_cleanup _cleanup_dump and _cleanup_dump stacks like this: _cleanup_dump() { #do xfsdump cleanup stuff _cleanup } and we don't need to do anything else. There is one xfsdump test that needs it's own cleanup. It stacks like this: local_cleanup() { rm -f $testfile _cleanup_dump } _register_cleanup local_cleanup All the tests that run fsstress in the background now have a common cleanup function that kills fsstress processes defined in common/preamble. They just do: _register_cleanup _cleanup_fsstress And now every test that puts fsstress in the background behaves correctly and kills all the background fsstree processes when interrupted. The conversion is by no means complete. I've named the local cleanup functions by what they do so we can go back and derive commonality between them. The number of different variations on tearing down loops devices is crazy, and half of them are buggy. I haven't worked through these yet, so you'll see lots of tests with: _loop_cleanup() { ...... _cleanup } _register_cleanup _loop_cleanup That have similar but different ways of cleaning up loop devices. I also added a _no_cleanup() function, as there are a large number of xfs fuzzer tests that want to leave a warm corpse behind so that debugging what just happened is easy. I also added BUS to the default signal trap set - well over a 100 tests in tests/xfs had a line like: _register_cleanup "_cleanup" BUS just to add BUS signals to the set that would cause the cleanup function to run. Just make it the default! Overall, this significantly reduces the amount of boiler plate in tests, and sets us down the path of common cleanup functions that tests may not even need to define. e.g. just including ./common/dmflakey registers the _cleanup_dmflakey() trap that will do all the necessary cleanup when the test exists. This makes the tests simpler, more robust and reduces the maintenance burden of over 1700 individual tests.... I won't put the full diffstat in this mail, but the benefits should be clean from the summary: 360 files changed, 420 insertions(+), 1781 deletions(-) I've lost count of the number of test bugs I killed in removing all this code, and that's largely just in the tests/xfs directory. So before I go spend another couple of days on converting the rest of fstests, I figured I better make sure everyone is OK with these changes. Thoughts, comments? -Dave.