I think I fixed all reported issues with the following patch.  It always
executes the fixture setup in the child process, and by default it also
executes the teardown in the child process (e.g. for seccomp tests, which
have assumptions about that).  Only the Landlock teardowns are executed
in the parent process thanks to the new _metadata->teardown_parent
boolean.  Child signals are always forwarded to the parent process, where
__wait_for_test() checks for them.  This works with seccomp and Landlock
tests, and I think with all the others.  I'll send a v2 of the vfork
patch.


diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index ad49832457af..4f192904dfd6 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -382,29 +382,33 @@
 		/* fixture data is alloced, setup, and torn down per call. */ \
 		FIXTURE_DATA(fixture_name) self; \
 		pid_t child = 1; \
+		int status = 0; \
 		memset(&self, 0, sizeof(FIXTURE_DATA(fixture_name))); \
 		if (setjmp(_metadata->env) == 0) { \
-			fixture_name##_setup(_metadata, &self, variant->data); \
-			/* Let setup failure terminate early. */ \
-			if (!_metadata->passed || _metadata->skip) \
-				return; \
-			_metadata->setup_completed = true; \
 			/* Use the same _metadata. */ \
 			child = vfork(); \
 			if (child == 0) { \
+				fixture_name##_setup(_metadata, &self, variant->data); \
+				/* Let setup failure terminate early. */ \
+				if (!_metadata->passed || _metadata->skip) \
+					_exit(0); \
+				_metadata->setup_completed = true; \
 				fixture_name##_##test_name(_metadata, &self, variant->data); \
-				_exit(0); \
-			} \
-			if (child < 0) { \
+			} else if (child < 0 || child != waitpid(child, &status, 0)) { \
 				ksft_print_msg("ERROR SPAWNING TEST GRANDCHILD\n"); \
 				_metadata->passed = 0; \
 			} \
 		} \
-		if (child == 0) \
-			/* Child failed and updated the shared _metadata. */ \
+		if (child == 0) { \
+			if (_metadata->setup_completed && !_metadata->teardown_parent) \
+				fixture_name##_teardown(_metadata, &self, variant->data); \
 			_exit(0); \
-		if (_metadata->setup_completed) \
+		} \
+		if (_metadata->setup_completed && _metadata->teardown_parent) \
 			fixture_name##_teardown(_metadata, &self, variant->data); \
+		if (!WIFEXITED(status) && WIFSIGNALED(status)) \
+			/* Forward signal to __wait_for_test(). */ \
+			kill(getpid(), WTERMSIG(status)); \
 		__test_check_assert(_metadata); \
 	} \
 	static struct __test_metadata \
@@ -414,6 +418,7 @@
 		.fixture = &_##fixture_name##_fixture_object, \
 		.termsig = signal, \
 		.timeout = tmout, \
+		.teardown_parent = false, \
 	 }; \
 	static void __attribute__((constructor)) \
 			_register_##fixture_name##_##test_name(void) \
@@ -842,6 +847,7 @@ struct __test_metadata {
 	bool timed_out;	/* did this test timeout instead of exiting? */
 	bool aborted;	/* stopped test due to failed ASSERT */
 	bool setup_completed; /* did setup finish? */
+	bool teardown_parent; /* run teardown in a parent process */
 	jmp_buf env;	/* for exiting out of test early */
 	struct __test_results *results;
 	struct __test_metadata *prev, *next;
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 2d6d9b43d958..1d5952897e05 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -285,6 +285,8 @@ static void prepare_layout_opt(struct __test_metadata *const _metadata,
 
 static void prepare_layout(struct __test_metadata *const _metadata)
 {
+	_metadata->teardown_parent = true;
+
 	prepare_layout_opt(_metadata, &mnt_tmp);
 }
 
@@ -3861,9 +3863,7 @@ FIXTURE_SETUP(layout1_bind)
 
 FIXTURE_TEARDOWN(layout1_bind)
 {
-	set_cap(_metadata, CAP_SYS_ADMIN);
-	EXPECT_EQ(0, umount(dir_s2d2));
-	clear_cap(_metadata, CAP_SYS_ADMIN);
+	/* umount(dir_s2d2)) is handled by namespace lifetime. */
 
 	remove_layout1(_metadata);
 
@@ -4276,9 +4276,8 @@ FIXTURE_TEARDOWN(layout2_overlay)
 	EXPECT_EQ(0, remove_path(lower_fl1));
 	EXPECT_EQ(0, remove_path(lower_do1_fo2));
 	EXPECT_EQ(0, remove_path(lower_fo1));
-	set_cap(_metadata, CAP_SYS_ADMIN);
-	EXPECT_EQ(0, umount(LOWER_BASE));
-	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* umount(LOWER_BASE)) is handled by namespace lifetime. */
 	EXPECT_EQ(0, remove_path(LOWER_BASE));
 
 	EXPECT_EQ(0, remove_path(upper_do1_fu3));
@@ -4287,14 +4286,11 @@ FIXTURE_TEARDOWN(layout2_overlay)
 	EXPECT_EQ(0, remove_path(upper_do1_fo2));
 	EXPECT_EQ(0, remove_path(upper_fo1));
 	EXPECT_EQ(0, remove_path(UPPER_WORK "/work"));
-	set_cap(_metadata, CAP_SYS_ADMIN);
-	EXPECT_EQ(0, umount(UPPER_BASE));
-	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* umount(UPPER_BASE)) is handled by namespace lifetime. */
 	EXPECT_EQ(0, remove_path(UPPER_BASE));
 
-	set_cap(_metadata, CAP_SYS_ADMIN);
-	EXPECT_EQ(0, umount(MERGE_DATA));
-	clear_cap(_metadata, CAP_SYS_ADMIN);
+	/* umount(MERGE_DATA)) is handled by namespace lifetime. */
 	EXPECT_EQ(0, remove_path(MERGE_DATA));
 
 	cleanup_layout(_metadata);
@@ -4691,6 +4687,8 @@ FIXTURE_SETUP(layout3_fs)
 		SKIP(return, "this filesystem is not supported (setup)");
 	}
 
+	_metadata->teardown_parent = true;
+
 	slash = strrchr(variant->file_path, '/');
 	ASSERT_NE(slash, NULL);
 	dir_len = (size_t)slash - (size_t)variant->file_path;


On Mon, Mar 04, 2024 at 08:31:49PM +0100, Mickaël Salaün wrote:
> On Mon, Mar 04, 2024 at 08:27:50PM +0100, Mickaël Salaün wrote:
> > Testing the whole series, I found that some Landlock tests are flaky
> > starting with this patch.  I tried to not use the longjmp in the
> > grandchild but it didn't change.  I suspect missing volatiles but I
> > didn't find the faulty one(s) yet. :/
> > I'll continue investigating tomorrow but help would be much appreciated!
>
> The issue is with the fs_test.c, often starting with this one:
>
> #  RUN           layout1.relative_chroot_only ...
> # fs_test.c:294:relative_chroot_only:Expected 0 (0) == umount(TMP_DIR) (-1)
> # fs_test.c:296:relative_chroot_only:Expected 0 (0) == remove_path(TMP_DIR) (16)
> # relative_chroot_only: Test failed
> #          FAIL  layout1.relative_chroot_only
>
> ...or this one:
>
> #  RUN           layout3_fs.hostfs.tag_inode_dir_child ...
> # fs_test.c:4707:tag_inode_dir_child:Expected 0 (0) == mkdir(self->dir_path, 0700) (-1)
> # fs_test.c:4709:tag_inode_dir_child:Failed to create directory "tmp/dir": No such file or directory
> # fs_test.c:4724:tag_inode_dir_child:Expected 0 (0) <= fd (-1)
> # fs_test.c:4726:tag_inode_dir_child:Failed to create file "tmp/dir/file": No such file or directory
> # fs_test.c:4729:tag_inode_dir_child:Expected 0 (0) == close(fd) (-1)
> # tag_inode_dir_child: Test failed
> #          FAIL  layout3_fs.hostfs.tag_inode_dir_child
>

This was because the vfork() wasn't followed by a wait().

>
> >
> >
> > On Wed, Feb 28, 2024 at 04:59:09PM -0800, Jakub Kicinski wrote:
> > > From: Mickaël Salaün <mic@xxxxxxxxxxx>
> > >
> > > Replace Landlock-specific TEST_F_FORK() with an improved TEST_F() which
> > > brings four related changes:
> > >
> > > Run TEST_F()'s tests in a grandchild process to make it possible to
> > > drop privileges and delegate teardown to the parent.
> > >
> > > Compared to TEST_F_FORK(), simplify handling of the test grandchild
> > > process thanks to vfork(2), and makes it generic (e.g. no explicit
> > > conversion between exit code and _metadata).
> > >
> > > Compared to TEST_F_FORK(), run teardown even when tests failed with an
> > > assert thanks to commit 63e6b2a42342 ("selftests/harness: Run TEARDOWN
> > > for ASSERT failures").
> > >
> > > Simplify the test harness code by removing the no_print and step fields
> > > which are not used.  I added this feature just after I made
> > > kselftest_harness.h more broadly available but this step counter
> > > remained even though it wasn't needed after all.  See commit 369130b63178
> > > ("selftests: Enhance kselftest_harness.h to print which assert failed").
> > >
> > > Replace spaces with tabs in one line of __TEST_F_IMPL().
> > >
> > > Cc: Günther Noack <gnoack@xxxxxxxxxx>
> > > Cc: Shuah Khan <shuah@xxxxxxxxxx>
> > > Cc: Will Drewry <wad@xxxxxxxxxxxx>
> > > Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxx>
> > > Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > > Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
> > > --
> > > v4:
> > >  - GAND -> GRAND
> > >  - init child to 1, otherwise assert in setup triggers a longjmp
> > >    which in turn reads child without it ever getting initialized
> > >    (or being 0, i.e. we mistakenly assume we're in the grandchild)
> > Good catch!