On Fri, Mar 07, 2025 at 07:56:28AM +1100, Dave Chinner wrote: > On Thu, Mar 06, 2025 at 09:47:23PM +0800, Zorro Lang wrote: > > On Thu, Mar 06, 2025 at 10:05:13PM +1100, Dave Chinner wrote: > > > On Thu, Mar 06, 2025 at 05:49:23PM +0800, Zorro Lang wrote: > > > > Currently we have 3 ways to run a test case in _run_seq(): > > > > > > > > if [ -n "${HAVE_PRIVATENS}" ]; then > > > > ./tools/run_privatens "./$seq" > > > > ... > > > > elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then > > > > systemd-run --quiet --unit "${unit}" --scope \ > > > > ./tools/run_setsid "./$seq" & > > > > ... > > > > else > > > > ./tools/run_setsid "./$seq" & > > > > ... > > > > fi > > > > > > > > The "privatens" way brings in some regressions. We need more time > > > > to develop and test this way, it's not time let it to be the > > > > first default choice, so isolate the HAVE_PRIVATENS initialization > > > > by a TRY_PRIVATENS parameter, and disable it by default. > > > > > > > > Set TRY_PRIVATENS=yes to give "privatens" a try, otherwise run in > > > > old ways. This patch can be removed after "privatens" way is stable. > > > > > > > > Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx> > > > > Reviewed-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> > > > > --- > > > > check | 11 +++++++---- > > > > 1 file changed, 7 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/check b/check > > > > index ea92b0d62..cb2f19d08 100755 > > > > --- a/check > > > > +++ b/check > > > > @@ -674,10 +674,13 @@ _stash_test_status() { > > > > esac > > > > } > > > > > > > > -# Can we run in a private pid/mount namespace? > > > > -HAVE_PRIVATENS= > > > > -./tools/run_privatens bash -c "exit 77" > > > > -test $? -eq 77 && HAVE_PRIVATENS=yes > > > > +# Don't try "privatens" by default, it's experimental for now. > > > > +if [ "$TRY_PRIVATENS" = "yes" ];then > > > > + # Can we run in a private pid/mount namespace? > > > > + HAVE_PRIVATENS= > > > > + ./tools/run_privatens bash -c "exit 77" > > > > + test $? -eq 77 && HAVE_PRIVATENS=yes > > > > +fi > > > > > > Works for me - I have basically the same patch in my check-parallel > > > stack because this breaks check-parallel mount namespacing for > > > reasons I don't yet understand. > > > > > > Creating a new mount namespace for each test that is run appears > > > to turn the private mount namespace that each check instance is > > > executed in back into a globally shared mount namespace inside > > > each individual test mount namespace. > > > > > > i.e. when check uses private namespaces I can see all the mounts > > > from inside each test namespace from the init namespace and every > > > test can see every mount that every other test runs again. > > > > > > That is the problem I originally used mount namespaces in > > > check-parallel to avoid. > > > > > > I have no idea if this is how mount namespace nesting is actually > > > supposed to work (seems completely broken to me!), but the only > > > solution I've found that works so far is to turn off HAVE_PRIVATENS > > > in check as it is redundant when it is run from check-parallel. > > > > > > FWIW, given that check-parallel now runs in it's own private PID > > > namespace with it's own /proc and /tmp, the original problem of > > > needing to "confine pkill to only the child processes of this test > > > instance" has gone away entirely. i.e. check-parallel does not need check to > > > to restrict pkill to children of the current test being run anymore. > > > > > > Hence I think we can probably remove the new process isolation > > > shenanigans and revert the _pkill wrappers to plain pkill calls > > > again as the private PID namespacing the check-parallel does means > > > pkill does the right thing for both check and check-parallel now. > > > > Hi Dave, > > > > Thanks for your reviewing. Do you mean revert below commit directly? > > > > commit 247ab01fa227647c290cf82696f1b0bbf87a0177 > > Author: Darrick J. Wong <djwong@xxxxxxxxxx> > > Date: Mon Feb 3 14:00:28 2025 -0800 > > > > check: run tests in a private pid/mount namespace > > > > I isolate the HAVE_PRIVATENS rather than reverting it, due to I'm not > > sure if you or Darrick still would like to improve that test way. > > I'm talking about removing all the setsid and namspace bits that we > added to try to solve the need for pkill --parent. i.e. this series > of patches: > > 949bdf8ea check: deprecate using process sessions to isolate test instances > 247ab01fa check: run tests in a private pid/mount namespace > 88d60f434 common: fix pkill by running test program in a separate session > c71349150 common/rc: hoist pkill to a helper function > 77548e606 common/rc: create a wrapper for the su command > > Using private PID namespaces in check-parallel has solved the > original problem that required the pkill --parent usage and the > fsstress binary rename/copy workaround by wrapping process isolation > outside of each check instance. Hence check and the internal > check infrastructure doesn't need to ever care about process > isolation to solve this check-parallel problem anymore. If "check" doesn't need these namespace isolation things, that will be good to me. That'll help to isolate the issues on check or on check-parallel. And we won't bring in regressions to check when try to fix check-parallel. Now let me merge this patch at first, to avoid the regressions on btrfs. Then please feel free to send patches to remove/reduce the coupling of two testing ways (check and check-parallel:) > > i.e. the isolation fix should have been made to check-parallel, > not to check. > > It is not ideal to have to walk all this stuff back, but this is > what happens when an RFC-scope feature prototype is merged without > allowing adequate time for review or give the author a chance to > address at least one round of review comments and bug fixes before > it gets merged.... Sorry, I've learned from this. I thought I've given it lots of test, and fixed all bug reports at that time. And I asked several times if there's any objection about merging it in next release. As there's not objection from anyone in 2 weeks, so I decided to give it a chance, to save the super complicated rebasing workload... Anyway, next time, for each complex patchset on infrastructure, I'll always give it 3+ release cycles time. And never merge it if without 3 acks from xfs, ext4 and btrfs, as there're too many different test ways I can't test cover. Sorry again for the mess this time, hope we can back to track soon. Thanks, Zorro > > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >