Re: [PATCH v2 1/2] check: disable HAVE_PRIVATENS by default

Zorro Lang <zlang@xxxxxxxxxx> · Fri, 7 Mar 2025 23:47:22 +0800

On Fri, Mar 07, 2025 at 07:56:28AM +1100, Dave Chinner wrote:
> On Thu, Mar 06, 2025 at 09:47:23PM +0800, Zorro Lang wrote:
> > On Thu, Mar 06, 2025 at 10:05:13PM +1100, Dave Chinner wrote:
> > > On Thu, Mar 06, 2025 at 05:49:23PM +0800, Zorro Lang wrote:
> > > > Currently we have 3 ways to run a test case in _run_seq():
> > > > 
> > > >   if [ -n "${HAVE_PRIVATENS}" ]; then
> > > >       ./tools/run_privatens "./$seq"
> > > >       ...
> > > >   elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
> > > >       systemd-run --quiet --unit "${unit}" --scope \
> > > >              ./tools/run_setsid "./$seq" &
> > > >       ...
> > > >   else
> > > >       ./tools/run_setsid "./$seq" &
> > > >       ...
> > > >   fi
> > > > 
> > > > The "privatens" way brings in some regressions. We need more time
> > > > to develop and test this way, it's not time let it to be the
> > > > first default choice, so isolate the HAVE_PRIVATENS initialization
> > > > by a TRY_PRIVATENS parameter, and disable it by default.
> > > > 
> > > > Set TRY_PRIVATENS=yes to give "privatens" a try, otherwise run in
> > > > old ways. This patch can be removed after "privatens" way is stable.
> > > > 
> > > > Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx>
> > > > Reviewed-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> > > > ---
> > > >  check | 11 +++++++----
> > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/check b/check
> > > > index ea92b0d62..cb2f19d08 100755
> > > > --- a/check
> > > > +++ b/check
> > > > @@ -674,10 +674,13 @@ _stash_test_status() {
> > > >  	esac
> > > >  }
> > > >  
> > > > -# Can we run in a private pid/mount namespace?
> > > > -HAVE_PRIVATENS=
> > > > -./tools/run_privatens bash -c "exit 77"
> > > > -test $? -eq 77 && HAVE_PRIVATENS=yes
> > > > +# Don't try "privatens" by default, it's experimental for now.
> > > > +if [ "$TRY_PRIVATENS" = "yes" ];then
> > > > +	# Can we run in a private pid/mount namespace?
> > > > +	HAVE_PRIVATENS=
> > > > +	./tools/run_privatens bash -c "exit 77"
> > > > +	test $? -eq 77 && HAVE_PRIVATENS=yes
> > > > +fi
> > > 
> > > Works for me - I have basically the same patch in my check-parallel
> > > stack because this breaks check-parallel mount namespacing for
> > > reasons I don't yet understand.
> > > 
> > > Creating a new mount namespace for each test that is run appears
> > > to turn the private mount namespace that each check instance is
> > > executed in back into a globally shared mount namespace inside
> > > each individual test mount namespace.
> > > 
> > > i.e. when check uses private namespaces I can see all the mounts
> > > from inside each test namespace from the init namespace and every
> > > test can see every mount that every other test runs again.
> > > 
> > > That is the problem I originally used mount namespaces in
> > > check-parallel to avoid.
> > > 
> > > I have no idea if this is how mount namespace nesting is actually
> > > supposed to work (seems completely broken to me!), but the only
> > > solution I've found that works so far is to turn off HAVE_PRIVATENS
> > > in check as it is redundant when it is run from check-parallel.
> > > 
> > > FWIW, given that check-parallel now runs in it's own private PID
> > > namespace with it's own /proc and /tmp, the original problem of
> > > needing to "confine pkill to only the child processes of this test
> > > instance" has gone away entirely. i.e. check-parallel does not need check to
> > > to restrict pkill to children of the current test being run anymore.
> > > 
> > > Hence I think we can probably remove the new process isolation
> > > shenanigans and revert the _pkill wrappers to plain pkill calls
> > > again as the private PID namespacing the check-parallel does means
> > > pkill does the right thing for both check and check-parallel now.
> > 
> > Hi Dave,
> > 
> > Thanks for your reviewing. Do you mean revert below commit directly?
> > 
> >   commit 247ab01fa227647c290cf82696f1b0bbf87a0177
> >   Author: Darrick J. Wong <djwong@xxxxxxxxxx>
> >   Date:   Mon Feb 3 14:00:28 2025 -0800
> > 
> >       check: run tests in a private pid/mount namespace
> > 
> > I isolate the HAVE_PRIVATENS rather than reverting it, due to I'm not
> > sure if you or Darrick still would like to improve that test way.
> 
> I'm talking about removing all the setsid and namspace bits that we
> added to try to solve the need for pkill --parent. i.e. this series
> of patches:
> 
> 949bdf8ea check: deprecate using process sessions to isolate test instances
> 247ab01fa check: run tests in a private pid/mount namespace
> 88d60f434 common: fix pkill by running test program in a separate session
> c71349150 common/rc: hoist pkill to a helper function
> 77548e606 common/rc: create a wrapper for the su command
> 
> Using private PID namespaces in check-parallel has solved the
> original problem that required the pkill --parent usage and the
> fsstress binary rename/copy workaround by wrapping process isolation
> outside of each check instance. Hence check and the internal
> check infrastructure doesn't need to ever care about process
> isolation to solve this check-parallel problem anymore.

If "check" doesn't need these namespace isolation things, that will be good
to me. That'll help to isolate the issues on check or on check-parallel.
And we won't bring in regressions to check when try to fix check-parallel.

Now let me merge this patch at first, to avoid the regressions on btrfs.
Then please feel free to send patches to remove/reduce the coupling of
two testing ways (check and check-parallel:)

> 
> i.e. the isolation fix should have been made to check-parallel,
> not to check.
> 
> It is not ideal to have to walk all this stuff back, but this is
> what happens when an RFC-scope feature prototype is merged without
> allowing adequate time for review or give the author a chance to
> address at least one round of review comments and bug fixes before
> it gets merged....

Sorry, I've learned from this. I thought I've given it lots of test,
and fixed all bug reports at that time. And I asked several times
if there's any objection about merging it in next release. As there's
not objection from anyone in 2 weeks, so I decided to give it a chance,
to save the super complicated rebasing workload... Anyway, next time,
for each complex patchset on infrastructure, I'll always give it 3+
release cycles time. And never merge it if without 3 acks from xfs,
ext4 and btrfs, as there're too many different test ways I can't test
cover. Sorry again for the mess this time, hope we can back to track
soon.

Thanks,
Zorro

> 
> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>