Re: [PATCH nft] tests/shell: allow running tests as non-root users

Thomas Haller <thaller@xxxxxxxxxx> · Thu, 31 Aug 2023 19:26:15 +0200

On Thu, 2023-08-31 at 18:08 +0200, Florian Westphal wrote:
> Thomas Haller <thaller@xxxxxxxxxx> wrote:
> > Allow to opt-out from the have-real-root check via
> > 
> >   NFT_TEST_ROOTLESS=1 ./run-tests.sh
> 
> I don't like this.  But its a step in the right direction.
> 
> To me run-tests.sh has following issues/pain points:
>  - test duration is huge (>10m with debug kernels)
>  - all tests run in same netns
>  - tries to unloads kernel modules after each test
> 
> The need for uid 0 wasn't big on my problem list so far because
> I mostly run the tests in a VM.  But I agree its an issue for
> auto-build systems / CI and the like.

It also means there is not simple "run-tests" script without setting up
a VM/container. I personally avoid to run `sudo some/script` from the
internet (although, I execute build scripts from the internet are
normal user).

Running a substantial subset of tests as non-root, seems essential to
me.

It may not be essential to everybody, but to some.

> 
> > For that to be useful, we must also unshare the PID and user
> > namespace
> > and map the root user inside that namespace.
> 
> Are you sure PIDNS unshare is needed for this?

Probably not, but does it hurt? I think it should be

  unshare -f -p -U -n --mount-proc --map-root-user \
      bash -xc 'whoami; ip link; sleep 1 & ps aux'

> 
> > Test that don't work without real root should check for
> > [ "$NFT_TEST_HAVE_REALROOT" != 1 ] and skip gracefully.
> 
> Thats fine, see my recent RFC to add such environment
> variables to check if a particular feature is supported or not.
> 
> What I don't like here is the NFT_TEST_ROOTLESS environment
> variable to alter behaviour of run-tests.sh behavior, but see below.

If you don't run with real-root, then you are maybe not testing the
full thing and miss something. Hence, you have to opt-in to the new
rootless functionality with NFT_TEST_ROOTLESS=1.

The check is to preserve the previous behavior, which failed without
real-root.

> > -if [ "$(id -u)" != "0" ] ; then
> > +if [ "$NFT_TEST_HAVE_REALROOT" = "" ] ; then
> > +       # The caller can set NFT_TEST_HAVE_REALROOT to indicate us
> > whether we
> > +       # have real root. They usually don't need, and we detect it
> > now based
> > +       # on `id -u`. Note that we may unshare below, so the check
> > inside the
> > +       # new namespace won't be conclusive. We thus only detect
> > once and export
> > +       # the result.
> > +       export NFT_TEST_HAVE_REALROOT="$(test "$(id -u)" = "0" &&
> > echo 1 || echo 0)"
> > +fi
> > +
> 
> Why not get rid of the check?  Just auto-switch to unpriv userns and
> error out if that fails.  You could just print a warning/notice here
> and
> then try userns mode.  And/or print a notice at the together with the
> test summary.

Which check? About NFT_TEST_HAVE_REALROOT?

We need that, to detect real-root before unshare. Since we already need
it internally, it's also documented so that the user could override it.
For example, if you develop inside a

     podman run --privileged -ti fedora:latest

then `id -u` would wrongly indicate that the test has real-root. You
can override that with `export  NFT_TEST_HAVE_REALROOT=0` to run in
rootless mode.

> 
> > +if [ "$NFT_TEST_NO_UNSHARE" = 1 ]; then
> > +       # The user opts-out from unshare. Proceed without.
> 
> Whats the use case?  If there is a good one, then i'd prefer a
> command
> line switch rather than environment.

I think command line switches are inferior. They are cumbersome to
implement (in the script, the order of options surprisingly matters).
They are also more limited in their usage. The script should integrate
with a `make check-xyz` step or be called from others scripts, where
the command line is not directly editable. Also, the name
NFT_TEST_NO_UNSHARE is something unique that you can grep for (unlike
the "-g" command line switch).

The variable NFT_TEST_NO_UNSHARE is there, because the previous code
also expected a failure to `unshare`, then printed a warning and
proceeded without separate namespace. I think that fallback is
undesirable, I would not want to run the test accidentally in the main
namespace.

Now, by default it always tries to unshare, and aborts on failure. The
variable is there to opt-out from that.

> 
> I think long term all of the following would be good to have:
> 
> 1. run each test in its own netns
> 2. get rid of the forced 'nft flush ruleset' and the rmmod calls
> 3. Explore parallelisation of tests to reduce total test time
> 4. Add a SKIP return value, that tells that the test did not run
>   (or some other means that allows run-tests.sh to figure out that
>    a particular test did not run because its known to not work on
>    current configuration).
> 
> This would avoid false-positive 'all tests passed' when in reality
> some test had to 'exit 0' because of a missing feature or lack of
> real
> root.
> 
> Alternatively we could just make these tests fail and leave it to the
> user to figure it out, the normal expectation is for all tests to
> pass,
> its mostly when run-tests.sh is run on older kernel releases when it
> starts acting up.
> 

These are very good points.

Parallel execution maybe happen by hooking the tests up as individual
make targets. 

I'd like to integrate tests more into `make check-*`. I'd also like to
add unit tests (that is, both statically and dynamically link with
libnftables and to be able to unit test code specifically). As a
developer, I'd like to type `make check` to run the the unit tests and
have a clear make target for the integration tests.

IMO "[PATCH nft 0/6] no recursive make" should be done to make that
nicer. But I guess that will be controversial. Recursive make seems
very popular, all around.

Thomas