Re: SELinux namespaces re-base

Stephen Smalley <stephen.smalley.work@xxxxxxxxx> · Fri, 25 Oct 2024 11:48:21 -0400

On Fri, Oct 11, 2024 at 9:51 AM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
> Ok, I confirmed that the remaining denials are due to multiple tmpfs
> mounts and a socket created by systemd-nspawn during setup of the
> container that are then used by the container at runtime, and I
> confirmed that allowing those permissions in the container policy
> enables a Fedora container to boot in enforcing mode with its own
> SELinux namespace on a Fedora host in enforcing mode. Ultimately we
> will want the container runtime (systemd-nspawn in this case) to
> properly label those tmpfs mounts and the socket but that's just a
> matter of further userspace changes to systemd-nspawn.
>
> Still lots to do to allow more interesting combinations but I'll leave
> it there for a bit and see if anyone is actually interested in this
> besides me...

As per the discussion at the project meeting, I have added a Kconfig
option, CONFIG_SECURITY_SELINUX_NS (default n), that controls whether
SELinux namespace support is exposed to userspace at all. It does not
affect the underlying infrastructure support. Hence, anyone wishing to
experiment with it will need to enable that option. At this point, the
safeguards on SELinux namespaces are as follows:
- You have to explicitly enable it in Kconfig for it to be exposed to
userspace at all by the kernel.
- If enabled in Kconfig, the /sys/fs/selinux/unshare node for
unsharing the SELinux namespace can only be written by processes that
have the root UID (or CAP_DAC_OVERRIDE if non-root) and the new
unshare SELinux permission (obviously on Fedora the latter is allowed
by default until the policy defines the permission, but even then you
still have to be root or have CAP_DAC_OVERRIDE).
- If enabled in Kconfig, two additional Kconfig options and
/sys/fs/selinux nodes are provided for specifying the maximum number
of SELinux namespaces (default 65535) and the maximum depth to which
they can be nested (default 32). The
/sys/fs/selinux/{maxns,maxnsdepth} nodes can only be written by a
process with the root UID (or CAP_DAC_OVERRIDE) and the new
setmaxns/setmaxnsdepth SELinux permissions. Further, they can only be
set from the initial SELinux namespace, not from child namespaces.
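
For anyone who wants to confirm that a test kernel actually exposes
the pieces listed above before trying a container, here is a rough,
read-only sketch. The option name and the three node names come from
this message; the helper names (kconfig_enabled, probe_selinuxfs) are
made up for illustration, and I am assuming the nodes may be
unreadable (for example write-only), so read failures are reported
rather than treated as errors. It never writes to any node.

```python
import os
from pathlib import Path

# Node names are taken from the message above; the helpers are illustrative.
NODES = ("unshare", "maxns", "maxnsdepth")


def kconfig_enabled(config_text: str,
                    option: str = "CONFIG_SECURITY_SELINUX_NS") -> bool:
    """Return True if `option` is set to y in the given kernel config text."""
    return any(line.strip() == f"{option}=y"
               for line in config_text.splitlines())


def probe_selinuxfs(base: str = "/sys/fs/selinux") -> dict:
    """Report which namespace nodes exist and, where readable, their contents."""
    report = {"is_root": os.geteuid() == 0}
    for name in NODES:
        node = Path(base) / name
        if not node.exists():
            # Not exposed: option disabled, unpatched kernel, or no selinuxfs.
            report[name] = None
            continue
        try:
            report[name] = node.read_text().strip()
        except OSError:
            # Assumption: a node may be write-only or denied by policy.
            report[name] = "<unreadable>"
    return report


if __name__ == "__main__":
    for key, value in probe_selinuxfs().items():
        print(f"{key}: {value}")
```

A value of None for all three nodes most likely means the option is
off or the kernel is unpatched. Feeding the text of your kernel
config to kconfig_enabled() distinguishes those two cases.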

Hopefully those safeguards remove any qualms people might have about testing.

Would welcome any code reviewers or testers, especially for corner
cases that I am less likely to exercise myself - e.g. policies not
based on refpolicy, containers and/or host OSes that are not Fedora
derivatives, etc. You'll need the patched kernel, libselinux,
systemd-nspawn, and systemd (or roll your own userspace patches for
your preferred container runtime and/or init daemon) to exercise it,
as previously described in the thread.