On Fri, Oct 11, 2024 at 9:51 AM Stephen Smalley <stephen.smalley.work@xxxxxxxxx> wrote:
> Ok, I confirmed that the remaining denials are due to multiple tmpfs
> mounts and a socket created by systemd-nspawn during setup of the
> container that are then used by the container at runtime, and I
> confirmed that allowing those permissions in the container policy
> enables a Fedora container to boot in enforcing mode with its own
> SELinux namespace on a Fedora host in enforcing mode. Ultimately we
> will want the container runtime (systemd-nspawn in this case) to
> properly label those tmpfs mounts and the socket but that's just a
> matter of further userspace changes to systemd-nspawn.
>
> Still lots to do to allow more interesting combinations but I'll leave
> it there for a bit and see if anyone is actually interested in this
> besides me...

As per the discussion at the project meeting, I have added a Kconfig
option CONFIG_SECURITY_SELINUX_NS (default n) that controls whether the
SELinux namespace support is exposed to userspace at all but does not
affect the underlying infrastructure support. Hence, anyone wishing to
experiment with it will need to enable that option.

At this point, the safeguards on SELinux namespaces are as follows:

- You have to explicitly enable it in Kconfig for it to be exposed to
  userspace at all by the kernel.

- If enabled in Kconfig, the /sys/fs/selinux/unshare node for unsharing
  the SELinux namespace can only be written by processes that have the
  root UID (or CAP_DAC_OVERRIDE if non-root) and the new unshare SELinux
  permission (obviously on Fedora the latter is default-allowed unless
  you define the permission, but even then you still have to be root or
  have CAP_DAC_OVERRIDE).

- If enabled in Kconfig, then two additional Kconfig options and
  /sys/fs/selinux nodes are provided for specifying the maximum number
  of SELinux namespaces (default 65535) and the maximum depth to which
  they can be nested (default 32). The
  /sys/fs/selinux/{maxns,maxnsdepth} nodes can only be written by a
  process with the root UID (or CAP_DAC_OVERRIDE) and the new
  setmaxns/setmaxnsdepth SELinux permissions. Further, they can only be
  set from the initial SELinux namespace, not from child namespaces.

Hopefully those safeguards remove any qualms people might have about
testing. I would welcome any code reviewers or testers, especially for
corner cases that I am less likely to exercise myself - e.g. policies
not based on refpolicy, containers and/or host OSes that are not Fedora
derivatives, etc. You'll need the patched kernel, libselinux,
systemd-nspawn, and systemd (or roll your own userspace patches for
your preferred container runtime and/or init daemon) to exercise it, as
previously described in the thread.
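
For anyone who wants to confirm that a test kernel actually exposes the
nodes named above before trying anything else, here is a minimal Python
sketch. It is not part of the patch series. It only checks whether
unshare, maxns and maxnsdepth exist under selinuxfs and, for the two
limit nodes, tries to read them. That those two nodes are readable and
return a plain decimal integer is an assumption, and probe_selinux_ns
is a hypothetical helper name. Nothing is written to any node.

```python
import os


def probe_selinux_ns(selinuxfs="/sys/fs/selinux"):
    """Report which SELinux namespace nodes exist under selinuxfs.

    Returns a dict keyed by node name; each entry records whether the
    node is present and, for maxns/maxnsdepth, the integer read from it
    (None if it could not be read or parsed).
    """
    report = {}
    for node in ("unshare", "maxns", "maxnsdepth"):
        path = os.path.join(selinuxfs, node)
        info = {"present": os.path.exists(path), "value": None}
        # ASSUMPTION: the limit nodes can be read back as decimal text.
        # The unshare node is never read or written here.
        if info["present"] and node != "unshare":
            try:
                with open(path) as f:
                    info["value"] = int(f.read().strip())
            except (OSError, ValueError):
                pass  # unreadable or not an integer: leave value as None
        report[node] = info
    return report


if __name__ == "__main__":
    for node, info in probe_selinux_ns().items():
        print(node, info)
```

If none of the nodes are reported as present, the most likely
explanation is that CONFIG_SECURITY_SELINUX_NS was not enabled in the
running kernel, or that selinuxfs is not mounted at the path given.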