On Mon, Sep 30, 2024 at 3:06 PM Stephen Smalley <stephen.smalley.work@xxxxxxxxx> wrote:
>
> On Mon, Sep 30, 2024 at 2:12 PM Topi Miettinen <toiwoton@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I wonder if SELinux namespaces could be used for sandboxing,
> > specifically with systemd. When enabled for a service with a directive
> > (something like NamespacedSELinuxPolicy=path), PID1 could load a service
> > specific namespaced policy and apply it to the service as it starts.
> > These kind of policies could be extremely minimal and hardened when
> > optimized.
> >
> > The implementation should avoid interfering with other sandboxing
> > activities and also avoid AVC pollution from them, so preferably there
> > should be a way to set up the namespacing and load the policy in a way
> > that these will only take effect at next execve() call, much like
> > setexeccon(). However, errors should be returned as early as possible
> > though so that the error can be associated with the loading. Also it
> > should be possible to enable SELinux namespacing independently to other
> > namespacing options as they are controlled by other directives.
> >
> > Would this be an interesting use case? Would it need major design
> > changes? Systemd already loads a SELinux policy at boot so there's some
> > infrastructure in place.
>
> I don't think there is anything in the current implementation that
> would preclude such usage, but I'm not sure that's a major use case
> for the SELinux namespace support - sounds more like you want to apply
> Landlock or similar sandboxing via systemd configuration.
>
> At present, the unshare operation is not deferred to the next
> execve(), no different than any of the other namespace unshare
> operations, but that's easy to do if it is necessary for some reason.
> The current sequence as I've sketched in this email thread is to
> unshare the SELinux namespace, mount your own private selinuxfs
> instance that only affects your policy, load a policy, set enforcing
> mode, and switch to an appropriate security context in the child -
> either via setcon(3) or execve(). The policy and AVC are private to
> your namespace. Permissions are checked against the current namespace
> and all ancestors (for the checks that I have converted thus far,
> still WIP). The process context in the child is separate/independent
> of the context in the parent, but bounded in permissions by it.

Also, to be clear, the usage model above is best suited to the case
where you want to run what is essentially a SELinux container with its
own policy on a host OS that either does not load a SELinux policy at
all or loads a different policy of its own. In that case you'd just
unshare the SELinux namespace (along with at least the mount and network
namespaces, for reasons previously described), umount the old
/sys/fs/selinux that refers to the host OS policy from your mount
namespace, and then run systemd/init and have it do what it normally
does (i.e. mount its own selinuxfs, load a policy, set enforcing mode,
switch contexts).
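
To make the ordering of the container case concrete, here is a rough
sketch in Python. It is illustrative only, not the actual launcher or
interface. The selinuxfs "unshare" node is an assumption (this thread
does not name the interface, and the support is still WIP), and
container_steps(), run() and write_node() are hypothetical helpers.
The "make mounts private" step is an addition not spelled out above: it
keeps the umount from propagating back to the host when / is a shared
mount. os.unshare() needs Python 3.12 or later, and the real actions
need root plus a kernel carrying the namespace patches, so the default
here is a dry run that only reports the order.

```python
import os
import subprocess

SELINUXFS = "/sys/fs/selinux"
# ASSUMPTION: the thread does not name the unshare interface. This sketch
# supposes a selinuxfs node that unshares the SELinux namespace when "1"
# is written to it; adjust to whatever the WIP kernel actually exposes.
UNSHARE_NODE = SELINUXFS + "/unshare"


def write_node(path, value):
    """Write a short string to a kernel pseudo-file."""
    with open(path, "w") as f:
        f.write(value)


def container_steps(init_path="/sbin/init"):
    """Ordered (label, action) pairs; nothing runs until an action is called."""
    return [
        # os.unshare() and the CLONE_* constants exist in Python 3.12+.
        ("unshare mount and network namespaces",
         lambda: os.unshare(os.CLONE_NEWNS | os.CLONE_NEWNET)),
        # Stop the later umount from propagating to the host mount tree.
        ("make mounts private",
         lambda: subprocess.run(["mount", "--make-rprivate", "/"], check=True)),
        # Done before the umount, since the assumed node lives in selinuxfs.
        ("unshare SELinux namespace (assumed interface)",
         lambda: write_node(UNSHARE_NODE, "1")),
        # Drop the selinuxfs instance that refers to the host OS policy.
        ("umount host selinuxfs",
         lambda: subprocess.run(["umount", SELINUXFS], check=True)),
        # init then mounts its own selinuxfs, loads a policy, and so on.
        ("exec init",
         lambda: os.execv(init_path, [init_path])),
    ]


def run(steps, dry_run=True):
    """Run the steps in order; with dry_run=True only report them."""
    done = []
    for label, action in steps:
        if not dry_run:
            action()
        done.append(label)
    return done


if __name__ == "__main__":
    for label in run(container_steps(), dry_run=True):
        print(label)
```

A real container launcher would likely unshare further namespaces and
handle errors per step; the point here is only the order of operations.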