Re: SELinux namespaces re-base

Stephen Smalley <stephen.smalley.work@xxxxxxxxx> · Thu, 3 Oct 2024 16:11:30 -0400

On Thu, Oct 3, 2024 at 2:29 PM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
>
> On Thu, Oct 3, 2024 at 1:04 PM Stephen Smalley
> <stephen.smalley.work@xxxxxxxxx> wrote:
> > Based on our discussion at the last project meeting, I removed the
> > requirement to unshare the network namespace when unsharing the
> > SELinux namespace by adding a check in selnl_notify() to only send the
> > SELinux netlink notifications to the init network namespace if the
> > triggering process is in the init SELinux namespace. Hence, the
> > creator of a child SELinux namespace can either choose to unshare the
> > network namespace if they want to receive such netlink notifications
> > (in which case they will be sent to that child network namespace
> > only), or they can just use the SELinux status page exported by
> > /sys/fs/selinux/status, which is the default in libselinux for kernels
> > that support it.
> >
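
For anyone who wants to look at the status page without going through libselinux, a rough shell sketch follows. It assumes the page begins with five native-endian 32-bit fields (version, sequence, enforcing, policyload, deny_unknown), as in the kernel's struct selinux_kernel_status, and that a plain read of the file returns them. selinux_status_fields is only an illustrative name, and real consumers should map the page the way libselinux does.

```shell
#!/bin/sh
# Sketch only: decode the first 20 bytes of the SELinux status page.
# Assumed layout: version, sequence, enforcing, policyload, deny_unknown,
# each an unsigned 32-bit word in host byte order.
# selinux_status_fields is a hypothetical helper name.
selinux_status_fields() (
    status_file="${1:-/sys/fs/selinux/status}"
    # -An: no offset column, -tu4: unsigned 4-byte words, -N20: 20 bytes
    set -- $(od -An -tu4 -N20 "$status_file")
    # Anything other than five words means the read failed or was short.
    [ "$#" -eq 5 ] || exit 1
    # An odd sequence number means an update was in progress; the caller
    # should simply retry.
    if [ $(( $2 % 2 )) -ne 0 ]; then
        exit 2
    fi
    echo "version=$1 sequence=$2 enforcing=$3 policyload=$4 deny_unknown=$5"
)
```

Calling selinux_status_fields with no argument reads /sys/fs/selinux/status; a return code of 2 means "try again". Since od takes a one-off snapshot rather than mapping the page, the sequence check is best effort only.
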
> > With that change, I can now run all of the selinux-testsuite tests
> > successfully from a child SELinux namespace except for two labeled
> > IPSEC tests each for inet_socket/tcp, inet_socket/udp, and
> > inet_socket/mptcp. To fully pass the other tests, I also had to put
> > the parent namespace into permissive mode to avoid certain failures
> > due to MCS constraints in the base policy that can't be overridden via
> > the test policy. The remaining labeled IPSEC test failures are likely
> > because the xfrm hooks are not passed a sock structure or anything
> > else from which I can obtain the appropriate SELinux namespace, so
> > they are hardcoded to use the init SELinux namespace. Even when that
> > namespace is permissive, there are hardcoded SID comparisons in those
> > hooks that are likely failing.
> >
> > I also introduced configurable limits for the maximum number of
> > SELinux namespaces and for the maximum depth to which they can be
> > nested. The default values of each can be controlled via Kconfig
> > options, which default to 65535 and 32 respectively (matching user
> > namespaces), and can be further adjusted via /sys/fs/selinux/maxns and
> > /sys/fs/selinux/maxnsdepth respectively but only from the init SELinux
> > namespace (child namespaces can read but not modify them). With a
> > simple pair of test scripts that recursively create SELinux namespaces,
> > creation correctly failed upon reaching maxnsdepth, and lowering maxns
> > correctly prevented exceeding that total number of namespaces.
> > These tests however exposed a couple of reference counting bugs in the
> > code (one for SELinux namespaces, one for the parent cred that we
> > cache in the task security blob for use in checks on the parent
> > namespace), which are now also fixed.
> >
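
A small sketch for inspecting and adjusting the two limits from a shell. The maxns and maxnsdepth node names are the ones given above; the helper names are made up for illustration, and I am assuming the nodes hold plain decimal integers.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the files are assumed to
# contain plain decimal integers.

# Print both limits as name=value lines.
selinuxns_show_limits() (
    fs="${1:-/sys/fs/selinux}"
    for name in maxns maxnsdepth; do
        value=$(cat "$fs/$name") || exit 1
        echo "$name=$value"
    done
)

# Set one limit. Per the description above this is expected to succeed
# only from the init SELinux namespace.
selinuxns_set_limit() (
    name="$1"
    value="$2"
    fs="${3:-/sys/fs/selinux}"
    case "$name" in
        maxns|maxnsdepth) ;;
        *) echo "unknown limit: $name" >&2; exit 2 ;;
    esac
    case "$value" in
        ''|*[!0-9]*) echo "not a non-negative integer: $value" >&2; exit 2 ;;
    esac
    echo "$value" > "$fs/$name"
)
```

For example, "selinuxns_set_limit maxnsdepth 4" in the init namespace followed by "selinuxns_show_limits" in a child should show the lowered depth, while the same write from the child should fail.
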
> > I have completed converting all of the permission checks to use the
> > namespace-aware helpers or annotated them with comments indicating
> > when it is correct to only check against the current SELinux
> > namespace. For some of the checks, it is debatable as to which helper
> > should be used, so we may need to revisit some of these based on
> > experience.
> >
> > What remains to be done:
> > 1. Maybe rework how policy capabilities are being checked/used to
> > correctly support child namespaces with different policy capabilities
> > from the parent. I can do this for some simple cases by lifting the
> > logic to walk the namespaces up into the hook function itself and
> > checking the policy capability value in each namespace, but many
> > (most?) of the policy capabilities don't lend themselves to this
> > approach. For example, extended_socket_class enables finer-grained
> > socket security classes, but this is checked and applied when the
> > socket security blob is initialized, not at permission check time.
> > Unless we want to move the mapping logic to every permission check, I
> > am not sure what can be done there. Similarly, a number of policy
> > capabilities control labeling behaviors rather than permission checks,
> > and since we are no longer trying to support per-namespace object
> > SIDs/contexts, only one namespace's policy can be applied; the
> > resulting label will then be used for all subsequent checks, even in
> > the other namespaces.
> >
> > 2. Decide if any further hardening of selinuxfs is required to safely
> > permit usage by potentially untrusted / less trusted processes in
> > child namespaces. There has already been a lot of work to harden e.g.
> > the policy loading logic against ill-formed policies and such, so not
> > sure if there is much to do here, but noting it. I would like to get
> > rid of /sys/fs/selinux/user altogether so possibly making it
> > inaccessible in child namespaces would be a good first step.
> >
> > 3. Optimize the implementation for the single SELinux namespace case,
> > reducing and/or eliminating the overhead introduced by the SELinux
> > namespace support for that common case. Lots of work to do here, help
> > welcome. Also would appreciate guidance on current benchmarking
> > practices since it has been a while since I've had to do that.
> >
> > 4. Revisit the userspace API for unsharing the SELinux namespace
> > if/when the rest is ready. Currently just "echo 1 >
> > /sys/fs/selinux/unshare" (followed by the other necessary steps for
> > unsharing the mount namespace, unmounting the parent's selinuxfs,
> > mounting a new selinuxfs for the child, loading a policy, and setting
> > enforcing mode). Options would include adding a CLONE_SECURITY flag to
> > unshare/clone that could be implemented by any/all LSMs via a call to
> > a new (stacked) LSM hook function, or one or more new LSM system calls
> > to do the same, or just keeping it the way it is via selinuxfs.
> >
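
Regarding item 1, one way to see whether a parent and child actually differ in policy capabilities is to dump the policy_capabilities directory of selinuxfs in each and compare. A minimal sketch, assuming each file there holds 0 or 1; selinux_polcaps is just an illustrative name:

```shell
#!/bin/sh
# Sketch only: print one name=value line per policy capability exposed
# under <selinuxfs>/policy_capabilities. selinux_polcaps is a
# hypothetical helper name.
selinux_polcaps() (
    fs="${1:-/sys/fs/selinux}"
    for f in "$fs"/policy_capabilities/*; do
        # Skip the unexpanded glob if the directory is empty or missing.
        [ -f "$f" ] || continue
        echo "$(basename "$f")=$(cat "$f")"
    done
)
```

Saving the output in the parent (selinux_polcaps > /tmp/parent.caps) and then running "selinux_polcaps | diff /tmp/parent.caps -" in the child after load_policy shows which capabilities, extended_socket_class for instance, differ between the two policies.
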
> > Experimentation is welcome, particularly for more complex cases, e.g.
> > where the host policy and the child policy differ (no policy loaded on
> > host, policy in child; policy loaded on host, no policy in child; host
> > policy from one distribution/release, child from another; etc.). Be
> > aware however that since the permission checks are applied to the
> > current namespace and its ancestors, the parent namespace may deny
> > something that would be allowed in the child, especially if the child
> > is using contexts that are unknown to the parent's policy (which will
> > be treated as unlabeled for those checks in the parent). Also be aware
> > that since we are no longer trying to support per-namespace object
> > SIDs/contexts, any object first instantiated in the parent namespace
> > will be labeled according to its policy, not the child's policy.
> >
> > The tree can be found at:
> > https://github.com/stephensmalley/selinux-kernel/tree/working-selinuxns
> >
> > It may be re-based or changed at any time.
> > To experiment, after building and booting this kernel, do the following:
> > # Create root shell
> > sudo bash
> > # Unshare SELinux namespace
> > echo 1 > /sys/fs/selinux/unshare
> > id # Context is now "init" or "kernel" in child
> > # ps -eZ from parent will still show original context
> > # Unshare mount namespace and mount new selinuxfs for child SELinux namespace
> > unshare -m
> > umount /sys/fs/selinux
> > mount -t selinuxfs none /sys/fs/selinux
> > # Load a policy into the child SELinux namespace, parent unaffected
> > load_policy
> > id # Context is now kernel_generic_helper_t on Fedora due to a default
> > # transition in its policy
> > # Switch to a suitable security context before trying to go enforcing
> > runcon unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 /bin/bash
> > # Switch child to enforcing, checking that you didn't get killed once enforcing
> > echo $$
> > setenforce 1
> > echo $$
> > # Do stuff in child, run testsuite (switch parent to permissive first
> > # to avoid denials from it), etc.
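
Before starting the steps above, it may be worth confirming that the booted kernel really is built from this branch. The sketch below only checks that the selinuxfs nodes mentioned in this message (unshare, maxns, maxnsdepth) exist; selinuxns_precheck is an illustrative name, not something shipped anywhere.

```shell
#!/bin/sh
# Sketch only: report any SELinux namespace nodes missing from selinuxfs.
# selinuxns_precheck is a hypothetical helper name.
selinuxns_precheck() (
    fs="${1:-/sys/fs/selinux}"
    missing=0
    for node in unshare maxns maxnsdepth; do
        if [ ! -e "$fs/$node" ]; then
            echo "missing: $fs/$node" >&2
            missing=1
        fi
    done
    # Succeed only if every node was found.
    [ "$missing" -eq 0 ]
)
```

A typical use would be "selinuxns_precheck && echo 1 > /sys/fs/selinux/unshare" from the root shell, so that the write is only attempted on a kernel that exposes the namespace interface.
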
>
> Oops, I see that the selinux tree re-based to 6.12-rc1, so now
> updating my branch to that.
> There are conflicts so it may take a little bit.

Wasn't too bad. Now re-based on 6.12-rc1.