On Thu, Oct 3, 2024 at 2:29 PM Stephen Smalley <stephen.smalley.work@xxxxxxxxx> wrote:
>
> On Thu, Oct 3, 2024 at 1:04 PM Stephen Smalley
> <stephen.smalley.work@xxxxxxxxx> wrote:
> > Based on our discussion at the last project meeting, I removed the
> > requirement to unshare the network namespace when unsharing the
> > SELinux namespace by adding a check in selnl_notify() to only send the
> > SELinux netlink notifications to the init network namespace if the
> > triggering process is in the init SELinux namespace. Hence, the
> > creator of a child SELinux namespace can either choose to unshare the
> > network namespace if they want to receive such netlink notifications
> > (in which case they will be sent to that child network namespace
> > only), or they can just use the SELinux status page exported by
> > /sys/fs/selinux/status, which is the default in libselinux for kernels
> > that support it.
> >
> > With that change, I can now run all of the selinux-testsuite tests
> > successfully from a child SELinux namespace except for two labeled
> > IPSEC tests each for inet_socket/tcp, inet_socket/udp, and
> > inet_socket/mptcp. To fully pass the other tests, I had to also put
> > the parent namespace into permissive mode to avoid certain failures
> > due to MCS constraints in the base policy that can't be overridden via
> > the test policy. The remaining labeled IPSEC test failures are likely
> > due to the fact that the xfrm hooks are not passed a sock structure or
> > anything else from which I can obtain the appropriate SELinux
> > namespace to use so they are hardcoded to use the init SELinux
> > namespace and even when it is permissive, there are hardcoded SID
> > comparisons in those hooks that are likely failing.
> >
> > I also introduced configurable limits for the maximum number of
> > SELinux namespaces and for the maximum depth to which they can be
> > nested.
> > The default values of each can be controlled via Kconfig
> > options, which default to 65535 and 32 respectively (matching user
> > namespaces), and can be further adjusted via /sys/fs/selinux/maxns and
> > /sys/fs/selinux/maxnsdepth respectively but only from the init SELinux
> > namespace (child namespaces can read but not modify them). A simple
> > pair of test scripts to recursively create SELinux namespaces
> > correctly failed when it hit the maxnsdepth and lowering the maxns
> > value correctly prevented exceeding that number of total namespaces.
> > These tests however exposed a couple of reference counting bugs in the
> > code (one for SELinux namespaces, one for the parent cred that we
> > cache in the task security blob for use in checks on the parent
> > namespace), which are now also fixed.
> >
> > I have completed converting all of the permission checks to use the
> > namespace-aware helpers or annotated them with comments indicating
> > when it is correct to only check against the current SELinux
> > namespace. For some of the checks, it is debatable as to which helper
> > should be used, so we may need to revisit some of these based on
> > experience.
> >
> > What remains to be done:
> > 1. Maybe rework how policy capabilities are being checked/used to
> > correctly support child namespaces with different policy capabilities
> > from the parent. I can do this for some simple cases by lifting the
> > logic to walk the namespaces up into the hook function itself and
> > checking the policy capability value in each namespace, but many
> > (most?) of the policy capabilities don't lend themselves to this
> > approach. For example, extended_socket_class enables finer-grained
> > socket security classes, but this is checked and applied when the
> > socket security blob is initialized, not at permission check time.
> > Unless we want to move the mapping logic to every permission check, I
> > am not sure what can be done there.
> > Similarly, a number of policy
> > capabilities control labeling behaviors rather than permission checks,
> > and since we are no longer trying to support per-namespace object
> > SIDs/contexts, only one namespace's policy can be applied, and the
> > resulting label will then be used for all subsequent checks, even in
> > the other namespaces.
> >
> > 2. Decide if any further hardening of selinuxfs is required to safely
> > permit usage by potentially untrusted / less trusted processes in
> > child namespaces. There has already been a lot of work to harden e.g.
> > the policy loading logic against ill-formed policies and such, so not
> > sure if there is much to do here, but noting it. I would like to get
> > rid of /sys/fs/selinux/user altogether so possibly making it
> > inaccessible in child namespaces would be a good first step.
> >
> > 3. Optimize the implementation for the single SELinux namespace case,
> > reducing and/or eliminating the overhead introduced by the SELinux
> > namespace support for that common case. Lots of work to do here, help
> > welcome. Also would appreciate guidance on current benchmarking
> > practices since it has been a while since I've had to do that.
> >
> > 4. Revisit the userspace API for unsharing the SELinux namespace
> > if/when the rest is ready. Currently just
> > "echo 1 > /sys/fs/selinux/unshare" (followed by the other necessary
> > steps for unsharing the mount namespace, unmounting the parent's
> > selinuxfs, mounting a new selinuxfs for the child, loading a policy,
> > and setting enforcing mode). Options would include adding a
> > CLONE_SECURITY flag to unshare/clone that could be implemented by
> > any/all LSMs via a call to a new (stacked) LSM hook function, or one
> > or more new LSM system calls to do the same, or just keeping it the
> > way it is via selinuxfs.
> >
> > Experimentation is welcome, particularly for more complex cases, e.g.
> > where the host policy and the child policy differ (no policy loaded on
> > host, policy in child; policy loaded on host, no policy in child; host
> > policy from one distribution/release, child from another; etc.). Be
> > aware however that since the permission checks are applied to the
> > current namespace and its ancestors, the parent namespace may deny
> > something that would be allowed in the child, especially if the child
> > is using contexts that are unknown to the parent's policy (which will
> > be treated as unlabeled for those checks in the parent). Also be aware
> > that since we are no longer trying to support per-namespace object
> > SIDs/contexts, any object first instantiated in the parent namespace
> > will be labeled according to its policy, not the child's policy.
> >
> > The tree can be found at:
> > https://github.com/stephensmalley/selinux-kernel/tree/working-selinuxns
> >
> > It may be re-based or changed at any time.
> > To experiment, after building and booting this kernel, do the following:
> > # Create root shell
> > sudo bash
> > # Unshare SELinux namespace
> > echo 1 > /sys/fs/selinux/unshare
> > id # Context is now "init" or "kernel" in child; ps -eZ from parent
> > #    will still show original context
> > # Unshare mount namespace and mount new selinuxfs for child SELinux namespace
> > unshare -m
> > umount /sys/fs/selinux
> > mount -t selinuxfs none /sys/fs/selinux
> > # Load a policy into the child SELinux namespace, parent unaffected
> > load_policy
> > id # Context is now kernel_generic_helper_t on Fedora due to a default
> > #    transition in its policy
> > # Switch to a suitable security context before trying to go enforcing
> > runcon unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 /bin/bash
> > # Switch child to enforcing, checking that you didn't get killed once enforcing
> > echo $$
> > setenforce 1
> > echo $$
> > # Do stuff in child, run testsuite (switch parent to permissive first
> > #    to avoid denials from it), etc.
>
> Oops, I see that the selinux tree re-based to 6.12-rc1, so now
> updating my branch to that.
> There are conflicts so it may take a little bit.

Wasn't too bad. Now re-based on 6.12-rc1.
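
For anyone who wants to check the namespace limits before experimenting, here is a minimal sketch of a helper that prints them. The maxns and maxnsdepth file names are the ones described in the quoted message; the function name show_selinuxns_limits and its optional directory argument are only illustrative and are not part of the tree.

```shell
#!/bin/sh
# Print the SELinux namespace limits exposed by selinuxfs.
# Assumption: the kernel from the working-selinuxns branch is running and
# selinuxfs is mounted; on any other kernel the files will be absent.
# The function name and the directory argument are hypothetical helpers.
show_selinuxns_limits() {
    dir="${1:-/sys/fs/selinux}"
    for name in maxns maxnsdepth; do
        if [ -r "$dir/$name" ]; then
            # Print as name=value, one limit per line.
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # Unpatched kernel, or selinuxfs not mounted at this path.
            printf '%s: not available under %s\n' "$name" "$dir" >&2
            return 1
        fi
    done
}

# Example: show_selinuxns_limits
```

Reading should work from either the init or a child SELinux namespace, since child namespaces can read these values but not modify them.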