Re: SELinux namespaces re-base

On Fri, Sep 27, 2024 at 10:48 AM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
> Since an increasing number of the testsuite tests are failing in a
> child SELinux namespace due to even unconfined_t in the parent
> namespace not being allowed the requisite permissions in the parent
> namespace, I've created a modified version of the testsuite policy to
> allow those permissions to unconfined_t and also disabled the tests
> that cannot work currently due to the separate network namespace.
> Those changes are on a branch of my fork of the selinux-testsuite at:
> https://github.com/stephensmalley/selinux-testsuite/tree/selinuxns
>
> With those changes, if I load the test policy into the parent
> namespace (so that the test domains/types are defined and access is
> allowed to unconfined_t) and then create a child namespace from an
> unconfined_t shell and run the testsuite from it, all of the
> (still-enabled) tests pass. I'll keep amending the test policy on that
> branch with further changes as I convert additional permission checks
> to be namespace-aware. Eventually we can figure out if it makes sense
> to merge these into the main testsuite but that can wait until we're
> ready to merge the kernel namespace support itself.

Based on our discussion at the last project meeting, I removed the
requirement to unshare the network namespace when unsharing the
SELinux namespace by adding a check in selnl_notify() to only send the
SELinux netlink notifications to the init network namespace if the
triggering process is in the init SELinux namespace. Hence, the
creator of a child SELinux namespace can either choose to unshare the
network namespace if they want to receive such netlink notifications
(in which case they will be sent to that child network namespace
only), or they can just use the SELinux status page exported by
/sys/fs/selinux/status, which is the default in libselinux for kernels
that support it.
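
For anyone who wants to look at the status page without going through libselinux, here is a minimal sketch. It assumes the page begins with five native-endian 32-bit fields in the order version, sequence, enforcing, policyload, deny_unknown, and it does a single plain read rather than the mmap-and-retry on the sequence number that a real consumer should do. The function name read_selinux_status is made up for this example.

```shell
# Sketch only: decode the first five 32-bit fields of the SELinux status page.
# Assumed layout: version, sequence, enforcing, policyload, deny_unknown.
read_selinux_status() {
    status_file="${1:-/sys/fs/selinux/status}"
    # od prints the first 20 bytes as unsigned 32-bit integers in host byte order.
    fields=$(od -An -tu4 -N20 "$status_file") || return 1
    # Rely on word splitting to turn od's padded output into positional parameters.
    set -- $fields
    [ "$#" -eq 5 ] || return 1
    echo "version=$1 sequence=$2 enforcing=$3 policyload=$4 deny_unknown=$5"
}
```

Calling read_selinux_status with no argument reads /sys/fs/selinux/status; passing a path lets you point it at a saved copy of the page.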

With that change, I can now run all of the selinux-testsuite tests
successfully from a child SELinux namespace except for two labeled
IPSEC tests each for inet_socket/tcp, inet_socket/udp, and
inet_socket/mptcp. To fully pass the other tests, I had to also put
the parent namespace into permissive mode to avoid certain failures
due to MCS constraints in the base policy that can't be overridden via
the test policy. The remaining labeled IPSEC test failures are likely
due to the fact that the xfrm hooks are not passed a sock structure or
anything else from which I can obtain the appropriate SELinux
namespace, so they are hardcoded to use the init SELinux namespace.
Even when that namespace is permissive, there are hardcoded SID
comparisons in those hooks that are likely failing.

I also introduced configurable limits for the maximum number of
SELinux namespaces and for the maximum depth to which they can be
nested. The default values of each can be controlled via Kconfig
options, which default to 65535 and 32 respectively (matching user
namespaces), and can be further adjusted via /sys/fs/selinux/maxns and
/sys/fs/selinux/maxnsdepth respectively but only from the init SELinux
namespace (child namespaces can read but not modify them). A simple
pair of test scripts to recursively create SELinux namespaces
correctly failed when it hit the maxnsdepth and lowering the maxns
value correctly prevented exceeding that number of total namespaces.
These tests however exposed a couple of reference counting bugs in the
code (one for SELinux namespaces, one for the parent cred that we
cache in the task security blob for use in checks on the parent
namespace), which are now also fixed.
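
A quick way to inspect the two limits from either the init or a child namespace could look like the sketch below. The helper name show_selinuxns_limits is hypothetical, and the assumption that each node holds a single printable value is mine; only the maxns and maxnsdepth node names come from the description above.

```shell
# Hypothetical helper: print the namespace limits exposed by selinuxfs.
# The optional argument overrides the selinuxfs mount point.
show_selinuxns_limits() {
    selinuxfs="${1:-/sys/fs/selinux}"
    for name in maxns maxnsdepth; do
        if [ -r "$selinuxfs/$name" ]; then
            # Assumes the node contains one value that can be printed as-is.
            printf '%s=%s\n' "$name" "$(cat "$selinuxfs/$name")"
        else
            # Kernels without the namespace patches will not have these nodes.
            printf '%s=unavailable\n' "$name"
        fi
    done
}
```

On a kernel built from this branch, running show_selinuxns_limits should print one line per limit; on other kernels both lines report "unavailable".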

I have completed converting all of the permission checks to use the
namespace-aware helpers or annotated them with comments indicating
when it is correct to only check against the current SELinux
namespace. For some of the checks, it is debatable as to which helper
should be used, so we may need to revisit some of these based on
experience.

What remains to be done:
1. Maybe rework how policy capabilities are being checked/used to
correctly support child namespaces with different policy capabilities
from the parent. I can do this for some simple cases by lifting the
logic to walk the namespaces up into the hook function itself and
checking the policy capability value in each namespace, but many
(most?) of the policy capabilities don't lend themselves to this
approach. For example, extended_socket_class enables finer-grained
socket security classes, but this is checked and applied when the
socket security blob is initialized, not at permission check time.
Unless we want to move the mapping logic to every permission check, I
am not sure what can be done there. Similarly, a number of policy
capabilities control labeling behaviors rather than permission checks,
and since we are no longer trying to support per-namespace object
SIDs/contexts, only one namespace's policy can be applied, and the
resulting label will then be used for all subsequent checks, even in
the other namespaces.

2. Decide if any further hardening of selinuxfs is required to safely
permit usage by potentially untrusted / less trusted processes in
child namespaces. There has already been a lot of work to harden e.g.
the policy loading logic against ill-formed policies and such, so not
sure if there is much to do here, but noting it. I would like to get
rid of /sys/fs/selinux/user altogether so possibly making it
inaccessible in child namespaces would be a good first step.

3. Optimize the implementation for the single SELinux namespace case,
reducing and/or eliminating the overhead introduced by the SELinux
namespace support for that common case. Lots of work to do here, help
welcome. Also would appreciate guidance on current benchmarking
practices since it has been a while since I've had to do that.

4. Revisit the userspace API for unsharing the SELinux namespace
if/when the rest is ready. Currently just "echo 1 >
/sys/fs/selinux/unshare" (followed by the other necessary steps for
unsharing the mount namespace, unmounting the parent's selinuxfs,
mounting a new selinuxfs for the child, loading a policy, and setting
enforcing mode). Options would include adding a CLONE_SECURITY flag to
unshare/clone that could be implemented by any/all LSMs via a call to
a new (stacked) LSM hook function, or one or more new LSM system calls
to do the same, or just keeping it the way it is via selinuxfs.

Experimentation is welcome, particularly for more complex cases, e.g.
where the host policy and the child policy differ (no policy loaded on
host, policy in child; policy loaded on host, no policy in child; host
policy from one distribution/release, child from another; etc). Be
aware however that since the permission checks are applied to the
current namespace and its ancestors, the parent namespace may deny
something that would be allowed in the child, especially if the child
is using contexts that are unknown to the parent's policy (which will
be treated as unlabeled for those checks in the parent). Also be aware
that since we are no longer trying to support per-namespace object
SIDs/contexts, any object first instantiated in the parent namespace
will be labeled according to its policy, not the child's policy.
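
To make the "current namespace and its ancestors" behavior concrete, here is a toy model. It is not the kernel logic, just an illustration under the simplifying assumption that each namespace yields a plain allow/deny answer (so it ignores permissive mode, among other things). The function name ns_chain_decision is invented for this example.

```shell
# Conceptual model only: each argument is the decision ("allow" or "deny")
# of one namespace's policy, ordered from the current namespace up to init.
# The overall result is "allow" only if every namespace in the chain allows.
ns_chain_decision() {
    # With no namespaces to consult, fail closed.
    [ "$#" -gt 0 ] || { echo "deny"; return 1; }
    for decision in "$@"; do
        if [ "$decision" != "allow" ]; then
            echo "deny"
            return 1
        fi
    done
    echo "allow"
}
```

For example, "ns_chain_decision allow deny" models a child policy that allows an access while the parent denies it (say, because the child's context is unlabeled from the parent's point of view), and the overall answer is deny.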

The tree can be found at:
https://github.com/stephensmalley/selinux-kernel/tree/working-selinuxns

It may be re-based or changed at any time.
To experiment, after building and booting this kernel, do the following:
# Create root shell
sudo bash
# Unshare SELinux namespace
echo 1 > /sys/fs/selinux/unshare
id # Context is now "init" or "kernel" in child;
# ps -eZ from parent will still show original context
# Unshare mount namespace and mount new selinuxfs for child SELinux namespace
unshare -m
umount /sys/fs/selinux
mount -t selinuxfs none /sys/fs/selinux
# Load a policy into the child SELinux namespace, parent unaffected
load_policy
id # Context is now kernel_generic_helper_t on Fedora due to a default
# transition in its policy
# Switch to a suitable security context before trying to go enforcing
runcon unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 /bin/bash
# Switch child to enforcing, checking that you didn't get killed once enforcing
echo $$
setenforce 1
echo $$
# Do stuff in child, run testsuite (switch parent to permissive first
# to avoid denials from it), etc.
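
Before starting the steps above, it can help to confirm that the running kernel actually exposes the unshare node. The following is a small pre-flight sketch; the function name can_unshare_selinuxns is hypothetical, and it only checks that the node exists and is writable by the caller, not that the unshare will succeed.

```shell
# Hypothetical pre-flight check: succeed only if selinuxfs exposes a
# writable "unshare" node. The optional argument overrides the mount point.
can_unshare_selinuxns() {
    selinuxfs="${1:-/sys/fs/selinux}"
    if [ -w "$selinuxfs/unshare" ]; then
        echo "unshare node found at $selinuxfs/unshare"
    else
        echo "no writable unshare node under $selinuxfs" >&2
        return 1
    fi
}
```

A possible use is "can_unshare_selinuxns && echo 1 > /sys/fs/selinux/unshare" from the root shell.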




