Re: SELinux namespaces re-base


 



On Wed, Oct 9, 2024 at 3:25 PM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
>
> On Wed, Oct 9, 2024 at 1:57 PM Stephen Smalley
> <stephen.smalley.work@xxxxxxxxx> wrote:
> >
> > On Wed, Oct 9, 2024 at 9:09 AM Stephen Smalley
> > <stephen.smalley.work@xxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 8, 2024 at 9:32 AM Stephen Smalley
> > > <stephen.smalley.work@xxxxxxxxx> wrote:
> > > > Re-based again on top of latest selinux/dev to resolve the conflicts
> > > > with the just-merged patches and to update the new netlink xperm
> > > > support for SELinux namespaces. Passes the selinux-testsuite including
> > > > the (not yet merged) nlmsg tests in both the init SELinux namespace
> > > > and a child SELinux namespace (modulo the labeled IPSEC tests and with
> > > > the init SELinux namespace permissive for testing the child or
> > > > modifying the init namespace policy to permit it to run all the tests
> > > > in the child context). Functionally, this is nearly complete as far as
> > > > SELinux-only changes go (not including the corresponding work needed
> > > > to namespace audit and if desired/necessary, to allow namespacing of
> > > > the labeled IPSEC hooks), modulo any bugs that get discovered in
> > > > trying to create real containers with their own SELinux namespaces and
> > > > different combinations of policies between the host OS and the
> > > > containers.
> > > >
> > > > My remaining ToDo list is as follows, but this is a good point for
> > > > others to provide feedback or experiment with the functionality or
> > > > propose their favorite container runtime for the next stages of
> > > > prototyping. If it would help spark feedback, I could post the current
> > > > set of kernel patches to the list.
> > > >
> > > > - Test creation/use of SELinux namespaces from actual containers with
> > > > different policies from the host OS. This may require patching a
> > > > container runtime to add support for unsharing the SELinux namespace
> > > > and unmounting the old selinuxfs prior to starting the container init.
> > > > Combinations to test: No policy loaded on host, policy loaded in
> > > > container e.g. Fedora on Ubuntu; host with newer base policy than
> > > > container e.g. RHEL/Rocky 8/9 on Fedora; container with newer base
> > > > policy than host e.g. Fedora on RHEL/Rocky 8/9; host and container
> > > > with different upstream policy sources e.g. Ubuntu on Fedora; Android
> > > > container on Linux host OS.
> > >
> > > To help get this started, I created a patch for libselinux to provide
> > > a selinux_unshare() API that unshares the SELinux namespace (hiding
> > > the current messy internal details of the existing kernel interface
> > > and also dealing with various situations under which it might be
> > > called by container runtimes with selinuxfs already mounted, bind
> > > mounted read-only, or not mounted at all) along with a sample
> > > unshareselinux utility that shows how to use it, and a patch for
> > > systemd-nspawn to show how it might be called from a container runtime
> > > to unshare the SELinux namespace during container creation. These can
> > > be found in the selinuxns branches of my selinux userspace and systemd
> > > forks at:
> > > https://github.com/stephensmalley/selinux/tree/selinuxns
> > > and
> > > https://github.com/stephensmalley/systemd/tree/selinuxns
> > > respectively.
> > >
> > > While the patches appear to work correctly (i.e. you end up with a new
> > > SELinux namespace, after which you can mount a new selinuxfs that is
> > > private to your namespace, load a policy, set enforcing mode, etc),
> > > unfortunately it appears that systemd doesn't just do the Right Thing
> > > automatically when it is invoked as a container init after unsharing
> > > the SELinux namespace, i.e. it does not proceed to call the SELinux
> > > setup functionality so it never tries to mount selinuxfs and load a
> > > policy within the container. Unsurprising but it does mean that
> > > someone will need to modify it to do so in this case while ensuring
> > > that this doesn't break existing setups without the SELinux namespace
> > > functionality.
> >
> > Pushed up a further commit to the branch on my fork of systemd to have
> > it call the SELinux setup + init functions if invoked from
> > systemd-nspawn with the SELinux namespace unshared. The existing
> > systemd was skipping setup/init of all of the MAC modules if running
> > in a container, which was understandable absent namespace support. My
> > current patch (just to allow further progress) only relaxes that
> > constraint for SELinux and only if launched via systemd-nspawn with
> > the --selinux-namespace option; this would of course be generalized
> > further if/when we get around to upstreaming it. With that change and
> > installing the modified systemd into the container root filesystem, I
> > can start a container via systemd-nspawn with the --selinux-namespace
> > option and have it unshare the SELinux namespace, load policy from the
> > container's root, and set its enforcing mode. At present, if the
> > container is configured to be enforcing, the container will fail due
> > to denials in the child SELinux namespace arising from the following:
> > - systemd creates a regular tmpfs mount for the container /dev, so at
> > least some of the /dev nodes are not correctly labeled at startup.
> > This can likely be fixed through some combination of policy and
> > perhaps performing a restorecon("/dev") after first loading policy.
> > - Certain /proc/sys files in the container are labeled with
> > "unlabeled_t" for some reason, likely due to being accessed in the
> > namespace before it loads a policy and not getting initialized
> > afterward. Similarly could be fixed via a restorecon("/proc") after
> > policy load if we can't solve it kernel-side.
>
> Sorry, obviously can't do a restorecon of /proc so that's not an option.
> I suspect that the existing selinux_complete_init() walk of
> uninitialized superblocks and their inodes after first policy load
> isn't getting done properly for child SELinux namespaces; will have to
> look into that on the kernel side.

Yes, that was the issue. Fixed with another commit pushed up to the
working-selinuxns branch of my selinux-kernel fork. So the /proc
labeling is fixed within the container. Still have the other denials
to address but those might all be userspace or policy fixes.
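
One way to confirm the /proc labeling from inside the container is to walk
/proc/sys and look for anything still typed unlabeled_t. The following is
only a minimal sketch, not part of the patches: the helper names
(context_type, find_unlabeled) are made up for illustration, and it assumes
the security.selinux xattr is readable on the entries being checked.

```python
import os

UNLABELED_TYPE = "unlabeled_t"


def context_type(context):
    """Return the type field of a SELinux context (user:role:type[:range])."""
    # The xattr value is usually NUL-terminated; strip that before parsing.
    fields = context.rstrip("\x00").split(":")
    if len(fields) < 3:
        raise ValueError("not a SELinux context: %r" % context)
    return fields[2]


def find_unlabeled(root="/proc/sys"):
    """Yield paths under root whose SELinux type is unlabeled_t."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                raw = os.getxattr(path, "security.selinux",
                                  follow_symlinks=False)
                ctype = context_type(raw.decode("ascii", "replace"))
            except (OSError, ValueError):
                # Unreadable entry, no label, or a value we cannot parse.
                continue
            if ctype == UNLABELED_TYPE:
                yield path


if __name__ == "__main__":
    for path in find_unlabeled():
        print(path)
```

Run in the child namespace after policy load, an empty result would suggest
the superblock/inode initialization walk completed for /proc/sys.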

>
> > - sendto permission denied from kernel_t and from init_t to
> > unconfined_t:unix_dgram_socket; this is likely the container sending
> > to a socket in the parent namespace.
> >
> > There are no doubt more beyond these. However, in permissive (with the
> > parent/init namespace still enforcing), the container did come up
> > fully and sees SELinux as enabled.
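
For anyone triaging the remaining denials, a small filter over the audit log
can group them by source type, target type, and class. This is a rough
sketch only: it assumes the usual "avc:  denied  { ... } ... scontext=...
tcontext=... tclass=..." record layout, the function name parse_avc is
hypothetical, and the SAMPLE line is made up to mirror the sendto denial
described above rather than copied from a real log.

```python
import re

AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+"
    r"tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)


def _type_of(context):
    """Return the type field of a context, or the raw string if malformed."""
    fields = context.split(":")
    return fields[2] if len(fields) >= 3 else context


def parse_avc(line):
    """Parse one AVC denial line into a dict, or return None if no match."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    return {
        "perms": match.group("perms").split(),
        "source_type": _type_of(match.group("scontext")),
        "target_type": _type_of(match.group("tcontext")),
        "tclass": match.group("tclass"),
    }


# Made-up example record, for illustration only.
SAMPLE = (
    "avc:  denied  { sendto } for  pid=1 comm=\"systemd\" "
    "scontext=system_u:system_r:init_t:s0 "
    "tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 "
    "tclass=unix_dgram_socket permissive=0"
)

if __name__ == "__main__":
    print(parse_avc(SAMPLE))
```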




