Re: SELinux namespaces re-base

Stephen Smalley <stephen.smalley.work@xxxxxxxxx> · Fri, 2 Aug 2024 13:18:13 -0400

On Thu, Aug 1, 2024 at 3:28 PM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
>
> On Thu, Aug 1, 2024 at 2:48 PM Stephen Smalley
> <stephen.smalley.work@xxxxxxxxx> wrote:
> >
> > On Thu, Aug 1, 2024 at 10:03 AM Stephen Smalley
> > <stephen.smalley.work@xxxxxxxxx> wrote:
> > >
> > > On Thu, Aug 1, 2024 at 9:01 AM Stephen Smalley
> > > <stephen.smalley.work@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Aug 1, 2024 at 8:27 AM Stephen Smalley
> > > > <stephen.smalley.work@xxxxxxxxx> wrote:
> > > > >
> > > > > Given the recent discussion of the SELinux namespaces patches, I
> > > > > re-based the working-selinuxns branch of my selinux-kernel fork on top
> > > > > of the dev branch. This required first reverting commit e67b79850fcc
> > > > > ("selinux: stop passing selinux_state pointers and their offspring")
> > > > > which I had created at Linus' request some time ago to avoid the
> > > > > extraneous overhead associated with passing those pointers when there
> > > > > could only be one selinux_state structure. Due to the number of
> > > > > changes, both substantive and coding style related, since the last
> > > > > re-base in 2020, there were numerous conflicts that required manual
> > > > > resolution. I also checked the coding style of each patch with Paul's
> > > > > scripts and fixed any issues introduced by the patches along the way.
> > > > >
> > > > > The rebase can be found at:
> > > > > https://github.com/stephensmalley/selinux-kernel/tree/working-selinuxns
> > > > >
> > > > > It boots, passes the selinux-testsuite (including the NFS tests), and
> > > > > passes the following
> > > > > trivial exercising of the unshare mechanism:
> > > > > $ sudo bash
> > > > > # echo 1 > /sys/fs/selinux/unshare
> > > > > # unshare -m -n
> > > > > # umount /sys/fs/selinux
> > > > > # mount -t selinuxfs none /sys/fs/selinux
> > > > > # id
> > > > > uid=0(root) gid=0(root) groups=0(root) context=kernel
> > > > > # getenforce
> > > > > Permissive
> > > > > # load_policy
> > > > > # id
> > > > > uid=0(root) gid=0(root) groups=0(root) context=system_u:system_r:kernel_t:s0
> > > > >
> > > > > All the same caveats apply - this is still not safe to use and has
> > > > > many unresolved issues as noted in the patch descriptions.
> > > >
> > > > Also, note that as before, this branch does NOT include the original
> > > > patches to support per-namespace superblock and inode security
> > > > structures. Given the known problems with those patches and recent
> > > > discussions, we likely don't want them anyway, but for reference they
> > > > can still be found here:
> > > > https://github.com/stephensmalley/selinux-kernel/commit/3378718ef7d4a837f32c63bdfcc0b70342cdd55d
> > > > https://github.com/stephensmalley/selinux-kernel/commit/efb2ddadfdd0e10e75b6aa5da2ed9841df6ef2f6
> > > >
> > > > Without those patches, ls -Z will show as unlabeled any files whose
> > > > inodes were already in-core at the time of the creation of the new
> > > > SELinux namespace because their inode security structures will have
> > > > SIDs that do not exist in the new namespace's SID table.
> > > >
> > > > An alternative approach proposed by Huawei to my original patches can
> > > > be found here:
> > > > https://lore.kernel.org/selinux/20220216125206.20975-1-igor.baranov@xxxxxxxxxx/#r
> > > >
> > > > However, those patches also seem to have quite a few unresolved issues.
> > >
> > > Actually, re-reading that thread inspired me to take one of the ideas
> > > suggested by Huawei, so I just pushed up one change on top of my
> > > working-selinuxns branch to support saving the SELinux namespace in
> > > the inode security blob and re-initializing it when an inode is
> > > accessed by a process in a different SELinux namespace. This at least
> > > allows ls -Z to correctly show the security contexts of files even
> > > when they are already in-core.
> > > There still remain many issues though to resolve...
> >
> > As just one example, if you setenforce 1 in the new SELinux namespace,
> > you'll instantly lose access to your open files since the file
> > security blobs (not the inode security blobs) still have SIDs that
> > were created in the parent namespace and aren't defined in yours. And
> > naively changing those to match your namespace will break access by
> > the parent process. Probably need to create a helper program akin to
> > newrole that handles closing and re-opening the fds in the new
> > namespace.
> >
> > Then you still have the problem of the inode security blobs not being
> > refreshed on certain code paths because you can't sleep on those code
> > paths and fetching the xattr may sleep. Might require saving a copy of
> > the security context string in the inode security blob so that you
> > don't have to perform any blocking operations to refresh the SID to
> > match the current namespace, but that will come at an obvious cost in
> > memory.
>
> The last paragraph above actually rears its head even without fixing
> the fd problem;
> unfortunately that last patch I pushed on top of working-selinuxns
> will render you unable to ssh into the system if/when you end up
> creating a new SELinux namespace and accessing files needed by sshd.
> It seems that the patch is updating the inode SID and namespace to the
> new namespace and then sshd is trying to access those files through a
> code path that cannot call  inode_security_revalidate() with
> may_sleep==true and hence ends up using the wrong SID and being denied
> access.

I just pushed up another change on top of my working-selinuxns branch
that fixes this issue albeit inefficiently memory-wise.
For the moment, it saves the security context in the inode security
blob when inode_doinit_with_dentry() first obtains it via getxattr(),
and later re-uses that context in the !may_sleep case. Ultimately we
will want to re-do this using an approach similar to Smack so that we
don't keep multiple copies of the same security context in memory.
This change combined with adding further calls to
__inode_security_revalidate() to the file*has_perm() prior to
accessing the inode security blob (now that we can safely do so)
eliminates the problem I was seeing with sshd, although I am sure
there remain other cases.
Next issue to address is the open fds when you unshare the namespace.

> Smack's equivalent to the SID table allowed one to safely pass around
> pointers to the label string, which was never de-allocated, so it only
> needed a single copy in memory of each label. We could add something
> similar outside the security server in SELinux so that
> inode_security_revalidate() could re-map the context string to an
> appropriate SID without requiring sleeping.