On Fri, Aug 2, 2024 at 2:34 PM Stephen Smalley <stephen.smalley.work@xxxxxxxxx> wrote: > > On Fri, Aug 2, 2024 at 1:18 PM Stephen Smalley > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > On Thu, Aug 1, 2024 at 3:28 PM Stephen Smalley > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > On Thu, Aug 1, 2024 at 2:48 PM Stephen Smalley > > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > > > On Thu, Aug 1, 2024 at 10:03 AM Stephen Smalley > > > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > > > > > On Thu, Aug 1, 2024 at 9:01 AM Stephen Smalley > > > > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > > > > > > > On Thu, Aug 1, 2024 at 8:27 AM Stephen Smalley > > > > > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > > > > > > > > > Given the recent discussion of the SELinux namespaces patches, I > > > > > > > re-based the working-selinuxns branch of my selinux-kernel fork on top > > > > > > > of the dev branch. This required first reverting commit e67b79850fcc > > > > > > > ("selinux: stop passing selinux_state pointers and their offspring") > > > > > > > which I had created at Linus' request some time ago to avoid the > > > > > > > extraneous overhead associated with passing those pointers when there > > > > > > > could only be one selinux_state structure. Due to the number of > > > > > > > changes, both substantive and coding style related, since the last > > > > > > > re-base in 2020, there were numerous conflicts that required manual > > > > > > > resolution. I also checked the coding style of each patch with Paul's > > > > > > > scripts and fixed any issues introduced by the patches along the way. > > > > > > > > > > > > > > The rebase can be found at: > > > > > > > https://github.com/stephensmalley/selinux-kernel/tree/working-selinuxns > > > > > > > > > > > > > > It boots, passes the selinux-testsuite (including the NFS tests), and > > > > > > > passes the following > > > > > > > trivial exercising of the unshare mechanism: > > > > > > > $ sudo bash > > > > > > > # echo 1 > /sys/fs/selinux/unshare > > > > > > > # unshare -m -n > > > > > > > # umount /sys/fs/selinux > > > > > > > # mount -t selinuxfs none /sys/fs/selinux > > > > > > > # id > > > > > > > uid=0(root) gid=0(root) groups=0(root) context=kernel > > > > > > > # getenforce > > > > > > > Permissive > > > > > > > # load_policy > > > > > > > # id > > > > > > > uid=0(root) gid=0(root) groups=0(root) context=system_u:system_r:kernel_t:s0 > > > > > > > > > > > > > > All the same caveats apply - this is still not safe to use and has > > > > > > > many unresolved issues as noted in the patch descriptions. > > > > > > > > > > > > Also, note that as before, this branch does NOT include the original > > > > > > patches to support per-namespace superblock and inode security > > > > > > structures. Given the known problems with those patches and recent > > > > > > discussions, we likely don't want them anyway, but for reference they > > > > > > can still be found here: > > > > > > https://github.com/stephensmalley/selinux-kernel/commit/3378718ef7d4a837f32c63bdfcc0b70342cdd55d > > > > > > https://github.com/stephensmalley/selinux-kernel/commit/efb2ddadfdd0e10e75b6aa5da2ed9841df6ef2f6 > > > > > > > > > > > > Without those patches, ls -Z will show as unlabeled any files whose > > > > > > inodes were already in-core at the time of the creation of the new > > > > > > SELinux namespace because their inode security structures will have > > > > > > SIDs that do not exist in the new namespace's SID table. > > > > > > > > > > > > An alternative approach proposed by Huawei to my original patches can > > > > > > be found here: > > > > > > https://lore.kernel.org/selinux/20220216125206.20975-1-igor.baranov@xxxxxxxxxx/#r > > > > > > > > > > > > However, those patches also seem to have quite a few unresolved issues. > > > > > > > > > > Actually, re-reading that thread inspired me to take one of the ideas > > > > > suggested by Huawei, so I just pushed up one change on top of my > > > > > working-selinuxns branch to support saving the SELinux namespace in > > > > > the inode security blob and re-initializing it when an inode is > > > > > accessed by a process in a different SELinux namespace. This at least > > > > > allows ls -Z to correctly show the security contexts of files even > > > > > when they are already in-core. > > > > > There still remain many issues though to resolve... > > > > > > > > As just one example, if you setenforce 1 in the new SELinux namespace, > > > > you'll instantly lose access to your open files since the file > > > > security blobs (not the inode security blobs) still have SIDs that > > > > were created in the parent namespace and aren't defined in yours. And > > > > naively changing those to match your namespace will break access by > > > > the parent process. Probably need to create a helper program akin to > > > > newrole that handles closing and re-opening the fds in the new > > > > namespace. > > > > > > > > Then you still have the problem of the inode security blobs not being > > > > refreshed on certain code paths because you can't sleep on those code > > > > paths and fetching the xattr may sleep. Might require saving a copy of > > > > the security context string in the inode security blob so that you > > > > don't have to perform any blocking operations to refresh the SID to > > > > match the current namespace, but that will come at an obvious cost in > > > > memory. > > > > > > The last paragraph above actually rears its head even without fixing > > > the fd problem; > > > unfortunately that last patch I pushed on top of working-selinuxns > > > will render you unable to ssh into the system if/when you end up > > > creating a new SELinux namespace and accessing files needed by sshd. > > > It seems that the patch is updating the inode SID and namespace to the > > > new namespace and then sshd is trying to access those files through a > > > code path that cannot call inode_security_revalidate() with > > > may_sleep==true and hence ends up using the wrong SID and being denied > > > access. > > > > I just pushed up another change on top of my working-selinuxns branch > > that fixes this issue albeit inefficiently memory-wise. > > For the moment, it saves the security context in the inode security > > blob when inode_doinit_with_dentry() first obtains it via getxattr(), > > and later re-uses that context in the !may_sleep case. Ultimately we > > will want to re-do this using an approach similar to Smack so that we > > don't keep multiple copies of the same security context in memory. > > This change combined with adding further calls to > > __inode_security_revalidate() to the file*has_perm() prior to > > accessing the inode security blob (now that we can safely do so) > > eliminates the problem I was seeing with sshd, although I am sure > > there remain other cases. > > Next issue to address is the open fds when you unshare the namespace. > > Ok, I pushed up two more changes - one to fix a memory leak in the > previous change (thanks kmemleak!) and one to update the security > blobs of the open files of the current process when it unshares its > SELinux namespace. This gets us a bit farther; now when I setenforce 1 > in the new SELinux namespace, it retains access to its > stdin/stdout/stderr but it segfaults because it loses access to its > memory mappings. I see the Huawei patch set had something to try to > migrate the VMAs of the current process; that looks ugly and doesn't > have a convenient helper in the mm subsystem unlike the open files > situation (iterate_fd helper that was already being used by > flush_unauthorized_files). So that's the next challenge. Might be > better though to just defer the unsharing of the SELinux namespace to > the next execve, similar to setexeccon(), so that we don't have to > deal with it. FYI, discovered and fixed several reference counting bugs and memory leaks, squashed the fixes back into the original patches that introduced the bugs, and force-pushed the updated working-selinuxns branch.