On Thu, Aug 1, 2024 at 3:28 PM Stephen Smalley <stephen.smalley.work@xxxxxxxxx> wrote: > > On Thu, Aug 1, 2024 at 2:48 PM Stephen Smalley > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > On Thu, Aug 1, 2024 at 10:03 AM Stephen Smalley > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > On Thu, Aug 1, 2024 at 9:01 AM Stephen Smalley > > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > > > On Thu, Aug 1, 2024 at 8:27 AM Stephen Smalley > > > > <stephen.smalley.work@xxxxxxxxx> wrote: > > > > > > > > > > Given the recent discussion of the SELinux namespaces patches, I > > > > > re-based the working-selinuxns branch of my selinux-kernel fork on top > > > > > of the dev branch. This required first reverting commit e67b79850fcc > > > > > ("selinux: stop passing selinux_state pointers and their offspring") > > > > > which I had created at Linus' request some time ago to avoid the > > > > > extraneous overhead associated with passing those pointers when there > > > > > could only be one selinux_state structure. Due to the number of > > > > > changes, both substantive and coding style related, since the last > > > > > re-base in 2020, there were numerous conflicts that required manual > > > > > resolution. I also checked the coding style of each patch with Paul's > > > > > scripts and fixed any issues introduced by the patches along the way. > > > > > > > > > > The rebase can be found at: > > > > > https://github.com/stephensmalley/selinux-kernel/tree/working-selinuxns > > > > > > > > > > It boots, passes the selinux-testsuite (including the NFS tests), and > > > > > passes the following > > > > > trivial exercising of the unshare mechanism: > > > > > $ sudo bash > > > > > # echo 1 > /sys/fs/selinux/unshare > > > > > # unshare -m -n > > > > > # umount /sys/fs/selinux > > > > > # mount -t selinuxfs none /sys/fs/selinux > > > > > # id > > > > > uid=0(root) gid=0(root) groups=0(root) context=kernel > > > > > # getenforce > > > > > Permissive > > > > > # load_policy > > > > > # id > > > > > uid=0(root) gid=0(root) groups=0(root) context=system_u:system_r:kernel_t:s0 > > > > > > > > > > All the same caveats apply - this is still not safe to use and has > > > > > many unresolved issues as noted in the patch descriptions. > > > > > > > > Also, note that as before, this branch does NOT include the original > > > > patches to support per-namespace superblock and inode security > > > > structures. Given the known problems with those patches and recent > > > > discussions, we likely don't want them anyway, but for reference they > > > > can still be found here: > > > > https://github.com/stephensmalley/selinux-kernel/commit/3378718ef7d4a837f32c63bdfcc0b70342cdd55d > > > > https://github.com/stephensmalley/selinux-kernel/commit/efb2ddadfdd0e10e75b6aa5da2ed9841df6ef2f6 > > > > > > > > Without those patches, ls -Z will show as unlabeled any files whose > > > > inodes were already in-core at the time of the creation of the new > > > > SELinux namespace because their inode security structures will have > > > > SIDs that do not exist in the new namespace's SID table. > > > > > > > > An alternative approach proposed by Huawei to my original patches can > > > > be found here: > > > > https://lore.kernel.org/selinux/20220216125206.20975-1-igor.baranov@xxxxxxxxxx/#r > > > > > > > > However, those patches also seem to have quite a few unresolved issues. > > > > > > Actually, re-reading that thread inspired me to take one of the ideas > > > suggested by Huawei, so I just pushed up one change on top of my > > > working-selinuxns branch to support saving the SELinux namespace in > > > the inode security blob and re-initializing it when an inode is > > > accessed by a process in a different SELinux namespace. This at least > > > allows ls -Z to correctly show the security contexts of files even > > > when they are already in-core. > > > There still remain many issues though to resolve... > > > > As just one example, if you setenforce 1 in the new SELinux namespace, > > you'll instantly lose access to your open files since the file > > security blobs (not the inode security blobs) still have SIDs that > > were created in the parent namespace and aren't defined in yours. And > > naively changing those to match your namespace will break access by > > the parent process. Probably need to create a helper program akin to > > newrole that handles closing and re-opening the fds in the new > > namespace. > > > > Then you still have the problem of the inode security blobs not being > > refreshed on certain code paths because you can't sleep on those code > > paths and fetching the xattr may sleep. Might require saving a copy of > > the security context string in the inode security blob so that you > > don't have to perform any blocking operations to refresh the SID to > > match the current namespace, but that will come at an obvious cost in > > memory. > > The last paragraph above actually rears its head even without fixing > the fd problem; > unfortunately that last patch I pushed on top of working-selinuxns > will render you unable to ssh into the system if/when you end up > creating a new SELinux namespace and accessing files needed by sshd. > It seems that the patch is updating the inode SID and namespace to the > new namespace and then sshd is trying to access those files through a > code path that cannot call inode_security_revalidate() with > may_sleep==true and hence ends up using the wrong SID and being denied > access. I just pushed up another change on top of my working-selinuxns branch that fixes this issue albeit inefficiently memory-wise. For the moment, it saves the security context in the inode security blob when inode_doinit_with_dentry() first obtains it via getxattr(), and later re-uses that context in the !may_sleep case. Ultimately we will want to re-do this using an approach similar to Smack so that we don't keep multiple copies of the same security context in memory. This change combined with adding further calls to __inode_security_revalidate() to the file*has_perm() prior to accessing the inode security blob (now that we can safely do so) eliminates the problem I was seeing with sshd, although I am sure there remain other cases. Next issue to address is the open fds when you unshare the namespace. > Smack's equivalent to the SID table allowed one to safely pass around > pointers to the label string, which was never de-allocated, so it only > needed a single copy in memory of each label. We could add something > similar outside the security server in SELinux so that > inode_security_revalidate() could re-map the context string to an > appropriate SID without requiring sleeping.