On Tue, Aug 18, 2020 at 9:49 AM Daniel Burgener
<dburgener@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 8/13/20 12:25 PM, Stephen Smalley wrote:
> > On 8/12/20 3:15 PM, Daniel Burgener wrote:
> >
> >> In order to avoid concurrency issues around selinuxfs resource
> >> availability during policy load, we first create new directories out
> >> of tree for reloaded resources, then swap them in, and finally delete
> >> the old versions.
> >>
> >> This fix focuses on concurrency within each of the three subtrees
> >> swapped, and not on concurrency across the three trees. This means
> >> that it is still possible that subsequent reads to, e.g., the booleans
> >> directory and the class directory during a policy load could see the
> >> old state for one and the new state for the other. The problem of
> >> ensuring that policy loads are fully atomic from the perspective of
> >> userspace is larger than what is dealt with here. This commit focuses
> >> on ensuring that the directories' contents always match either the new
> >> or the old policy state from the perspective of userspace.
> >>
> >> In the previous implementation, /sys/fs/selinux is updated on policy
> >> load by deleting the previous contents of
> >> /sys/fs/selinux/{class,booleans} and then recreating them. This means
> >> that there is a period of time when the contents of these directories
> >> do not exist, which can cause race conditions as userspace relies on
> >> them for information about the policy. In addition, it makes error
> >> recovery in the event of failure challenging.
> >>
> >> In order to demonstrate the race condition that this series fixes, you
> >> can use the following commands:
> >>
> >> while true; do cat /sys/fs/selinux/class/service/perms/status > /dev/null; done &
> >> while true; do load_policy; done;
> >>
> >> In the existing code, this will display errors fairly often as the
> >> class lookup fails. (In normal operation from systemd, this would
> >> result in a permission check which would be allowed or denied based on
> >> policy settings around unknown object classes.) After applying this
> >> patch series you should no longer see such error messages.
> >>
> >> Signed-off-by: Daniel Burgener <dburgener@xxxxxxxxxxxxxxxxxxx>
> >> ---
> >>  security/selinux/selinuxfs.c | 145 +++++++++++++++++++++++++++++------
> >>  1 file changed, 120 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
> >> index f09afdb90ddd..d3a19170210a 100644
> >> --- a/security/selinux/selinuxfs.c
> >> +++ b/security/selinux/selinuxfs.c
> >> +	tmp_policycap_dir = sel_make_dir(tmp_parent, POLICYCAP_DIR_NAME, &fsi->last_ino);
> >> +	if (IS_ERR(tmp_policycap_dir)) {
> >> +		ret = PTR_ERR(tmp_policycap_dir);
> >> +		goto out;
> >> +	}
> >
> > No need to re-create this one.
> >
> >> -	return 0;
> >> +	// booleans
> >> +	old_dentry = fsi->bool_dir;
> >> +	lock_rename(tmp_bool_dir, old_dentry);
> >> +	ret = vfs_rename(tmp_parent->d_inode, tmp_bool_dir, fsi->sb->s_root->d_inode,
> >> +			 fsi->bool_dir, NULL, RENAME_EXCHANGE);
> >
> > One issue with using vfs_rename() is that it will trigger all of the
> > permission checks associated with renaming, and previously this was
> > never required for selinuxfs and therefore might not be allowed in
> > some policies even to a process allowed to reload policy. So if you
> > need to do this, you may want to override creds around this call to
> > use the init cred (which will still require allowing it to the kernel
> > domain but not necessarily to the process that is performing the
> > policy load). The other issue is that you then have to implement a
> > rename inode operation, and thus technically it is possible for
> > userspace to also attempt renames on selinuxfs to the extent allowed
> > by policy. I see that debugfs has a debugfs_rename() that internally
> > uses simple_rename(), but I guess that doesn't cover the
> > RENAME_EXCHANGE case.
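For concreteness, an untested sketch of the creds override suggested
above. The wrapper name is invented purely for illustration and the
parameters simply mirror the patch excerpt; it is not part of the actual
patch.

#include <linux/cred.h>
#include <linux/dcache.h>
#include <linux/fs.h>

/* Untested sketch: perform the directory exchange with the kernel's
 * credentials so that vfs_rename()'s permission checks are charged to
 * the kernel domain rather than to the process that triggered the
 * policy load.  Caller is assumed to hold the locks taken by
 * lock_rename() as in the excerpt above.
 */
static int sel_exchange_as_kernel(struct dentry *tmp_parent,
				  struct dentry *tmp_dir,
				  struct dentry *live_parent,
				  struct dentry *live_dir)
{
	const struct cred *old_cred;
	int ret;

	/* temporarily act with init_cred for the duration of the rename */
	old_cred = override_creds(&init_cred);
	ret = vfs_rename(d_inode(tmp_parent), tmp_dir,
			 d_inode(live_parent), live_dir,
			 NULL, RENAME_EXCHANGE);
	revert_creds(old_cred);
	return ret;
}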
>
> Those are good points. Do you see any problems with just calling
> d_exchange() directly? It seems to work fine in very limited initial
> testing on my end. That should hopefully address all the problems you
> mentioned here.

I was hoping the vfs folks would chime in, but you may have to pose a
more direct question to viro and linux-fsdevel to get a response.
Possibly there should be a lower-level vfs helper that could be used
internally by vfs_rename() and by things like debugfs_rename() and a
potential selinuxfs_rename().
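Roughly, such a helper might look like the sketch below. This is
completely untested: the simple_exchange() name is made up here, it is
only loosely modeled on simple_rename(), and the real shape of any such
helper would of course be up to the vfs maintainers.

#include <linux/dcache.h>
#include <linux/fs.h>

/* Hypothetical helper: exchange two dentries on a simple in-kernel
 * filesystem without the permission checking done by vfs_rename().
 * Callers would hold the parent directories locked via lock_rename().
 */
static void simple_exchange(struct dentry *d1, struct dentry *d2)
{
	struct inode *dir1 = d_inode(d1->d_parent);
	struct inode *dir2 = d_inode(d2->d_parent);

	/* swap the two dentries in the dcache; for an in-memory
	 * filesystem this is roughly what RENAME_EXCHANGE comes down to
	 * once the permission checks are out of the picture */
	d_exchange(d1, d2);

	/* keep directory timestamps coherent, as simple_rename() does */
	dir1->i_ctime = dir1->i_mtime = current_time(dir1);
	if (dir2 != dir1)
		dir2->i_ctime = dir2->i_mtime = current_time(dir2);
}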