Re: [PATCH v2 4/4] selinux: Create new booleans and class dirs out of tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/12/20 3:15 PM, Daniel Burgener wrote:

In order to avoid concurrency issues around selinuxfs resource availability
during policy load, we first create new directories out of tree for
reloaded resources, then swap them in, and finally delete the old versions.

This fix focuses on concurrency in each of the three subtrees swapped, and
not concurrency across the three trees.  This means that it is still possible
that subsequent reads to eg the booleans directory and the class directory
during a policy load could see the old state for one and the new for the other.
The problem of ensuring that policy loads are fully atomic from the perspective
of userspace is larger than what is dealt with here.  This commit focuses on
ensuring that the directories contents always match either the new or the old
policy state from the perspective of userspace.

In the previous implementation, on policy load /sys/fs/selinux is updated
by deleting the previous contents of
/sys/fs/selinux/{class,booleans} and then recreating them.  This means
that there is a period of time when the contents of these directories do not
exist which can cause race conditions as userspace relies on them for
information about the policy.  In addition, it means that error recovery in
the event of failure is challenging.

In order to demonstrate the race condition that this series fixes, you
can use the following commands:

while true; do cat /sys/fs/selinux/class/service/perms/status
/dev/null; done &
while true; do load_policy; done;

In the existing code, this will display errors fairly often as the class
lookup fails.  (In normal operation from systemd, this would result in a
permission check which would be allowed or denied based on policy settings
around unknown object classes.) After applying this patch series you
should expect to no longer see such error messages.

Signed-off-by: Daniel Burgener <dburgener@xxxxxxxxxxxxxxxxxxx>
---
  security/selinux/selinuxfs.c | 145 +++++++++++++++++++++++++++++------
  1 file changed, 120 insertions(+), 25 deletions(-)

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index f09afdb90ddd..d3a19170210a 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
+	tmp_policycap_dir = sel_make_dir(tmp_parent, POLICYCAP_DIR_NAME, &fsi->last_ino);
+	if (IS_ERR(tmp_policycap_dir)) {
+		ret = PTR_ERR(tmp_policycap_dir);
+		goto out;
+	}

No need to re-create this one.

-	return 0;
+	// booleans
+	old_dentry = fsi->bool_dir;
+	lock_rename(tmp_bool_dir, old_dentry);
+	ret = vfs_rename(tmp_parent->d_inode, tmp_bool_dir, fsi->sb->s_root->d_inode,
+			 fsi->bool_dir, NULL, RENAME_EXCHANGE);

One issue with using vfs_rename() is that it will trigger all of the permission checks associated with renaming, and previously this was never required for selinuxfs and therefore might not be allowed in some policies even to a process allowed to reload policy.  So if you need to do this, you may want to override creds around this call to use the init cred (which will still require allowing it to the kernel domain but not necessarily to the process that is performing the policy load).  The other issue is that you then have to implement a rename inode operation and thus technically it is possible for userspace to also attempt renames on selinuxfs to the extent allowed by policy.  I see that debugfs has a debugfs_rename() that internally uses simple_rename() but I guess that doesn't cover the RENAME_EXCHANGE case.

+	// Since the other temporary dirs are children of tmp_parent
+	// this will handle all the cleanup in the case of a failure before
+	// the swapover

Don't use // style comments please, especially not for multi-line comments.  I think they are only used in selinux for the script-generated license lines.

+static struct dentry *sel_make_disconnected_dir(struct super_block *sb,
+						unsigned long *ino)
+{
+	struct inode *inode = sel_make_inode(sb, S_IFDIR | S_IRUGO | S_IXUGO);
+
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	inode->i_op = &sel_dir_inode_operations;
+	inode->i_fop = &simple_dir_operations;
+	inode->i_ino = ++(*ino);
+	/* directory inodes start off with i_nlink == 2 (for "." entry) */
+	inc_nlink(inode);
+	return d_obtain_alias(inode);
+}
+

Since you are always incrementing the last_ino counter and never reusing the ones for the removed inodes, you could technically eventually end up with one of these directories have the same inode number as one of the inodes whose inode numbers are generated in specific ranges (i.e. for initial_contexts, booleans, classes, and policy capabilities). Optimally we'd just reuse the inode number for the inode we are replacing?





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux