Race condition during policy load in kernel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello all,

We've noticed a few instances of strange failures in userspace object managers happening immediately after a policy load, that we believe we've traced to a race condition in the kernel, and wanted to get your thoughts on our diagnosis and the appropriate fix before creating a patch in case we've missed something.

The issue is that userspace object managers rely on /sys/fs/selinux to determine the mapping of object class and permission strings to numeric identifiers, however immediately after a policy load, /sys/fs/selinux/{booleans,class,policy_capabilities}, are deleted and recreated.  This leaves a window where the object classes (etc) appear to be not defined, even if they are defined in both the old and new policies.

We have observed this with both dbus and systemd userspace object managers, and it can be reproduced straightforwardly by running the following (borrowed from bug linked below):

# (while true; do cat /sys/fs/selinux/class/service/index >/dev/null; done) &
# while true; do sudo load_policy; echo -n .; sleep 0.1;done

Periodically, we will get "No such file or directory" messages printed to stderr.  In the event of a userspace object manager using libselinux to check a userspace permission, that will result in a USER_AVC message indicating an unknown object class, and in the event that --handle-unknown is set to "deny", it will deny the access.

It seems to me as though some sort of locking should occur in the selinuxfs to prevent access to the files being deleted and recreated during the policy load, so that userspace programs relying on them (in particular userspace object managers doing class lookups) get a consistent view of the classes, perms, booleans and capabilities in the loaded policy.

This seems to be related to https://github.com/SELinuxProject/selinux-kernel/issues/42 but I believe it is a different case.  The deadlock in that bug seems to be related to the underlying filesystem functions, specifically around directory deletion while this is an issue the selinuxfs logic specifically. The above linked issue appears to have been fixed in recent upstream kernels, per the bug, but I have verified the issue I am discussing here in 5.7.0 rc3.

It seems to me as though from the perspective of userspace that all of sel_make_policy_nodes (or at least all of each of its component functions) should be atomic.  There was some discussion in a previous thread (https://lore.kernel.org/selinux/20181002155810.GP32577@xxxxxxxxxxxxxxxxxx/) around a significant refactor of policy loading in general.  It appears as though the direct issue there of access during the deletion has been resolved (commit d4f4de5e5ef8efde85febb6876cd3c8ab1631999), although the complete suggested fix of refactoring policy_load into two parts has not been done.  Would that refactor be the right approach to the problem I am trying to solve?  Would a patch for adding locking around the selinuxfs delete/recreate operation be considered? That wouldn't address all the concerns, (namely the potential to access a view of the policy that doesn't match the currently loaded policy and error recovery in the case that sel_make_nodes fails), but would improve the reliability of existing userspace object managers

I'm happy to create and submit a patch, but I wanted to get the communities thought on the problem and correct approach first.

-Daniel



[Index of Archives]     [Selinux Refpolicy]     [Linux SGX]     [Fedora Users]     [Fedora Desktop]     [Yosemite Photos]     [Yosemite Camping]     [Yosemite Campsites]     [KDE Users]     [Gnome Users]

  Powered by Linux