Re: [RFC PATCH v2] selinux: convert policy read-write lock to RCU

Stephen Smalley <stephen.smalley.work@xxxxxxxxx> · Wed, 19 Aug 2020 08:18:48 -0400

On 8/19/20 4:40 AM, Ondrej Mosnacek wrote:

A few comments inline after a quick glance...

On Wed, Aug 19, 2020 at 12:17 AM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
Convert the policy read-write lock to RCU.  This is significantly
simplified by the earlier work to encapsulate the policy data
structures and refactor the policy load and boolean setting logic.
Move the latest_granting sequence number into the selinux_policy
structure so that it can be updated atomically with the policy.
Since the removal of the policy rwlock and this move of
latest_granting reduces the selinux_ss structure to nothing more than
a wrapper around the selinux_policy pointer, get rid of the extra
layer of indirection.

At present this change merely passes a hardcoded 1 to
rcu_dereference_check() in the cases where we know we do not need to
take rcu_read_lock(), with the preceding comment explaining why.
Alternatively we could pass fsi->mutex down from selinuxfs and
apply a lockdep check on it instead.

This change does not specifically do anything special with respect
to the sidtab live convert; I am unclear as to whether it is safe
as a result.  Comments welcome.

Based in part on earlier attempts to convert the policy rwlock
to RCU by Kaigai Kohei [1] and by Peter Enderborg [2].

[1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@xxxxxxxxxxxxx/
[2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@xxxxxxxx/

Signed-off-by: Stephen Smalley <stephen.smalley.work@xxxxxxxxx>
---

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index f6f78c65f53f..ba9347517e5b 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
[...]

@@ -722,8 +716,9 @@ static int security_validtrans_handle_fail(struct selinux_state *state,
                                            struct sidtab_entry *tentry,
                                            u16 tclass)
  {
-       struct policydb *p = &state->ss->policy->policydb;
-       struct sidtab *sidtab = state->ss->policy->sidtab;
+       struct selinux_policy *policy = rcu_dereference(state->policy);
This looks dangerous - you might get a newer policy pointer here than
the one you dereferenced in security_compute_validatetrans(), which
wouldn't be good. Instead of double-dereferencing you should always
pass the struct selinux_policy pointer as an argument to functions
like this. Please double-check if there are other cases like this, I
didn't go over the patch in detail yet.
Is that true even though the entire function is called with 
rcu_read_lock() held? In any event, I can fix this for clarity.
@@ -2174,11 +2206,16 @@ void selinux_policy_commit(struct 
selinux_state *state,
                         pr_info("SELinux: Enabling MLS support...\n");
         }

+
+       /* Set latest granting seqno for new policy. */
+       if (oldpolicy)
+               newpolicy->latest_granting = oldpolicy->latest_granting + 1;
+       else
+               newpolicy->latest_granting = 1;
+       seqno = newpolicy->latest_granting;
+
         /* Install the new policy. */
-       write_lock_irq(&state->ss->policy_rwlock);
-       state->ss->policy = newpolicy;
-       seqno = ++state->ss->latest_granting;
-       write_unlock_irq(&state->ss->policy_rwlock);
+       rcu_assign_pointer(state->policy, newpolicy);
This is probably a pre-existing thing, but I noticed that there is a
small window of inconsistency between policy (re)load / bool change
and AV cache. Between the rcu_assign_pointer() (or the write lock
section before) and avc_ss_reset(), there might be AVC lookups, which
in case of cache hit would decide based on the old policy, but in case
of a miss, they would decide based on the new one. I think it might be
fixed by moving avc_ss_reset() before rcu_assign_pointer() (but no
earlier than when it is guaranteed that the policy reload will
complete successfully).
We can't move the avc_ss_reset() before we have installed the new policy 
or else the AVC will update to the new seqno but possibly still get avd 
entries with the old seqno on security_compute_av() calls.  It is not 
unexpected that the AVC will keep seeing old decisions for a brief 
window; it is just as if they happened before the policy change.  Since 
we aren't providing a higher level synchronization in hooks.c to ensure 
that all permission checks made in a single hook (or at an even higher 
level, in a single system call) are done against the same policy, I 
don't think we gain anything by changing this. In any event, as you say, 
this is pre-existing and thus I don't want to change it as part of this 
patch.
Anyway, since we need to flush the AVC at each policy reload, it might
be more logical now to move the AVC under struct selinux_policy as
well, and avoid the seqno logic altogether. After the RCU conversion,
accessing a consistent struct selinux_policy becomes really cheap, so
integrating the AVC should allow for code simplification for (almost)
zero performance impact. A similar thing could be probably done also
with the netlabel/xfrm cache (not really familiar with those). Also
the duplicit policycap array in struct selinux_state could be replaced
with queries to struct selinux_policy instead (but I have a hunch
there are some pre-existing logical race conditions around the
handling of policycaps...).

This might be possible in the kernel but the design was to permit the 
security server and AVC to potentially live in different components / 
address spaces, with the original implementation in a microkernel where 
the security server was a userspace task.  Even now, the security server 
is providing decisions to userspace policy enforcers with their own AVC 
(from libselinux) and therefore still needs a way to synchronize policy 
changes (it sends policyload notifications via netlink with the seqno, 
and /sys/fs/selinux/access provides the seqno for the decision). Some 
improvements are certainly possible here but again I'd make those separate.