On Wed, Aug 19, 2020 at 2:18 PM Stephen Smalley <stephen.smalley.work@xxxxxxxxx> wrote: > On 8/19/20 4:40 AM, Ondrej Mosnacek wrote: > > > A few comments inline after a quick glance... > > > > On Wed, Aug 19, 2020 at 12:17 AM Stephen Smalley > > <stephen.smalley.work@xxxxxxxxx> wrote: > >> Convert the policy read-write lock to RCU. This is significantly > >> simplified by the earlier work to encapsulate the policy data > >> structures and refactor the policy load and boolean setting logic. > >> Move the latest_granting sequence number into the selinux_policy > >> structure so that it can be updated atomically with the policy. > >> Since the removal of the policy rwlock and this move of > >> latest_granting reduces the selinux_ss structure to nothing more than > >> a wrapper around the selinux_policy pointer, get rid of the extra > >> layer of indirection. > >> > >> At present this change merely passes a hardcoded 1 to > >> rcu_dereference_check() in the cases where we know we do not need to > >> take rcu_read_lock(), with the preceding comment explaining why. > >> Alternatively we could pass fsi->mutex down from selinuxfs and > >> apply a lockdep check on it instead. > >> > >> This change does not specifically do anything special with respect > >> to the sidtab live convert; I am unclear as to whether it is safe > >> as a result. Comments welcome. > >> > >> Based in part on earlier attempts to convert the policy rwlock > >> to RCU by Kaigai Kohei [1] and by Peter Enderborg [2]. > >> > >> [1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@xxxxxxxxxxxxx/ > >> [2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@xxxxxxxx/ > >> > >> Signed-off-by: Stephen Smalley <stephen.smalley.work@xxxxxxxxx> > >> --- > >> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c > >> index f6f78c65f53f..ba9347517e5b 100644 > >> --- a/security/selinux/ss/services.c > >> +++ b/security/selinux/ss/services.c > > [...] > > > > @@ -722,8 +716,9 @@ static int security_validtrans_handle_fail(struct selinux_state *state, > > struct sidtab_entry *tentry, > > u16 tclass) > > { > > - struct policydb *p = &state->ss->policy->policydb; > > - struct sidtab *sidtab = state->ss->policy->sidtab; > > + struct selinux_policy *policy = rcu_dereference(state->policy); > > This looks dangerous - you might get a newer policy pointer here than > > the one you dereferenced in security_compute_validatetrans(), which > > wouldn't be good. Instead of double-dereferencing you should always > > pass the struct selinux_policy pointer as an argument to functions > > like this. Please double-check if there are other cases like this, I > > didn't go over the patch in detail yet. > Is that true even though the entire function is called with > rcu_read_lock() held? In any event, I can fix this for clarity. AFAIK, holding an RCU read lock does not guarantee that the pointer doesn't change during the critical section (when you unwrap rcu_asign_pointer() and rcu_dereference(), they are really just a plain atomic read with some memory barriers, there is no extra magic there), it only guarantees that any object that you rcu_dereference will remain valid (i.e. it is not free'd/destroyed) for the duration of the critical section. > > @@ -2174,11 +2206,16 @@ void selinux_policy_commit(struct > > selinux_state *state, > >> pr_info("SELinux: Enabling MLS support...\n"); > >> } > >> > >> + > >> + /* Set latest granting seqno for new policy. */ > >> + if (oldpolicy) > >> + newpolicy->latest_granting = oldpolicy->latest_granting + 1; > >> + else > >> + newpolicy->latest_granting = 1; > >> + seqno = newpolicy->latest_granting; > >> + > >> /* Install the new policy. */ > >> - write_lock_irq(&state->ss->policy_rwlock); > >> - state->ss->policy = newpolicy; > >> - seqno = ++state->ss->latest_granting; > >> - write_unlock_irq(&state->ss->policy_rwlock); > >> + rcu_assign_pointer(state->policy, newpolicy); > > This is probably a pre-existing thing, but I noticed that there is a > > small window of inconsistency between policy (re)load / bool change > > and AV cache. Between the rcu_assign_pointer() (or the write lock > > section before) and avc_ss_reset(), there might be AVC lookups, which > > in case of cache hit would decide based on the old policy, but in case > > of a miss, they would decide based on the new one. I think it might be > > fixed by moving avc_ss_reset() before rcu_assign_pointer() (but no > > earlier than when it is guaranteed that the policy reload will > > complete successfully). > We can't move the avc_ss_reset() before we have installed the new policy > or else the AVC will update to the new seqno but possibly still get avd > entries with the old seqno on security_compute_av() calls. If I understand the code correctly, then avc_latest_notif_update() should reject inserts with old seqno, so it should become "frozen" so from that point it should only receive the new access vectors. Probably avc_flush() and avc_latest_notif_update() would also need to be reordered to make sure no old entries are added in between, but then it should be good. > It is not > unexpected that the AVC will keep seeing old decisions for a brief > window; it is just as if they happened before the policy change. Since > we aren't providing a higher level synchronization in hooks.c to ensure > that all permission checks made in a single hook (or at an even higher > level, in a single system call) are done against the same policy, I > don't think we gain anything by changing this. In any event, as you say, > this is pre-existing and thus I don't want to change it as part of this > patch. It's definitely not unexpected to see (sequentially on one task) decisions like: ..., old, old, old, new, new, ... but currently you could also get a pattern like: ..., old, old, new, old, new, new, ... which I wouldn't say is expected. It is unlikely to be a big deal in practice, but it just feels wrong... Anyway, I concur that this is somewhat off-topic for this patch... I raised it because I initially thought that it wasn't possible before RCU conversion, only after, but then I realized it was possible before as well. > > Anyway, since we need to flush the AVC at each policy reload, it might > > be more logical now to move the AVC under struct selinux_policy as > > well, and avoid the seqno logic altogether. After the RCU conversion, > > accessing a consistent struct selinux_policy becomes really cheap, so > > integrating the AVC should allow for code simplification for (almost) > > zero performance impact. A similar thing could be probably done also > > with the netlabel/xfrm cache (not really familiar with those). Also > > the duplicit policycap array in struct selinux_state could be replaced > > with queries to struct selinux_policy instead (but I have a hunch > > there are some pre-existing logical race conditions around the > > handling of policycaps...). > > This might be possible in the kernel but the design was to permit the > security server and AVC to potentially live in different components / > address spaces, with the original implementation in a microkernel where > the security server was a userspace task. Even now, the security server > is providing decisions to userspace policy enforcers with their own AVC > (from libselinux) and therefore still needs a way to synchronize policy > changes (it sends policyload notifications via netlink with the seqno, > and /sys/fs/selinux/access provides the seqno for the decision). Some > improvements are certainly possible here but again I'd make those separate. Okay, I agree that this is out of scope, too. This loosely synchronized system of kernel policy data + kernel AVC + userspace AVC still feels weird to me, but I don't have a better solution at this point, so I'll just have to live with it... -- Ondrej Mosnacek Software Engineer, Platform Security - SELinux kernel Red Hat, Inc.