Re: [RFC PATCH v2] selinux: convert policy read-write lock to RCU

Ondrej Mosnacek <omosnace@xxxxxxxxxx> · Wed, 19 Aug 2020 15:24:09 +0200

On Wed, Aug 19, 2020 at 2:18 PM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
> On 8/19/20 4:40 AM, Ondrej Mosnacek wrote:
>
> > A few comments inline after a quick glance...
> >
> > On Wed, Aug 19, 2020 at 12:17 AM Stephen Smalley
> > <stephen.smalley.work@xxxxxxxxx> wrote:
> >> Convert the policy read-write lock to RCU.  This is significantly
> >> simplified by the earlier work to encapsulate the policy data
> >> structures and refactor the policy load and boolean setting logic.
> >> Move the latest_granting sequence number into the selinux_policy
> >> structure so that it can be updated atomically with the policy.
> >> Since the removal of the policy rwlock and this move of
> >> latest_granting reduces the selinux_ss structure to nothing more than
> >> a wrapper around the selinux_policy pointer, get rid of the extra
> >> layer of indirection.
> >>
> >> At present this change merely passes a hardcoded 1 to
> >> rcu_dereference_check() in the cases where we know we do not need to
> >> take rcu_read_lock(), with the preceding comment explaining why.
> >> Alternatively we could pass fsi->mutex down from selinuxfs and
> >> apply a lockdep check on it instead.
> >>
> >> This change does not specifically do anything special with respect
> >> to the sidtab live convert; I am unclear as to whether it is safe
> >> as a result.  Comments welcome.
> >>
> >> Based in part on earlier attempts to convert the policy rwlock
> >> to RCU by Kaigai Kohei [1] and by Peter Enderborg [2].
> >>
> >> [1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@xxxxxxxxxxxxx/
> >> [2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@xxxxxxxx/
> >>
> >> Signed-off-by: Stephen Smalley <stephen.smalley.work@xxxxxxxxx>
> >> ---
> >> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> >> index f6f78c65f53f..ba9347517e5b 100644
> >> --- a/security/selinux/ss/services.c
> >> +++ b/security/selinux/ss/services.c
> > [...]
> >
> > @@ -722,8 +716,9 @@ static int security_validtrans_handle_fail(struct selinux_state *state,
> >                                             struct sidtab_entry *tentry,
> >                                             u16 tclass)
> >   {
> > -       struct policydb *p = &state->ss->policy->policydb;
> > -       struct sidtab *sidtab = state->ss->policy->sidtab;
> > +       struct selinux_policy *policy = rcu_dereference(state->policy);
> > This looks dangerous - you might get a newer policy pointer here than
> > the one you dereferenced in security_compute_validatetrans(), which
> > wouldn't be good. Instead of double-dereferencing you should always
> > pass the struct selinux_policy pointer as an argument to functions
> > like this. Please double-check if there are other cases like this, I
> > didn't go over the patch in detail yet.
> Is that true even though the entire function is called with
> rcu_read_lock() held? In any event, I can fix this for clarity.

AFAIK, holding an RCU read lock does not guarantee that the pointer
doesn't change during the critical section (when you unwrap
rcu_asign_pointer() and rcu_dereference(), they are really just a
plain atomic read with some memory barriers, there is no extra magic
there), it only guarantees that any object that you rcu_dereference
will remain valid (i.e. it is not free'd/destroyed) for the duration
of the critical section.

> > @@ -2174,11 +2206,16 @@ void selinux_policy_commit(struct
> > selinux_state *state,
> >>                          pr_info("SELinux: Enabling MLS support...\n");
> >>          }
> >>
> >> +
> >> +       /* Set latest granting seqno for new policy. */
> >> +       if (oldpolicy)
> >> +               newpolicy->latest_granting = oldpolicy->latest_granting + 1;
> >> +       else
> >> +               newpolicy->latest_granting = 1;
> >> +       seqno = newpolicy->latest_granting;
> >> +
> >>          /* Install the new policy. */
> >> -       write_lock_irq(&state->ss->policy_rwlock);
> >> -       state->ss->policy = newpolicy;
> >> -       seqno = ++state->ss->latest_granting;
> >> -       write_unlock_irq(&state->ss->policy_rwlock);
> >> +       rcu_assign_pointer(state->policy, newpolicy);
> > This is probably a pre-existing thing, but I noticed that there is a
> > small window of inconsistency between policy (re)load / bool change
> > and AV cache. Between the rcu_assign_pointer() (or the write lock
> > section before) and avc_ss_reset(), there might be AVC lookups, which
> > in case of cache hit would decide based on the old policy, but in case
> > of a miss, they would decide based on the new one. I think it might be
> > fixed by moving avc_ss_reset() before rcu_assign_pointer() (but no
> > earlier than when it is guaranteed that the policy reload will
> > complete successfully).
> We can't move the avc_ss_reset() before we have installed the new policy
> or else the AVC will update to the new seqno but possibly still get avd
> entries with the old seqno on security_compute_av() calls.

If I understand the code correctly, then avc_latest_notif_update()
should reject inserts with old seqno, so it should become "frozen" so
from that point it should only receive the new access vectors.
Probably avc_flush() and avc_latest_notif_update() would also need to
be reordered to make sure no old entries are added in between, but
then it should be good.

> It is not
> unexpected that the AVC will keep seeing old decisions for a brief
> window; it is just as if they happened before the policy change.  Since
> we aren't providing a higher level synchronization in hooks.c to ensure
> that all permission checks made in a single hook (or at an even higher
> level, in a single system call) are done against the same policy, I
> don't think we gain anything by changing this. In any event, as you say,
> this is pre-existing and thus I don't want to change it as part of this
> patch.

It's definitely not unexpected to see (sequentially on one task) decisions like:

..., old, old, old, new, new, ...

but currently you could also get a pattern like:

..., old, old, new, old, new, new, ...

which I wouldn't say is expected. It is unlikely to be a big deal in
practice, but it just feels wrong...

Anyway, I concur that this is somewhat off-topic for this patch... I
raised it because I initially thought that it wasn't possible before
RCU conversion, only after, but then I realized it was possible before
as well.

> > Anyway, since we need to flush the AVC at each policy reload, it might
> > be more logical now to move the AVC under struct selinux_policy as
> > well, and avoid the seqno logic altogether. After the RCU conversion,
> > accessing a consistent struct selinux_policy becomes really cheap, so
> > integrating the AVC should allow for code simplification for (almost)
> > zero performance impact. A similar thing could be probably done also
> > with the netlabel/xfrm cache (not really familiar with those). Also
> > the duplicit policycap array in struct selinux_state could be replaced
> > with queries to struct selinux_policy instead (but I have a hunch
> > there are some pre-existing logical race conditions around the
> > handling of policycaps...).
>
> This might be possible in the kernel but the design was to permit the
> security server and AVC to potentially live in different components /
> address spaces, with the original implementation in a microkernel where
> the security server was a userspace task.  Even now, the security server
> is providing decisions to userspace policy enforcers with their own AVC
> (from libselinux) and therefore still needs a way to synchronize policy
> changes (it sends policyload notifications via netlink with the seqno,
> and /sys/fs/selinux/access provides the seqno for the decision). Some
> improvements are certainly possible here but again I'd make those separate.

Okay, I agree that this is out of scope, too. This loosely
synchronized system of kernel policy data + kernel AVC + userspace AVC
still feels weird to me, but I don't have a better solution at this
point, so I'll just have to live with it...

--
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.