On Tue, Apr 6, 2021 at 12:11 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > On Mon, Apr 5, 2021 at 5:11 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote: > > > > Since commit 1b8b31a2e612 ("selinux: convert policy read-write lock to > > RCU"), there is a small window during policy load where the new policy > > pointer has already been installed, but some threads may still be > > holding the old policy pointer in their read-side RCU critical sections. > > This means that there may be conflicting attempts to add a new SID entry > > to both tables via sidtab_context_to_sid(). > > > > See also (and the rest of the thread): > > https://lore.kernel.org/selinux/CAFqZXNvfux46_f8gnvVvRYMKoes24nwm2n3sPbMjrB8vKTW00g@xxxxxxxxxxxxxx/ > > > > Fix this by installing the new policy pointer under the old sidtab's > > spinlock along with marking the old sidtab as "frozen". Then, if an > > attempt to add new entry to a "frozen" sidtab is detected, make > > sidtab_context_to_sid() return -ESTALE to indicate that a new policy > > has been installed and that the caller will have to abort the policy > > transaction and try again after re-taking the policy pointer (which is > > guaranteed to be a newer policy). This requires adding a retry-on-ESTALE > > logic to all callers of sidtab_context_to_sid(), but fortunately these > > are easy to determine and aren't that many. > > > > This seems to be the simplest solution for this problem, even if it > > looks somewhat ugly. Note that other places in the kernel (e.g. > > do_mknodat() in fs/namei.c) use similar stale-retry patterns, so I think > > it's reasonable. > > > > Fixes: 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU") > > Signed-off-by: Ondrej Mosnacek <omosnace@xxxxxxxxxx> > > --- > > security/selinux/ss/services.c | 145 ++++++++++++++++++++++++++------- > > security/selinux/ss/sidtab.c | 21 +++++ > > security/selinux/ss/sidtab.h | 4 + > > 3 files changed, 139 insertions(+), 31 deletions(-) > > I'm glad to see the retry approach here, that looks like a good > solution to me. I did have a few comments below. > > > diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c > > index 08ac1d01c743..50abcfcdf242 100644 > > --- a/security/selinux/ss/services.c > > +++ b/security/selinux/ss/services.c > > @@ -1736,6 +1746,7 @@ static int security_compute_sid(struct selinux_state *state, > > goto out; > > } > > > > +retry: > > context_init(&newcontext); > > > > rcu_read_lock(); > > @@ -1880,6 +1891,11 @@ static int security_compute_sid(struct selinux_state *state, > > } > > /* Obtain the sid for the context. */ > > rc = sidtab_context_to_sid(sidtab, &newcontext, out_sid); > > + if (rc == -ESTALE) { > > + rcu_read_unlock(); > > + context_destroy(&newcontext); > > Do we also need to reset 'cladatum' to NULL here as well? Yep, good catch! > > > + goto retry; > > + } > > out_unlock: > > rcu_read_unlock(); > > context_destroy(&newcontext); > > ... > > > @@ -2510,7 +2551,7 @@ int security_netif_sid(struct selinux_state *state, > > struct selinux_policy *policy; > > struct policydb *policydb; > > struct sidtab *sidtab; > > - int rc = 0; > > + int rc; > > struct ocontext *c; > > > > if (!selinux_initialized(state)) { > > @@ -2518,6 +2559,8 @@ int security_netif_sid(struct selinux_state *state, > > return 0; > > } > > > > +retry: > > + rc = 0; > > rcu_read_lock(); > > policy = rcu_dereference(state->policy); > > policydb = &policy->policydb; > > @@ -2534,10 +2577,18 @@ int security_netif_sid(struct selinux_state *state, > > if (!c->sid[0] || !c->sid[1]) { > > rc = sidtab_context_to_sid(sidtab, &c->context[0], > > &c->sid[0]); > > + if (rc == -ESTALE) { > > + rcu_read_unlock(); > > + goto retry; > > + } > > if (rc) > > goto out; > > rc = sidtab_context_to_sid(sidtab, &c->context[1], > > &c->sid[1]); > > + if (rc == -ESTALE) { > > + rcu_read_unlock(); > > + goto retry; > > + } > > If the first sidtab_context_to_sid() ran successfully we are not going > to see -ESTALE here, correct? Assuming yes, perhaps we can drop this > -ESTALE check with a comment about how a frozen sidtab will be caught > by the call above. No, nothing really prevents the sidtab from becoming frozen between the two calls. > > > if (rc) > > goto out; > > } > > ... > > > @@ -2676,18 +2732,20 @@ int security_get_user_sids(struct selinux_state *state, > > struct sidtab *sidtab; > > struct context *fromcon, usercon; > > u32 *mysids = NULL, *mysids2, sid; > > - u32 mynel = 0, maxnel = SIDS_NEL; > > + u32 i, j, mynel, maxnel = SIDS_NEL; > > struct user_datum *user; > > struct role_datum *role; > > struct ebitmap_node *rnode, *tnode; > > - int rc = 0, i, j; > > + int rc; > > > > *sids = NULL; > > *nel = 0; > > > > if (!selinux_initialized(state)) > > - goto out; > > + return 0; > > > > +retry: > > + mynel = 0; > > rcu_read_lock(); > > policy = rcu_dereference(state->policy); > > policydb = &policy->policydb; > > @@ -2723,6 +2781,10 @@ int security_get_user_sids(struct selinux_state *state, > > continue; > > > > rc = sidtab_context_to_sid(sidtab, &usercon, &sid); > > + if (rc == -ESTALE) { > > + rcu_read_unlock(); > > Don't we need to free and reset 'mysids' here? 'mynel' and 'maxnel' > are probably less critical since the -ESTALE condition should happen > before they are ever modified, but a constant assignment is pretty > cheap so it may make sense to reset those as well. No, we can keep "mysids" and "maxnel", since they represent an automatically growing vector, which we can keep and reuse in the retry path rather than starting from scratch. It is more efficient, since the new policy will likely have the same number of SIDs in the result. If it has more, we just grow the vector further as needed; if it has less, we just don't use the full capacity and the array will be freed after a while anyway (see "out_unlock"), so the extra memory shouldn't be held for too long. Not to mention that this is a deprecated interface that will hopefully be removed one day :) (And you're wrong that mynel/maxnel can't be modified - notice that the sidtab_context_to_sid() call is inside a loop ;) > > > + goto retry; > > + } > > if (rc) > > goto out_unlock; > > if (mynel < maxnel) { > > @@ -2745,14 +2807,14 @@ out_unlock: > > rcu_read_unlock(); > > if (rc || !mynel) { > > kfree(mysids); > > - goto out; > > + return rc; > > } > > > > rc = -ENOMEM; > > mysids2 = kcalloc(mynel, sizeof(*mysids2), GFP_KERNEL); > > if (!mysids2) { > > kfree(mysids); > > - goto out; > > + return rc; > > } > > for (i = 0, j = 0; i < mynel; i++) { > > struct av_decision dummy_avd; > > @@ -2765,12 +2827,10 @@ out_unlock: > > mysids2[j++] = mysids[i]; > > cond_resched(); > > } > > - rc = 0; > > kfree(mysids); > > *sids = mysids2; > > *nel = j; > > -out: > > - return rc; > > + return 0; > > } > > > > /** > > -- > paul moore > www.paul-moore.com > -- Ondrej Mosnacek Software Engineer, Linux Security - SELinux kernel Red Hat, Inc.