Re: [PATCH v3] selinux: cache the SID -> context string translation

On Sun, Nov 10, 2019 at 10:03 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> On Fri, Nov 08, 2019 at 09:39:09AM +0100, Ondrej Mosnacek wrote:
> > +static void sidtab_destroy_entry(struct sidtab_entry *entry)
> > +{
> > +     context_destroy(&entry->context);
> > +#if CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE > 0
> > +     kfree(rcu_dereference_raw(entry->cache));
> > +#endif
> > +}
>
> I am assuming that this is called after all possible readers are done,
> for example late in shutdown.  Or is there some way that a grace period
> is guaranteed to have elapsed between the time this data was made
> inaccessible to readers and when sidtab_destroy_entry() is invoked?
>
> For example, suppose that sidtab_sid2str_get() is delayed just after
> it fetches entry->cache.  How does this patch avoid freeing entry->cache
> out from under that call to sidtab_sid2str_get()?

It is not visible in this patch, but the sidtab (along with other
policy-lifetime structures) is protected by a big fat read-write lock.
The only places where sidtab_destroy() is called are (a) error paths
when initializing a new sidtab (here the sidtab isn't shared yet, so
there is no race) and (b) when freeing the old sidtab during policy
reload. In case (b) the destruction happens after a policy
write-locked critical section that has already removed the old sidtab
pointer from the shared structures, so by that point all sidtab
readers are accessing the new sidtab and the old one is visible only
to the thread doing the destruction.
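
To illustrate, the reload path is roughly equivalent to the following
(a simplified sketch; the lock and field names are illustrative, not
the exact identifiers used in security/selinux):

	struct sidtab *old_sidtab;

	/* Write-locked section: unpublish the old sidtab. */
	write_lock(&policy_rwlock);
	old_sidtab = state->sidtab;
	state->sidtab = new_sidtab;	/* readers now see only the new table */
	write_unlock(&policy_rwlock);

	/*
	 * All readers take the policy lock for reading before touching
	 * the sidtab, so once the write-locked section above completes,
	 * no reader can still hold a pointer to old_sidtab and no RCU
	 * grace period is needed before tearing it down.
	 */
	sidtab_destroy(old_sidtab);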

>
> > +
> >  static void sidtab_destroy_tree(union sidtab_entry_inner entry, u32 level)
> >  {
> >       u32 i;
> > @@ -473,7 +509,7 @@ static void sidtab_destroy_tree(union sidtab_entry_inner entry, u32 level)
> >                       return;
> >
> >               for (i = 0; i < SIDTAB_LEAF_ENTRIES; i++)
> > -                     context_destroy(&node->entries[i].context);
> > +                     sidtab_destroy_entry(&node->entries[i]);
> >               kfree(node);
> >       }
> >  }
> > @@ -484,7 +520,7 @@ void sidtab_destroy(struct sidtab *s)
> >
> >       for (i = 0; i < SECINITSID_NUM; i++)
> >               if (s->isids[i].set)
> > -                     context_destroy(&s->isids[i].context);
> > +                     sidtab_destroy_entry(&s->isids[i].entry);
> >
> >       level = SIDTAB_MAX_LEVEL;
> >       while (level && !s->roots[level].ptr_inner)
> > @@ -492,3 +528,88 @@ void sidtab_destroy(struct sidtab *s)
> >
> >       sidtab_destroy_tree(s->roots[level], level);
> >  }
> > +
> > +#if CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE > 0
> > +
> > +void sidtab_sid2str_put(struct sidtab *s, struct sidtab_entry *entry,
> > +                     const char *str, u32 str_len)
> > +{
> > +     struct sidtab_str_cache *cache, *victim;
> > +
> > +     /* do not cache invalid contexts */
> > +     if (entry->context.len)
> > +             return;
> > +
> > +     /*
> > +      * Skip the put operation when in non-task context to avoid the need
> > +      * to disable interrupts while holding s->cache_lock.
> > +      */
> > +     if (!in_task())
> > +             return;
> > +
> > +     spin_lock(&s->cache_lock);
> > +
> > +     cache = rcu_dereference_raw(entry->cache);
>
> If entry->cache is protected by s->cache_lock, you can do this:
>
>         cache = rcu_dereference_protected(entry->cache, lockdep_is_held(s->cache_lock);
>
> Yes, this looks a little silly just after the lock is acquired, but
> future code changes might make the connection less obvious.  And there
> is no additional overhead unless CONFIG_PROVE_LOCKING is set, which
> won't normally be set in production.

Yeah, I realized this shortly after sending the patch. Queued this
change for v4.
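
For reference, the change amounts to something like this (with
s->cache_lock being the spinlock that guards entry->cache, as in this
patch):

	spin_lock(&s->cache_lock);

	/*
	 * Returns the same value as rcu_dereference_raw(), but with
	 * CONFIG_PROVE_LOCKING this also verifies that s->cache_lock
	 * is actually held at the point of the dereference.
	 */
	cache = rcu_dereference_protected(entry->cache,
					  lockdep_is_held(&s->cache_lock));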

>
> > +     if (cache) {
> > +             /* entry in cache - just bump to he head of LRU list */

Aaand I just spotted a typo here (he -> the). Also queued for v4.

> > +             list_move(&cache->lru_member, &s->cache_lru_list);
> > +             goto out_unlock;
> > +     }
> > +
> > +     cache = kmalloc(sizeof(struct sidtab_str_cache) + str_len, GFP_ATOMIC);
> > +     if (!cache)
> > +             goto out_unlock;
> > +
> > +     if (s->cache_free_slots == 0) {
> > +             /* pop a cache entry from the tail and free it */
> > +             victim = container_of(s->cache_lru_list.prev,
> > +                                   struct sidtab_str_cache, lru_member);
> > +             list_del(&victim->lru_member);
> > +             kfree_rcu(victim, rcu_member);
> > +             rcu_assign_pointer(victim->parent->cache, NULL);
> > +     } else {
> > +             s->cache_free_slots--;
> > +     }
> > +     cache->parent = entry;
> > +     cache->len = str_len;
> > +     memcpy(cache->str, str, str_len);
> > +     rcu_head_init(&cache->rcu_member);
>
> You don't need rcu_head_init() unless you are also going to be using
> rcu_head_after_call_rcu(), which I don't see in this patch.  There is
> only one use of rcu_head_after_call_rcu() compared to something like 300
> uses of call_rcu(), so you probably won't ever need the rcu_head_init().

Ah, right. I just instinctively tried to keep everything initialized,
and since I saw that rcu_head_init() exists, I figured I'd better
call it here... But it makes sense that rcu_member won't be touched
until call_rcu(), which can just initialize it itself. I'll drop the
call in v4.
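
(For completeness, the pattern where rcu_head_init() does pay off
looks roughly like this; a minimal sketch with made-up names, not
anything from this patch:

	struct foo {
		struct rcu_head rcu;
		/* ... payload ... */
	};

	static void foo_reclaim(struct rcu_head *rhp)
	{
		kfree(container_of(rhp, struct foo, rcu));
	}

	/* At allocation time, so the later check is well defined: */
	rcu_head_init(&p->rcu);

	/* Later, e.g. to guard against a suspected double call_rcu(): */
	if (!rcu_head_after_call_rcu(&p->rcu, foo_reclaim))
		call_rcu(&p->rcu, foo_reclaim);

Since kfree_rcu() overwrites the rcu_head unconditionally, none of
this is needed in the patch.)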

>
> > +     list_add(&cache->lru_member, &s->cache_lru_list);
> > +
> > +     rcu_assign_pointer(entry->cache, cache);
> > +
> > +out_unlock:
> > +     spin_unlock(&s->cache_lock);
> > +}
> > +
> > +int sidtab_sid2str_get(struct sidtab *s, struct sidtab_entry *entry,
> > +                    char **out, u32 *out_len)
> > +{
> > +     struct sidtab_str_cache *cache;
> > +     int rc = 0;
> > +
> > +     if (entry->context.len)
> > +             return -ENOENT; /* do not cache invalid contexts */
> > +
> > +     rcu_read_lock();
> > +
> > +     cache = rcu_dereference(entry->cache);
> > +     if (!cache) {
> > +             rc = -ENOENT;
> > +     } else {
> > +             *out_len = cache->len;
>
> Interesting.  The reason the above assignment is outside of the "if"
> below is so that the caller can learn the length by passing in NULL
> for "out" and non-NULL for "out_len"?

Yes, I was mirroring the logic of context_struct_to_string() (the
target of the caching), which seems to support such a feature
intentionally. I'm not sure whether it is ever used, but it's safer
not to break the contract.
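
For example, a length-only query would look like this (hypothetical
caller):

	u32 len;

	/* Passing NULL for "out" skips the kmemdup() of the string. */
	if (!sidtab_sid2str_get(sidtab, entry, NULL, &len))
		pr_debug("cached context string is %u bytes long\n", len);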

Thank you for the review, much appreciated!

>
>                                                         Thanx, Paul
>
> > +             if (out) {
> > +                     *out = kmemdup(cache->str, cache->len, GFP_ATOMIC);
> > +                     if (!*out)
> > +                             rc = -ENOMEM;
> > +             }
> > +     }

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.




