Re: [PATCH] selinux: sidtab: reverse lookup hash table

On Wed, Oct 30, 2019 at 4:06 PM Jeffrey Vander Stoep <jeffv@xxxxxxxxxx> wrote:
> On Wed, Oct 30, 2019 at 9:02 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> > On 10/30/19 3:07 PM, Jeffrey Vander Stoep wrote:
> > > On Wed, Oct 30, 2019 at 7:57 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> > >>
> > >> On 10/30/19 2:33 PM, Jeffrey Vander Stoep wrote:
> > >>> On Wed, Oct 30, 2019 at 7:06 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> > >>>>
> > >>>> On 10/30/19 6:19 AM, Jeff Vander Stoep wrote:
> > >>>>> This replaces the reverse table lookup and reverse cache with a
> > >>>>> hashtable, which improves cache-miss reverse-lookup times from
> > >>>>> O(n) to O(1) while maintaining the same performance as a reverse
> > >>>>> cache hit.
> > >>>>>
> > >>>>> This reduces the time needed to add a new sidtab entry from ~500us
> > >>>>> to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> > >>>>>
> > >>>>> The implementation uses the kernel's generic hashtable API. It
> > >>>>> uses the context's string representation as the hash source and
> > >>>>> the kernel's generic string hashing algorithm full_name_hash()
> > >>>>> to reduce the string to a 32-bit value.
> > >>>>>
> > >>>>> This change also maintains the improvement introduced in commit
> > >>>>> ee1a84fd, which removed the need to keep the current sidtab locked
> > >>>>> during policy reload. It does, however, introduce periodic locking
> > >>>>> of the target sidtab while converting the hashtable. Sidtab
> > >>>>> entries are never modified or removed, so the context struct
> > >>>>> stored in the sid_to_context tree can also be used for the
> > >>>>> context_to_sid hashtable to reduce memory usage.
> > >>>>>
> > >>>>> This bug was reported by:
> > >>>>> - Stephen Smalley on the selinux bug tracker.
> > >>>>>      BUG: kernel softlockup due to too many SIDs/contexts #37
> > >>>>>      https://github.com/SELinuxProject/selinux-kernel/issues/37
> > >>>>> - Jovana Knezevic on Android's bugtracker.
> > >>>>>      Bug: 140252993
> > >>>>>      "During multi-user performance testing, we create and remove users
> > >>>>>      many times. selinux_android_restorecon_pkgdir goes from 1ms to over
> > >>>>>      20ms after about 200 user creations and removals. Accumulated over
> > >>>>>      ~280 packages, that adds a significant time to user creation,
> > >>>>>      making perf benchmarks unreliable."
> > >>>>>
> > >>>>> Signed-off-by: Jeff Vander Stoep <jeffv@xxxxxxxxxx>
> > >>>>> Reported-by: Stephen Smalley <sds@xxxxxxxxxxxxx>
> > >>>>> Reported-by: Jovana Knezevic <jovanak@xxxxxxxxxx>
> > >>>>> ---
> > >>>>>     security/selinux/include/security.h |   1 +
> > >>>>>     security/selinux/selinuxfs.c        |  27 +++
> > >>>>>     security/selinux/ss/context.h       |   9 +
> > >>>>>     security/selinux/ss/policydb.c      |   5 +
> > >>>>>     security/selinux/ss/services.c      |  81 +++++---
> > >>>>>     security/selinux/ss/services.h      |   4 +-
> > >>>>>     security/selinux/ss/sidtab.c        | 283 ++++++++++++++++------------
> > >>>>>     security/selinux/ss/sidtab.h        |  20 +-
> > >>>>>     8 files changed, 283 insertions(+), 147 deletions(-)

...
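For readers unfamiliar with the APIs named in the commit message above, a
minimal sketch of the pattern (the kernel's generic hashtable API plus
full_name_hash()) follows. The demo_* names, struct layout, and bucket
count are illustrative assumptions, not taken from the patch:

#include <linux/hashtable.h>
#include <linux/stringhash.h>
#include <linux/string.h>
#include <linux/types.h>

/* Illustrative toy node; field names are hypothetical. */
struct demo_node {
	u32 value;
	const char *key;
	struct hlist_node list;
};

/* 2^9 = 512 buckets; the bucket count is an arbitrary choice here. */
static DEFINE_HASHTABLE(demo_table, 9);

static void demo_add(struct demo_node *n)
{
	/* full_name_hash() reduces the string to a 32-bit value. */
	u32 hash = full_name_hash(NULL, n->key, strlen(n->key));

	hash_add(demo_table, &n->list, hash);
}

static struct demo_node *demo_find(const char *key)
{
	struct demo_node *n;
	u32 hash = full_name_hash(NULL, key, strlen(key));

	/* Walks only the bucket for 'hash', hence O(1) on average. */
	hash_for_each_possible(demo_table, n, list, hash) {
		if (!strcmp(n->key, key))
			return n;
	}
	return NULL;
}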

> > >>>>> diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c
> > >>>>> index 7d49994e8d5f..e4710f32b6d9 100644
> > >>>>> --- a/security/selinux/ss/sidtab.c
> > >>>>> +++ b/security/selinux/ss/sidtab.c
> > >>>>> @@ -23,23 +23,32 @@ int sidtab_init(struct sidtab *s)
> > >>>>>
> > >>>>>         memset(s->roots, 0, sizeof(s->roots));
> > >>>>>
> > >>>>> -     /* max count is SIDTAB_MAX so valid index is always < SIDTAB_MAX */
> > >>>>> -     for (i = 0; i < SIDTAB_RCACHE_SIZE; i++)
> > >>>>> -             s->rcache[i] = SIDTAB_MAX;
> > >>>>> -
> > >>>>>         for (i = 0; i < SECINITSID_NUM; i++)
> > >>>>>                 s->isids[i].set = 0;
> > >>>>>
> > >>>>>         s->count = 0;
> > >>>>>         s->convert = NULL;
> > >>>>> +     hash_init(s->context_to_sid);
> > >>>>>
> > >>>>>         spin_lock_init(&s->lock);
> > >>>>>         return 0;
> > >>>>>     }
> > >>>>>
> > >>>>> +static u32 context_to_sid(struct sidtab *s, struct context *context)
> > >>>>> +{
> > >>>>> +     struct sidtab_node *node;
> > >>>>> +
> > >>>>> +     hash_for_each_possible(s->context_to_sid, node, list, context->hash) {
> > >>>>
> > >>>> Is this safe without any locking or any kind of memory barrier pair to
> > >>>> deal with hash_add() calls?
> > >>>
> > >>> My understanding is that it is because hlist_add_head is written to be
> > >>> safe for this type of lockless access.
> > >>>
> > >>> It performs all modifications on the new node first and only then uses
> > >>> WRITE_ONCE() to write the list head's ->first pointer atomically and
> > >>> prevent reordering.
> > >>>
> > >>> (similar explanation here: https://lore.kernel.org/patchwork/patch/624316/)
> > >>>
> > >>> "Code that does lockless emptiness testing of non-RCU lists is relying
> > >>> on INIT_LIST_HEAD() to write the list head's ->next pointer atomically,
> > >>> particularly when INIT_LIST_HEAD() is invoked from list_del_init().
> > >>> This commit therefore adds WRITE_ONCE() to this function's pointer stores
> > >>> that could affect the head's ->next pointer."
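As a point of reference, hlist_add_head() in <linux/list.h> looks roughly
like this (a paraphrase, not a verbatim copy):

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	/* Fully initialize the new node first... */
	n->next = first;
	if (first)
		first->pprev = &n->next;
	n->pprev = &h->first;

	/* ...then publish it with a single atomic pointer store. */
	WRITE_ONCE(h->first, n);
}

Note that WRITE_ONCE() only prevents tearing and compiler reordering of
that one store; it does not order the caller's earlier initialization of
node->sid and node->context against the publish, which is exactly what
Stephen asks about next.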
> > >>
> > >> Wondering if we still need at least a write barrier in the code that
> > >> adds a node prior to hash_add() to ensure the node contents are sane.
> > >
> > > I'm not sure what you mean. Are you concerned about compiler reordering?
> >
> > Compiler or processor.  Are we guaranteed that node->sid and
> > node->context are sane before the node is visible to the readers?  The
> > old sidtab code (before Ondrej's rewrite) used to issue a full wmb()
> > after setting the node->sid, node->context, and node->next prior to
> > linking the node into the list.
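A minimal sketch of the pattern Stephen describes, assuming a hypothetical
node layout modeled on the old sidtab code; smp_wmb() here would pair with
a read-side smp_rmb() or an address dependency:

#include <linux/types.h>
#include <asm/barrier.h>

struct context;

/* Hypothetical layout, following the old sidtab node. */
struct old_node {
	u32 sid;
	struct context *context;
	struct old_node *next;
};

static void publish_node(struct old_node **slot, struct old_node *node,
			 u32 sid, struct context *ctx)
{
	node->sid = sid;
	node->context = ctx;
	node->next = *slot;

	/*
	 * Ensure the stores above are visible before the node itself
	 * becomes reachable by lockless readers.
	 */
	smp_wmb();

	*slot = node;
}

The modern equivalent would be smp_store_release(slot, node), which folds
the barrier into the publishing store.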
>
> Got it, thanks. I'll look into that and include it in v2 if necessary.

I'm trying to get caught up after being away for a bit on vacation,
but after skimming this thread it sounds like a v2 is necessary.

Also - and completely unrelated to the code changes - can people start
putting some more care into trimming their email replies?  It can be
painful to scroll through big patches looking for the actual comments,
and the chance of missing something important can be high.

-- 
paul moore
www.paul-moore.com


