On Mon, Nov 4, 2019 at 8:15 PM Jeff Vander Stoep <jeffv@xxxxxxxxxx> wrote: > This replaces the reverse table lookup and reverse cache with a > hashtable which improves cache-miss reverese-lookup times from > O(n) to O(1)* and maintains the same performance as a reverse > cache hit. > > This reduces the time needed to add a new sidtab entry from ~500us > to 5us on a Pixel 3 when there are ~10,000 sidtab entries. > > The implementation uses the kernel's generic hashtable API, > It uses the context's string represtation as the hash source, > and the kernels generic string hashing algorithm full_name_hash() > to reduce the string to a 32 bit value. > > This change also maintains the improvement introduced in commit > ee1a84fd which removed the need to keep the current sidtab locked > during policy reload. It does however introduce periodic locking of > the target sidtab while converting the hashtable. Sidtab entries > are never modified or removed, so the context struct stored in the > sid_to_context tree can also be used for the context_to_sid > hashtable to reduce memory usage. > > This bug was reported by: > - Stephen Smally on the selinux bug tracker. > BUG: kernel softlockup due to too many SIDs/contexts #37 > https://github.com/SELinuxProject/selinux-kernel/issues/37 > - Jovana Knezevic on Android's bugtracker. > Bug: 140252993 > "During multi-user performance testing, we create and remove users > many times. selinux_android_restorecon_pkgdir goes from 1ms to over > 20ms after about 200 user creations and removals. Accumulated over > ~280 packages, that adds a significant time to user creation, > making perf benchmarks unreliable." > > * Hashtable lookup is only O(1) when n < the number of buckets. > > Changes in V2: > -The hashtable uses sidtab_entry_leaf objects directly so these > objects are shared between the sid_to_context lookup tree and the > context_to_sid hashtable. > -Move the hash_add() statement to after the smp_store_release() > to ensure that the contents have been written before the object > becomes available in the hashtable. > -The new sidtab hash stats file in selinuxfs has been moved out of > the avc dir and into a new "ss" dir. > > Signed-off-by: Jeff Vander Stoep <jeffv@xxxxxxxxxx> > Reported-by: Stephen Smalley <sds@xxxxxxxxxxxxx> > Reported-by: Jovana Knezevic <jovanak@xxxxxxxxxx> > --- > security/selinux/Kconfig | 12 ++ > security/selinux/include/security.h | 1 + > security/selinux/selinuxfs.c | 64 +++++++ > security/selinux/ss/context.h | 11 +- > security/selinux/ss/policydb.c | 5 + > security/selinux/ss/services.c | 87 +++++++--- > security/selinux/ss/services.h | 4 +- > security/selinux/ss/sidtab.c | 249 +++++++++++++--------------- > security/selinux/ss/sidtab.h | 16 +- > 9 files changed, 287 insertions(+), 162 deletions(-) > > diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig > index 5711689deb6a..2d9788ad2d77 100644 > --- a/security/selinux/Kconfig > +++ b/security/selinux/Kconfig > @@ -85,3 +85,15 @@ config SECURITY_SELINUX_CHECKREQPROT_VALUE > via /selinux/checkreqprot if authorized by policy. > > If you are unsure how to answer this question, answer 0. > + > +config SECURITY_SELINUX_SIDTAB_HASH_BITS > + int "NSA SELinux sidtab hashtable size" > + depends on SECURITY_SELINUX > + range 8 13 > + default 9 > + help > + This option sets the number of buckets used in the sitab hashtable > + to 2^SECURITY_SELINUX_SIDTAB_HASH_BITS buckets. The number of hash > + collisions may be viewed at /selinux/ss/sidtab_hash_stats. If chain > + lengths are high (e.g. > 20) than selecting a higher value here will s/than/then/ > + ensure that lookups times are fast and stable. lookup times are short and stable <snip> > diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c > index 7d49994e8d5f..31cdb4848d67 100644 > --- a/security/selinux/ss/sidtab.c > +++ b/security/selinux/ss/sidtab.c > @@ -17,26 +17,37 @@ > #include "security.h" > #include "sidtab.h" > > +#define index_to_sid(index) (index + SECINITSID_NUM + 1) > +#define sid_to_index(sid) (sid - (SECINITSID_NUM + 1)) > + > int sidtab_init(struct sidtab *s) > { > u32 i; > > memset(s->roots, 0, sizeof(s->roots)); > > - /* max count is SIDTAB_MAX so valid index is always < SIDTAB_MAX */ > - for (i = 0; i < SIDTAB_RCACHE_SIZE; i++) > - s->rcache[i] = SIDTAB_MAX; > - > for (i = 0; i < SECINITSID_NUM; i++) > s->isids[i].set = 0; > > s->count = 0; > s->convert = NULL; > + hash_init(s->context_to_sid); > > spin_lock_init(&s->lock); > return 0; > } > > +static u32 context_to_sid(struct sidtab *s, struct context *context) > +{ > + struct sidtab_entry_leaf *entry; > + > + hash_for_each_possible(s->context_to_sid, entry, list, context->hash) { > + if (context_cmp(&entry->context, context)) > + return entry->sid; For this to be semantically safe against entries being added, it would need the hlist head -> first member to be read with smp_load_acquire(). Probably in practice you would always get fresh data when dereferencing a pointer, but I don't think we should rely on that... In fact it looks like hash_for_each_possible() wasn't really written with lockless traversal in mind - I checked a few random places that call it and they all either do it under some lock or expect the table to be immutable. I believe you will need to introduce a new equivalents of hash_add()/hlist_add_head() (that does smp_store_release() instead of WRITE_ONCE(), because the 'next' pointer of the node being added must be written before the head's first pointer is set to point to the new node) and hash_for_each_possible()/hlist_for_each_entry() (that loads (head)->first with smp_load_acquire()). Then again, I'm not an expert on this synchronization stuff, so I might be wrong... But you will have to prove it :) > + } > + return 0; > +} > + I am a little uneasy about this function having the same name as the context_to_sid() in security.c. It could be confusing when debugging. Maybe you could name the security.c function to context_struct_to_sid()? > int sidtab_set_initial(struct sidtab *s, u32 sid, struct context *context) > { > struct sidtab_isid_entry *entry; <snip> > @@ -305,29 +275,35 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context, > rc = -ENOMEM; > dst_convert = sidtab_do_lookup(convert->target, count, 1); > if (!dst_convert) { > - context_destroy(dst); > + context_destroy(&dst->context); > goto out_unlock; > } > > - rc = convert->func(context, dst_convert, convert->args); > + rc = convert->func(context, &dst_convert->context, > + convert->args); > if (rc) { > - context_destroy(dst); > + context_destroy(&dst->context); > goto out_unlock; > } > + dst_convert->sid = index_to_sid(count); > > /* at this point we know the insert won't fail */ > + spin_lock_irqsave(&convert->target->lock, flags); > convert->target->count = count + 1; > + hash_add(convert->target->context_to_sid, > + &dst_convert->list, dst_convert->context.hash); > + spin_unlock_irqrestore(&convert->target->lock, flags); Do we really need to lock the conversion target here? The (undocumented) contract here was that as long as the conversion is in progress, the target sidtab is "owned" by the master sidtab and any modifications of it would be done under the master's spinlock (the target's spinlock wouldn't be used until it is swapped-in to be the new master). Did this contract change with your patch? I think this nested locking is the cause of the lock warning that Stephen is getting so hopefully you can fix it by simply removing this extra locking. <snip> -- Ondrej Mosnacek <omosnace at redhat dot com> Software Engineer, Security Technologies Red Hat, Inc.