On Thu, Nov 7, 2019 at 5:44 PM Will Deacon <will@xxxxxxxxxx> wrote:
> On Thu, Nov 07, 2019 at 05:12:53PM +0100, Ondrej Mosnacek wrote:
> > On Thu, Nov 7, 2019 at 11:17 AM Jeff Vander Stoep <jeffv@xxxxxxxxxx> wrote:
> > > This replaces the reverse table lookup and reverse cache with a
> > > hashtable which improves cache-miss reverse-lookup times from
> > > O(n) to O(1)* and maintains the same performance as a reverse
> > > cache hit.
> > >
> > > This reduces the time needed to add a new sidtab entry from ~500us
> > > to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> > >
> > > The implementation uses the kernel's generic hashtable API. It
> > > uses the context's string representation as the hash source, and
> > > the kernel's generic string hashing algorithm full_name_hash()
> > > to reduce the string to a 32-bit value.
> > >
> > > This change also maintains the improvement introduced in
> > > commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> > > performance") which removed the need to keep the current sidtab
> > > locked during policy reload. It does, however, introduce periodic
> > > locking of the target sidtab while converting the hashtable. Sidtab
> > > entries are never modified or removed, so the context struct stored
> > > in the sid_to_context tree can also be used for the context_to_sid
> > > hashtable to reduce memory usage.
> > >
> > > This bug was reported by:
> > > - On the selinux bug tracker.
> > >   BUG: kernel softlockup due to too many SIDs/contexts #37
> > >   https://github.com/SELinuxProject/selinux-kernel/issues/37
> > > - Jovana Knezevic on Android's bugtracker.
> > >   Bug: 140252993
> > >   "During multi-user performance testing, we create and remove users
> > >   many times. selinux_android_restorecon_pkgdir goes from 1ms to over
> > >   20ms after about 200 user creations and removals. Accumulated over
> > >   ~280 packages, that adds a significant time to user creation,
> > >   making perf benchmarks unreliable."
> > >
> > > * Hashtable lookup is only O(1) when n < the number of buckets.
> > >
> > > Changes in V2:
> > > - The hashtable uses sidtab_entry_leaf objects directly so these
> > >   objects are shared between the sid_to_context lookup tree and the
> > >   context_to_sid hashtable. This simplifies memory allocation and
> > >   was suggested by Ondrej Mosnacek.
> > > - The new sidtab hash stats file in selinuxfs has been moved out of
> > >   the avc dir and into a new "ss" dir.
> > >
> > > V3:
> > > - Add lock nesting notation.
> > >
> > > V4:
> > > - Moved to *_rcu variants of the various hashtable functions
> > >   as suggested by Ondrej Mosnacek and Will Deacon.
> >
> > I may be misunderstanding the purpose of RCU, but isn't all this RCU
> > stuff a big overkill when these entries are never removed? (Well, they
> > are removed when the sidtab is being destroyed, at which point, however,
> > no other threads are accessing them any more.) In my understanding,
> > RCU basically makes sure that objects that you dereference in an RCU
> > critical section are not destroyed until you leave it. Yes, it also
> > helps you to dereference/update them safely, but that can be achieved
> > on its own in a simpler way. Instead of using RCU here, I would
> > instead suggest looking into adding an equivalent of
> > list_for_each_entry_lockless()* for hlist and maybe introduce some
> > suitable hlist_add_something() function that ensures consistency
> > w.r.t. the lockless traversal (perhaps the WRITE_ONCE() in hlist_add()
> > is already sufficient, but I'm not sure...).
>
> If you use the existing _rcu accessors you will get correctly enforced
> dependency ordering on the reader side and correctly placed release
> ordering on the updater side. I don't think that's a big overkill, and
> you can use the RCU accessors to achieve the lockless traversal.

OK, but you don't need the read_lock()/unlock(), the rcu_head field in
the entries, and kfree_rcu(), right?
>
> hlist_add() is not safe against a concurrent traversal because the
> WRITE_ONCE() provides no memory ordering guarantees, allowing the readers
> to see an uninitialised node.

Right, so we would need a new function that does smp_store_release()
instead.

>
> How exactly would list_for_each_entry_lockless() and hlist_add_something()
> differ from the RCU variants, assuming they're implemented correctly?

Looking at the implementation of rcu_dereference() and
rcu_assign_pointer(), they would probably be practically the same,
except for the rcu_read_lock_held() check in rcu_dereference(). That,
and they would clearly communicate that they are not doing actual RCU,
but just allowing lockless traversal of an add-only hlist.

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.