On Thu, Nov 07, 2019 at 05:12:53PM +0100, Ondrej Mosnacek wrote:
> On Thu, Nov 7, 2019 at 11:17 AM Jeff Vander Stoep <jeffv@xxxxxxxxxx> wrote:
> > This replaces the reverse table lookup and reverse cache with a
> > hashtable which improves cache-miss reverse-lookup times from
> > O(n) to O(1)* and maintains the same performance as a reverse
> > cache hit.
> >
> > This reduces the time needed to add a new sidtab entry from ~500us
> > to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> >
> > The implementation uses the kernel's generic hashtable API. It uses
> > the context's string representation as the hash source, and the
> > kernel's generic string hashing algorithm full_name_hash() to
> > reduce the string to a 32-bit value.
> >
> > This change also maintains the improvement introduced in
> > commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> > performance") which removed the need to keep the current sidtab
> > locked during policy reload. It does, however, introduce periodic
> > locking of the target sidtab while converting the hashtable. Sidtab
> > entries are never modified or removed, so the context struct stored
> > in the sid_to_context tree can also be used for the context_to_sid
> > hashtable to reduce memory usage.
> >
> > This bug was reported by:
> > - On the selinux bug tracker.
> >   BUG: kernel softlockup due to too many SIDs/contexts #37
> >   https://github.com/SELinuxProject/selinux-kernel/issues/37
> > - Jovana Knezevic on Android's bugtracker.
> >   Bug: 140252993
> >   "During multi-user performance testing, we create and remove users
> >   many times. selinux_android_restorecon_pkgdir goes from 1ms to over
> >   20ms after about 200 user creations and removals. Accumulated over
> >   ~280 packages, that adds a significant time to user creation,
> >   making perf benchmarks unreliable."
> >
> > * Hashtable lookup is only O(1) when n < the number of buckets.
> >
> > Changes in V2:
> > -The hashtable uses sidtab_entry_leaf objects directly so these
> >  objects are shared between the sid_to_context lookup tree and the
> >  context_to_sid hashtable. This simplifies memory allocation and
> >  was suggested by Ondrej Mosnacek.
> > -The new sidtab hash stats file in selinuxfs has been moved out of
> >  the avc dir and into a new "ss" dir.
> >
> > V3:
> > -Add lock nesting notation.
> >
> > V4:
> > -Moved to *_rcu variants of the various hashtable functions
> >  as suggested by Ondrej Mosnacek and Will Deacon.
>
> I may be misunderstanding the purpose of RCU, but isn't all this RCU
> stuff a big overkill when these entries are never removed? (Well, they
> are removed when the sidtab is being destroyed, at which point however
> no other threads are accessing them any more.) In my understanding,
> RCU basically makes sure that objects that you dereference in an RCU
> critical section are not destroyed until you leave it. Yes, it also
> helps you to dereference/update them safely, but that can be achieved
> on its own in a simpler way. Instead of using RCU here, I would
> instead suggest looking into adding an equivalent of
> list_for_each_entry_lockless()* for hlist and maybe introduce some
> suitable hlist_add_something() function that ensures consistency
> w.r.t. the lockless traversal (perhaps the WRITE_ONCE() in hlist_add()
> is already sufficient, but I'm not sure...).

If you use the existing _rcu accessors, you get correctly enforced
dependency ordering on the reader side and correctly placed release
ordering on the updater side.
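For illustration, here is a minimal sketch of what a lockless,
string-keyed reverse lookup looks like with the generic hashtable
_rcu accessors and full_name_hash(). This is not the actual sidtab
code; all structure and function names below are made up.

#include <linux/errno.h>
#include <linux/hashtable.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/stringhash.h>
#include <linux/types.h>

#define EXAMPLE_HASH_BITS 9

static DEFINE_HASHTABLE(example_ctx_hash, EXAMPLE_HASH_BITS);
static DEFINE_SPINLOCK(example_ctx_lock);

struct example_entry {
	u32 sid;
	const char *ctx_str;	/* never modified after insertion */
	struct hlist_node list;
};

static u32 example_hash(const char *str)
{
	/* reduce the context string to a 32-bit hash value */
	return full_name_hash(NULL, str, strlen(str));
}

/* Updater: hash_add_rcu() publishes the node with release ordering */
static void example_insert(struct example_entry *entry)
{
	spin_lock(&example_ctx_lock);
	hash_add_rcu(example_ctx_hash, &entry->list,
		     example_hash(entry->ctx_str));
	spin_unlock(&example_ctx_lock);
}

/* Reader: lockless traversal under rcu_read_lock() */
static int example_lookup(const char *ctx_str, u32 *sid)
{
	struct example_entry *entry;
	u32 hash = example_hash(ctx_str);
	int ret = -ENOENT;

	rcu_read_lock();
	hash_for_each_possible_rcu(example_ctx_hash, entry, list, hash) {
		if (!strcmp(entry->ctx_str, ctx_str)) {
			*sid = entry->sid;
			ret = 0;
			break;
		}
	}
	rcu_read_unlock();
	return ret;
}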
I don't think that's overkill, and you can use the RCU accessors to
achieve the lockless traversal. hlist_add() is not safe against a
concurrent traversal because its WRITE_ONCE() provides no memory
ordering guarantees, allowing readers to see an uninitialised node
(see the sketch below).

How exactly would list_for_each_entry_lockless() and
hlist_add_something() differ from the RCU variants, assuming they're
implemented correctly?

Will
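For reference, a small sketch of the ordering issue described above:
a plain WRITE_ONCE() publication versus the rcu_assign_pointer() /
rcu_dereference() pair used internally by hlist_add_head_rcu() and
hlist_for_each_entry_rcu(). Again, the names below are made up and
this is not code from the patch.

#include <linux/compiler.h>
#include <linux/rcupdate.h>

struct demo_node {
	int payload;
};

static struct demo_node *demo_head;

/* Unsafe: the store to ->payload may be reordered after publication */
static void demo_publish_plain(struct demo_node *n, int value)
{
	n->payload = value;
	WRITE_ONCE(demo_head, n);	/* no release ordering */
}

/* Safe: rcu_assign_pointer() orders initialisation before publication */
static void demo_publish_rcu(struct demo_node *n, int value)
{
	n->payload = value;
	rcu_assign_pointer(demo_head, n);
}

/* Lockless reader: rcu_dereference() provides dependency ordering */
static int demo_read(void)
{
	struct demo_node *n;
	int value = -1;

	rcu_read_lock();
	n = rcu_dereference(demo_head);
	if (n)
		value = n->payload;
	rcu_read_unlock();
	return value;
}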