On Thu, Nov 7, 2019 at 3:54 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote: > On 11/7/19 5:17 AM, Jeff Vander Stoep wrote: > > This replaces the reverse table lookup and reverse cache with a > > hashtable which improves cache-miss reverse-lookup times from > > O(n) to O(1)* and maintains the same performance as a reverse > > cache hit. > > > > This reduces the time needed to add a new sidtab entry from ~500us > > to 5us on a Pixel 3 when there are ~10,000 sidtab entries. > > > > The implementation uses the kernel's generic hashtable API, > > It uses the context's string represtation as the hash source, > > and the kernels generic string hashing algorithm full_name_hash() > > to reduce the string to a 32 bit value. > > > > This change also maintains the improvement introduced in > > commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve > > performance") which removed the need to keep the current sidtab > > locked during policy reload. It does however introduce periodic > > locking of the target sidtab while converting the hashtable. Sidtab > > entries are never modified or removed, so the context struct stored > > in the sid_to_context tree can also be used for the context_to_sid > > hashtable to reduce memory usage. > > > > This bug was reported by: > > - On the selinux bug tracker. > > BUG: kernel softlockup due to too many SIDs/contexts #37 > > https://github.com/SELinuxProject/selinux-kernel/issues/37 > > - Jovana Knezevic on Android's bugtracker. > > Bug: 140252993 > > "During multi-user performance testing, we create and remove users > > many times. selinux_android_restorecon_pkgdir goes from 1ms to over > > 20ms after about 200 user creations and removals. Accumulated over > > ~280 packages, that adds a significant time to user creation, > > making perf benchmarks unreliable." > > > > * Hashtable lookup is only O(1) when n < the number of buckets. > > > > Changes in V2: > > -The hashtable uses sidtab_entry_leaf objects directly so these > > objects are shared between the sid_to_context lookup tree and the > > context_to_sid hashtable. This simplifies memory allocation and > > was suggested by Ondrej Mosnacek. > > -The new sidtab hash stats file in selinuxfs has been moved out of > > the avc dir and into a new "ss" dir. > > > > V3: > > -Add lock nesting notation. > > > > V4: > > -Moved to *_rcu variants of the various hashtable functions > > as suggested by Ondrej Mosnacek and Will Deacon. > > -Naming/spelling fixups. > > This still triggers a splat for me in running Ondrej's reproducer (also > ran selinux-testsuite first, not sure if that mattered): > > SELinux: Converting 3784 SID table entries... > kernel: ------------[ cut here ]------------ > syscall 1 left IRQs disabled > WARNING: CPU: 1 PID: 8313 at arch/x86/entry/common.c:261 > syscall_return_slowpathModules linked in: crypto_user > scsi_transport_iscsi xt_multiport bluetooth ecdh_kernel: crc32_pclmul > snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel sCPU: 1 PID: > 8313 Comm: runcon Not tainted 5.4.0-rc1+ #57 > Hardware name: Dell Inc. OptiPlex 7050/0NW6H5, BIOS 1.8.3 03/23/2018 > RIP: 0010:syscall_return_slowpath+0x1e1/0x4a0 > Code: ff ff e9 60 ff ff ff e8 2d ad 05 00 e9 2a ff ff ff 48 8d 7d 78 e8 > cf cf 50RSP: 0018:ffff8885a6adff28 EFLAGS: 00010082 > RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 > RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffed10b4d5bfd7 > RBP: ffff8885a6adff58 R08: ffffffffaa219f90 R09: fffffbfff5883615 > R10: fffffbfff5883614 R11: ffffffffac41b0a3 R12: 0000000000000000 > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > FS: 00007f3e32c67080(0000) GS:ffff888822a40000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f3e32f145d0 CR3: 00000005a7476005 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x7f3e32e11467 > Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e > fa 64 8bRSP: 002b:00007ffec6482bc8 EFLAGS: 00000246 ORIG_RAX: > 0000000000000001 > RAX: 000000000000002f RBX: 00007ffec6483d88 RCX: 00007f3e32e11467 > RDX: 000000000000002f RSI: 000055b75e5a5600 RDI: 0000000000000003 > RBP: 000055b75e5a5600 R08: 00000000ffffffff R09: 00007ffec6482a60 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > R13: 00007ffec6483c4c R14: 0000000000000000 R15: 0000000000000000 > irq event stamp: 7292 > hardirqs last enabled at (7291): [<ffffffffab26d9bb>] > _raw_write_unlock_irqresthardirqs last disabled at (7292): > [<ffffffffab26db61>] _raw_spin_lock_irqsave+0xsoftirqs last enabled at > (7036): [<ffffffffab600494>] __do_softirq+0x494/0x665 > softirqs last disabled at (7025): [<ffffffffaa168b8f>] irq_exit+0x13f/0x160 > kernel: ---[ end trace 1557497a64e0b001 ]--- > kernel: ------------[ cut here ]------------ > > Some comments below. Recommend running it by the RCU wizards e.g. Paul > McKenney too given the subtleties of getting RCU right. > > <snip> > > > diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c > > index 7d49994e8d5f..c052e620246c 100644 > > --- a/security/selinux/ss/sidtab.c > > +++ b/security/selinux/ss/sidtab.c > > @@ -17,26 +17,42 @@ > <snip> > > +static u32 context_to_sid(struct sidtab *s, struct context *context) > > +{ > > + struct sidtab_entry_leaf *entry; > > + > > + rcu_read_lock(); > > + hash_for_each_possible_rcu(s->context_to_sid, entry, list, > > + context->hash) { > > + if (context_cmp(&entry->context, context)) { > > + rcu_read_unlock(); > > + return entry->sid; > > This looks unsafe to me; I would have assumed you need to at least save > entry->sid to a temporary before rcu_read_unlock()? > > > @@ -305,29 +284,36 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context, > > rc = -ENOMEM; > > dst_convert = sidtab_do_lookup(convert->target, count, 1); > > if (!dst_convert) { > > - context_destroy(dst); > > + context_destroy(&dst->context); > > goto out_unlock; > > } > > > > - rc = convert->func(context, dst_convert, convert->args); > > + rc = convert->func(context, &dst_convert->context, > > + convert->args); > > if (rc) { > > - context_destroy(dst); > > + context_destroy(&dst->context); > > goto out_unlock; > > } > > + dst_convert->sid = index_to_sid(count); > > + convert->target->count = count + 1; > > > > /* at this point we know the insert won't fail */ > > - convert->target->count = count + 1; > > + spin_lock_irqsave_nested(&convert->target->lock, flags, > > + SINGLE_DEPTH_NESTING); > > + hash_add_rcu(convert->target->context_to_sid, > > + &dst_convert->list, dst_convert->context.hash); > > + spin_unlock_irqrestore(&convert->target->lock, flags); > > Still having a hard time understanding why we need to lock the target. > The target here is always the newsidtab allocated by > security_load_policy(), not yet exposed to any other threads in the system? I believe this is to avoid a race with the hash_add() in sidtab_convert_hashtable(), which locks only the target table. However, I don't like the fact that we need to mess with inconsistent lock nesting here... I think it would be better to just lock the master's spinlock in sidtab_convert_hashtable() - it will then block more when entries would be added to sidtab, but that shouldn't be a big concern in practice. -- Ondrej Mosnacek <omosnace at redhat dot com> Software Engineer, Security Technologies Red Hat, Inc.