Re: [PATCH v5] selinux: sidtab: reverse lookup hash table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/7/19 5:17 AM, Jeff Vander Stoep wrote:
This replaces the reverse table lookup and reverse cache with a
hashtable which improves cache-miss reverse-lookup times from
O(n) to O(1)* and maintains the same performance as a reverse
cache hit.

This reduces the time needed to add a new sidtab entry from ~500us
to 5us on a Pixel 3 when there are ~10,000 sidtab entries.

The implementation uses the kernel's generic hashtable API,
It uses the context's string represtation as the hash source,
and the kernels generic string hashing algorithm full_name_hash()
to reduce the string to a 32 bit value.

This change also maintains the improvement introduced in
commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
performance") which removed the need to keep the current sidtab
locked during policy reload. It does however introduce periodic
locking of the target sidtab while converting the hashtable. Sidtab
entries are never modified or removed, so the context struct stored
in the sid_to_context tree can also be used for the context_to_sid
hashtable to reduce memory usage.

This bug was reported by:
- On the selinux bug tracker.
   BUG: kernel softlockup due to too many SIDs/contexts #37
   https://github.com/SELinuxProject/selinux-kernel/issues/37
- Jovana Knezevic on Android's bugtracker.
   Bug: 140252993
   "During multi-user performance testing, we create and remove users
   many times. selinux_android_restorecon_pkgdir goes from 1ms to over
   20ms after about 200 user creations and removals. Accumulated over
   ~280 packages, that adds a significant time to user creation,
   making perf benchmarks unreliable."

* Hashtable lookup is only O(1) when n < the number of buckets.

Changes in V2:
-The hashtable uses sidtab_entry_leaf objects directly so these
objects are shared between the sid_to_context lookup tree and the
context_to_sid hashtable. This simplifies memory allocation and
was suggested by Ondrej Mosnacek.
-The new sidtab hash stats file in selinuxfs has been moved out of
the avc dir and into a new "ss" dir.

V3:
-Add lock nesting notation.

V4:
-Moved to *_rcu variants of the various hashtable functions
as suggested by Ondrej Mosnacek and Will Deacon.
-Naming/spelling fixups.

This still triggers a splat for me in running Ondrej's reproducer (also ran selinux-testsuite first, not sure if that mattered):

SELinux:  Converting 3784 SID table entries...
kernel: ------------[ cut here ]------------
syscall 1 left IRQs disabled
WARNING: CPU: 1 PID: 8313 at arch/x86/entry/common.c:261 syscall_return_slowpathModules linked in: crypto_user scsi_transport_iscsi xt_multiport bluetooth ecdh_kernel: crc32_pclmul snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel sCPU: 1 PID: 8313 Comm: runcon Not tainted 5.4.0-rc1+ #57
Hardware name: Dell Inc. OptiPlex 7050/0NW6H5, BIOS 1.8.3 03/23/2018
RIP: 0010:syscall_return_slowpath+0x1e1/0x4a0
Code: ff ff e9 60 ff ff ff e8 2d ad 05 00 e9 2a ff ff ff 48 8d 7d 78 e8 cf cf 50RSP: 0018:ffff8885a6adff28 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffed10b4d5bfd7
RBP: ffff8885a6adff58 R08: ffffffffaa219f90 R09: fffffbfff5883615
R10: fffffbfff5883614 R11: ffffffffac41b0a3 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f3e32c67080(0000) GS:ffff888822a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3e32f145d0 CR3: 00000005a7476005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
kernel:  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f3e32e11467
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8bRSP: 002b:00007ffec6482bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: 000000000000002f RBX: 00007ffec6483d88 RCX: 00007f3e32e11467
RDX: 000000000000002f RSI: 000055b75e5a5600 RDI: 0000000000000003
RBP: 000055b75e5a5600 R08: 00000000ffffffff R09: 00007ffec6482a60
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffec6483c4c R14: 0000000000000000 R15: 0000000000000000
irq event stamp: 7292
hardirqs last enabled at (7291): [<ffffffffab26d9bb>] _raw_write_unlock_irqresthardirqs last disabled at (7292): [<ffffffffab26db61>] _raw_spin_lock_irqsave+0xsoftirqs last enabled at (7036): [<ffffffffab600494>] __do_softirq+0x494/0x665
softirqs last disabled at (7025): [<ffffffffaa168b8f>] irq_exit+0x13f/0x160
kernel: ---[ end trace 1557497a64e0b001 ]---
kernel: ------------[ cut here ]------------

Some comments below. Recommend running it by the RCU wizards e.g. Paul McKenney too given the subtleties of getting RCU right.

<snip>

diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c
index 7d49994e8d5f..c052e620246c 100644
--- a/security/selinux/ss/sidtab.c
+++ b/security/selinux/ss/sidtab.c
@@ -17,26 +17,42 @@
<snip>
+static u32 context_to_sid(struct sidtab *s, struct context *context)
+{
+	struct sidtab_entry_leaf *entry;
+
+	rcu_read_lock();
+	hash_for_each_possible_rcu(s->context_to_sid, entry, list,
+				   context->hash) {
+		if (context_cmp(&entry->context, context)) {
+			rcu_read_unlock();
+			return entry->sid;

This looks unsafe to me; I would have assumed you need to at least save entry->sid to a temporary before rcu_read_unlock()?

@@ -305,29 +284,36 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
  		rc = -ENOMEM;
  		dst_convert = sidtab_do_lookup(convert->target, count, 1);
  		if (!dst_convert) {
-			context_destroy(dst);
+			context_destroy(&dst->context);
  			goto out_unlock;
  		}
- rc = convert->func(context, dst_convert, convert->args);
+		rc = convert->func(context, &dst_convert->context,
+				convert->args);
  		if (rc) {
-			context_destroy(dst);
+			context_destroy(&dst->context);
  			goto out_unlock;
  		}
+		dst_convert->sid = index_to_sid(count);
+		convert->target->count = count + 1;
/* at this point we know the insert won't fail */
-		convert->target->count = count + 1;
+		spin_lock_irqsave_nested(&convert->target->lock, flags,
+				SINGLE_DEPTH_NESTING);
+		hash_add_rcu(convert->target->context_to_sid,
+				&dst_convert->list, dst_convert->context.hash);
+		spin_unlock_irqrestore(&convert->target->lock, flags);

Still having a hard time understanding why we need to lock the target. The target here is always the newsidtab allocated by security_load_policy(), not yet exposed to any other threads in the system?

  	}
if (context->len)
  		pr_info("SELinux:  Context %s is not valid (left unmapped).\n",
  			context->str);
- sidtab_rcache_push(s, count);
-	*index = count;
+	*sid = index_to_sid(count);
- /* write entries before writing new count */
+	/* Write entries before updating count. */
  	smp_store_release(&s->count, count + 1);
+	hash_add_rcu(s->context_to_sid, &dst->list, dst->context.hash);

Exactly what guarantee do we think we are getting from placing the hash_add_rcu() after the smp_store_release() on s->count, and is that guarantee well-founded?

@@ -474,7 +463,7 @@ static void sidtab_destroy_tree(union sidtab_entry_inner entry, u32 level)
for (i = 0; i < SIDTAB_LEAF_ENTRIES; i++)
  			context_destroy(&node->entries[i].context);
-		kfree(node);
+		kfree_rcu(node, rcu);

Is it safe to do this without having deleted it from the hash and without having taken the spinlock?

  	}
  }

diff --git a/security/selinux/ss/sidtab.h b/security/selinux/ss/sidtab.h
index 1f4763141aa1..f1f505df97ba 100644
--- a/security/selinux/ss/sidtab.h
+++ b/security/selinux/ss/sidtab.h
@@ -83,11 +88,11 @@ struct sidtab {
  	struct sidtab_convert_params *convert;
  	spinlock_t lock;
- /* reverse lookup cache - access atomically via {READ|WRITE}_ONCE() */
-	u32 rcache[SIDTAB_RCACHE_SIZE];
-
  	/* index == SID - 1 (no entry for SECSID_NULL) */
  	struct sidtab_isid_entry isids[SECINITSID_NUM];
+
+	/* Hash table for fast reverse context-to-sid lookups. */
+	DECLARE_HASHTABLE(context_to_sid, SIDTAB_HASH_BITS);

Should this and/or any of the associated RCU-protected structures be marked with __rcu for sparse checking?

  };
int sidtab_init(struct sidtab *s);





[Index of Archives]     [Selinux Refpolicy]     [Linux SGX]     [Fedora Users]     [Fedora Desktop]     [Yosemite Photos]     [Yosemite Camping]     [Yosemite Campsites]     [KDE Users]     [Gnome Users]

  Powered by Linux