I'm seeing a race during policy load while the "regular" sidtab conversion is happening and a live conversion starts to take place in sidtab_context_to_sid(). We have an initial policy that's loaded by systemd ~0.6s into boot and then another policy gets loaded ~2-3s into boot. That second policy load is what hits the race condition situation because the sidtab is only partially populated and there's a decent amount of filesystem operations happening, at the same time, which are triggering live conversions. [ 3.091910] Unable to handle kernel paging request at virtual address 001303e1aa140408 [ 3.100083] Mem abort info: [ 3.102963] ESR = 0x96000004 [ 3.102965] EC = 0x25: DABT (current EL), IL = 32 bits [ 3.102967] SET = 0, FnV = 0 [ 3.102968] EA = 0, S1PTW = 0 [ 3.102969] Data abort info: [ 3.102970] ISV = 0, ISS = 0x00000004 [ 3.102971] CM = 0, WnR = 0 [ 3.102973] [001303e1aa140408] address between user and kernel address ranges [ 3.102977] Internal error: Oops: 96000004 [#1] SMP [ 3.102981] Modules linked in: [ 3.111250] bnxt_en pcie_iproc_platform pcie_iproc diagbe(O) [ 3.111259] CPU: 0 PID: 529 Comm: (tservice) Tainted: G O 5.10.17.1 #1 [ 3.119881] Hardware name: redacted (DT) [ 3.119884] pstate: 20400085 (nzCv daIf +PAN -UAO -TCO BTYPE=--) [ 3.119898] pc : sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202) [ 3.119902] lr : sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312) [ 3.126105] sp : ffff800011ceb810 [ 3.126106] x29: ffff800011ceb810 x28: 0000000000000000 [ 3.126108] x27: 0000000000000005 x26: ffffda109f3f2000 [ 3.126110] x25: 00000000ffffffff x24: 0000000000000000 [ 3.126113] x23: 0000000000000001 [ 3.133124] x22: 0000000000000005 [ 3.133125] x21: aa1303e1aa140408 x20: 0000000000000001 [ 3.133127] x19: 00000000000000cc x18: 0000000000000003 [ 3.133128] x17: 000000000000003e x16: 000000000000003f [ 3.145519] x15: 0000000000000039 x14: 000000000000002e [ 3.145521] x13: 0000000058294db1 x12: 00000000158294db [ 3.145523] x11: 000000007f0b3af2 x10: 0000000000004e00 [ 3.145525] x9 : 00000000000000cd x8 : 0000000000000005 [ 3.281289] x7 : feff735e62647764 x6 : 00000000000080cc [ 3.286769] x5 : 0000000000000005 x4 : ffff3f47c5b20000 [ 3.292249] x3 : ffff800011ceb900 x2 : 0000000000000001 [ 3.297729] x1 : 00000000000000cc x0 : aa1303e1aa1403e0 [ 3.303210] Call trace: [ 3.305733] sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202) [ 3.309867] sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312) [ 3.314451] security_context_to_sid_core (/usr/src/kernel/security/selinux/ss/services.c:1557) [ 3.319661] security_context_to_sid_default (/usr/src/kernel/security/selinux/ss/services.c:1616) [ 3.324961] inode_doinit_use_xattr (/usr/src/kernel/security/selinux/hooks.c:1366) [ 3.329634] inode_doinit_with_dentry (/usr/src/kernel/security/selinux/hooks.c:1457) [ 3.334486] selinux_d_instantiate (/usr/src/kernel/security/selinux/hooks.c:6278) [ 3.338889] security_d_instantiate (/usr/src/kernel/security/security.c:2004) [ 3.343385] d_splice_alias (/usr/src/kernel/fs/dcache.c:3030) [ 3.347251] squashfs_lookup (/usr/src/kernel/fs/squashfs/namei.c:220) [ 3.385561] el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428) [ 3.389517] el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671) [ 3.392939] Code: 51002718 340001d7 1ad82768 8b284c15 (f94002a0) All code ======== 0: 18 27 sbb %ah,(%rdi) 2: 00 51 d7 add %dl,-0x29(%rcx) 5: 01 00 add %eax,(%rax) 7: 34 68 xor $0x68,%al 9: 27 (bad) a: d8 1a fcomps (%rdx) c:* 15 4c 28 8b a0 adc $0xa08b284c,%eax <-- trapping instruction 11: 02 40 f9 add -0x7(%rax),%al Code starting with the faulting instruction =========================================== 0: a0 .byte 0xa0 1: 02 40 f9 add -0x7(%rax),%al [ 3.399230] ---[ end trace cc1840b3ff2c7506 ]--- The corresponding source from sidtab.c: 179 static struct sidtab_entry *sidtab_do_lookup(struct sidtab *s, u32 index, 180 int alloc) 181 { ... 193 /* lookup inside the subtree */ 194 entry = &s->roots[level]; 195 while (level != 0) { 196 capacity_shift -= SIDTAB_INNER_SHIFT; 197 --level; 198 199 entry = &entry->ptr_inner->entries[leaf_index >> capacity_shift]; 200 leaf_index &= ((u32)1 << capacity_shift) - 1; 201 202 if (!entry->ptr_inner) { 203 if (alloc) 204 entry->ptr_inner = kzalloc(SIDTAB_NODE_ALLOC_SIZE, 205 GFP_ATOMIC); 206 if (!entry->ptr_inner) 207 return NULL; 208 } 209 } 210 if (!entry->ptr_leaf) { 211 if (alloc) 212 entry->ptr_leaf = kzalloc(SIDTAB_NODE_ALLOC_SIZE, 213 GFP_ATOMIC); 214 if (!entry->ptr_leaf) 215 return NULL; 216 } 217 return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES]; 218 } ... 263 int sidtab_context_to_sid(struct sidtab *s, struct context *context, 264 u32 *sid) 265 { ... 305 /* 306 * if we are building a new sidtab, we need to convert the context 307 * and insert it there as well 308 */ 309 if (convert) { 310 rc = -ENOMEM; 311 dst_convert = sidtab_do_lookup(convert->target, count, 1); 312 if (!dst_convert) { 313 context_destroy(&dst->context); 314 goto out_unlock; 315 } ... What I'm having trouble understanding is how the above call to sidtab_do_lookup(), on the target sidtab that's undergoing a conversion in sidtab_convert(), can be expected to work. sidtab_convert_tree() is allocating and initializing ptr_inner sidtab nodes at the same time sidtab_do_lookup() is trying to use them with no locking being performed on the target sidtab. Ondrej specifically mentions, in commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance"), that there's no need to freeze the sidtab during policy reloads so I know that there's thought given to these code paths running in parallel. Can someone more knowledgeable on how the sidtab locking is expected to work suggest a fix for this crash? Tyler