On Fri, Dec 30, 2022 at 08:46:25AM -0500, Mathieu Desnoyers wrote:
> void sched_mm_cid_after_execve(struct task_struct *t)
> {
>         struct mm_struct *mm = t->mm;
>         unsigned long flags;
>
>         WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);

Yeah, it is that check and it reproduces here trivially in my guest, so much so
that I can't even boot current tip/master in it due to the constant flood from
it.

Also, there's a null ptr deref there:

[ 1.694051] Initialise system trusted keyrings
[ 1.694915] ------------[ cut here ]------------
[ 1.695689] BUG: kernel NULL pointer dereference, address: 000000000000005c
[ 1.695714] #PF: supervisor write access in kernel mode
[ 1.695721] #PF: error_code(0x0002) - not-present page
[ 1.695728] PGD 0 P4D 0
[ 1.695739] Oops: 0002 [#1] PREEMPT SMP
[ 1.695747] CPU: 0 PID: 126 Comm: kworker/u32:1 Not tainted 6.2.0-rc2+ #2
[ 1.695754] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.695760] RIP: 0010:_raw_spin_lock+0x17/0x30
[ 1.702127] WARNING: CPU: 13 PID: 115 at kernel/sched/core.c:11346 sched_mm_cid_after_execve+0xd5/0xf0
[ 1.699309] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 c8 ea 64 7e 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 00 00 00 0f 1f 80 00
[ 1.702857] Modules linked in:
[ 1.699309] RSP: 0018:ffffc900004afe78 EFLAGS: 00010046
[ 1.703670]
[ 1.699309]
[ 1.704665] CPU: 13 PID: 115 Comm: kworker/u32:0 Not tainted 6.2.0-rc2+ #2
[ 1.699309] RAX: 0000000000000000 RBX: ffff88800d323d00 RCX: 0000000000000000
[ 1.699309] RDX: 0000000000000001 RSI: ffff88800d323d00 RDI: 000000000000005c
[ 1.699309] RBP: 000000000000005c R08: 0000000000000064 R09: ffffc900004afb30
[ 1.699309] R10: 0000000000000000 R11: fffffffffffffffe R12: 0000000000000246
[ 1.699309] R13: 0000000000000000 R14: 00000000fffffffe R15: ffff88800d323d00
[ 1.699309] FS: 0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[ 1.699309] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.699309] CR2: 000000000000005c CR3: 000000000220a000 CR4: 00000000003506f0
[ 1.699309] Call Trace:
[ 1.699309] <TASK>
[ 1.699309] sched_mm_cid_after_execve+0x52/0xf0
[ 1.706650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.699309] bprm_execve+0x323/0x600
[ 1.707390] RIP: 0010:sched_mm_cid_after_execve+0xd5/0xf0
[ 1.699309] kernel_execve+0x15f/0x1c0
[ 1.707967] Code: 00 00 74 04 f0 80 0b 02 48 8b 1c 24 48 8b 6c 24 08 4c 8b 64 24 10 4c 8b 6c 24 18 4c 8b 74 24 20 48 83 c4 28 c3 cc cc cc cc 90 <0f> 0b 90 e9 65 ff ff ff 41 be ff ff ff ff eb 9d 66 66 2e 0f 1f 84
[ 1.699309] call_usermodehelper_exec_async+0xd1/0x190
[ 1.708882] RSP: 0018:ffffc90000457e80 EFLAGS: 00010246
[ 1.699309] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[ 1.709839]
[ 1.699309] ret_from_fork+0x2c/0x50
[ 1.710739] RAX: fffffffffffffffe RBX: ffff88800cad8f40 RCX: 0000000000000000
[ 1.699309] </TASK>
[ 1.714247] RDX: ffffc90000457dc8 RSI: ffff88800cad8f40 RDI: ffff88800cad8f40
[ 1.699309] Modules linked in:
[ 1.715270] RBP: ffff88800dd35400 R08: 0000000000000064 R09: ffffc90000457b30
[ 1.699309] CR2: 000000000000005c
[ 1.699309] ---[ end trace 0000000000000000 ]---

... flood of the above...

> is too strict. AFAIU the usermodehelper thread is a kernel thread, which
> happens to have a non-NULL mm after execve. We want to allow usermodehelper
> threads to use rseq, so I think the appropriate approach here would be to
> just warn if !t->mm:
>
>         WARN_ON_ONCE(!t->mm);

You need at least this to avoid the null ptr deref too:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 048ec2417990..5c920c94a6b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11340,10 +11340,13 @@ void sched_mm_cid_before_execve(struct task_struct *t)
 
 void sched_mm_cid_after_execve(struct task_struct *t)
 {
-	struct mm_struct *mm = t->mm;
+	struct mm_struct *mm;
 	unsigned long flags;
 
-	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);
+	if (WARN_ON_ONCE(!t->mm))
+		return;
+
+	mm = t->mm;
 
 	local_irq_save(flags);
 	t->mm_cid = mm_cid_get(mm);
---

which gives the below. I'm not sure though how the rules are about those kworker
threads and them having a ->mm...

[ 1.734104] ------------[ cut here ]------------
[ 1.734144] Initialise system trusted keyrings
[ 1.734553] WARNING: CPU: 9 PID: 109 at kernel/sched/core.c:11346 sched_mm_cid_after_execve+0xcb/0xe0
[ 1.752756] workingset: timestamp_bits=61 max_order=19 bucket_order=0
[ 1.754187] Modules linked in:
[ 1.768160] 9p: Installing v9fs 9p2000 file system support
[ 1.768640]
[ 1.768876] Key type asymmetric registered
[ 1.769048] CPU: 9 PID: 109 Comm: kworker/u32:1 Not tainted 6.2.0-rc2+ #9
[ 1.769207] Asymmetric key parser 'x509' registered
[ 1.769397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.769651] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[ 1.769833] RIP: 0010:sched_mm_cid_after_execve+0xcb/0xe0
[ 1.770162] io scheduler mq-deadline registered
[ 1.770462] Code: 00 00 74 04 f0 80 0b 02 48 8b 1c 24 48 8b 6c 24 08 4c 8b 64 24 10 4c 8b 6c 24 18 4c 8b 74 24 20 48 83 c4 28 c3 cc cc cc cc 90 <0f> 0b 90 eb d9 41 be ff ff ff ff eb a0 0f 1f 84 00 00 00 00 00 90
[ 1.810713] RSP: 0018:ffffc90000427e80 EFLAGS: 00010246
[ 1.823527] RAX: fffffffffffffffe RBX: ffff88800cb88000 RCX: 0000000000000000
[ 1.824425] RDX: ffffc90000427dc8 RSI: ffff88800cb88000 RDI: ffff88800cb88000
[ 1.825266] acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
[ 1.825564] RBP: ffff88800d2d8200 R08: 0000000000000064 R09: ffffc90000427b30
[ 1.825914] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 1.826068] R10: 0000000000000000 R11: fffffffffffffffe R12: fffffffffffffffe
[ 1.839784] ACPI: button: Power Button [PWRF]
[ 1.840327] R13: 0000000000000000 R14: 00000000fffffffe R15: ffff88800cb88000
[ 1.855532] FS: 0000000000000000(0000) GS:ffff88807dc40000(0000) knlGS:0000000000000000
[ 1.856681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.857403] CR2: 0000000000000000 CR3: 000000000220a000 CR4: 00000000003506e0
[ 1.858264] Call Trace:
[ 1.858643] <TASK>
[ 1.871528] bprm_execve+0x323/0x600
[ 1.872027] kernel_execve+0x15f/0x1c0
[ 1.872505] call_usermodehelper_exec_async+0xd1/0x190
[ 1.873120] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[ 1.873800] ret_from_fork+0x2c/0x50
[ 1.874259] </TASK>
[ 1.874582] ---[ end trace 0000000000000000 ]---

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette