Qian Cai <quic_qiancai@xxxxxxxxxxx> writes:

> On Fri, May 06, 2022 at 09:11:36AM -0500, Eric W. Biederman wrote:
>>
>> In commit 40966e316f86 ("kthread: Ensure struct kthread is present for
>> all kthreads") caused init and the user mode helper threads that call
>> kernel_execve to have struct kthread allocated for them.
>>
>> I believe my first patch in this series is enough to fix the bug
>> and is simple enough and obvious enough to be backportable.
>>
>> The rest of the changes pass struct kernel_clone_args to clean things
>> up and cause the code to make sense.
>>
>> There is one rough spot in this change. In the init process before the
>> user space init process is exec'd there is a lot going on. I have found
>> when async_schedule_domain is low on memory or has more than 32K callers
>> executing do_populate_rootfs will now run in a user space thread making
>> flush_delayed_fput meaningless, and __fput_sync is unusable. I solved
>> this as I did in usermode_driver.c with an added explicit task_work_run.
>> I point this out as I have seen some talk about making flushing file
>> handles more explicit.
>
> Reverting the last 3 commits of the series fixed a boot crash.
>
> 1b2552cbdbe0 fork: Stop allowing kthreads to call execve
> 753550eb0ce1 fork: Explicitly set PF_KTHREAD
> 68d85f0a33b0 init: Deal with the init process being a user mode process

Hmm. It looks like I missed a little detail.

task_tick_fair
  task_tick_numa
    task_scan_start
      task_scan_min
        task_nr_scan_windows
          p->mm

If I read this code right, task_tick_numa assumes that only tasks with
PF_KTHREAD set lack an mm. This should fix the failure.

For init we could possibly populate .mm and not just .active_mm. For
user mode helpers cloned from kernel threads I don't think that is a
realistic option. So I think this is going to be the proper fix.

I believe this only happens when NUMA balancing runs at an unfortunate
moment.

Qian Cai, can you test this?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..db6f0df9d43e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2915,7 +2915,7 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
 	/*
 	 * We don't care about NUMA placement if we don't have memory.
 	 */
-	if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
+	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
 		return;
 
 	/*

Eric