* Peter Zijlstra <peterz@xxxxxxxxxxxxx> [230913 12:13]:
> On Wed, Sep 13, 2023 at 10:51:25AM -0400, Liam R. Howlett wrote:
> > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [230913 09:53]:
> > > On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
> > >
> > > > diff --git a/init/main.c b/init/main.c
> > > > index ad920fac325c..f74772acf612 100644
> > > > --- a/init/main.c
> > > > +++ b/init/main.c
> > > > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> > > >  	 */
> > > >  	rcu_read_lock();
> > > >  	tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > > > -	tsk->flags |= PF_NO_SETAFFINITY;
> > > > +	tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> > > >  	set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> > > >  	rcu_read_unlock();
> > > >
> > >
> > > Hmm, isn't that pid-1 you're setting PF_IDLE on?
> >
> > Yes, thanks.  I think that is what Geert is hitting with my patch.
> >
> > debug __might_resched() in kernel/sched/core.c is failing to return in
> > that first (complex) if statement.  His report says pid 1 so this is
> > likely the issue.
> >
> > >
> > > The task becoming idle is 'current' at this point, see the
> > > cpu_startup_entry() call below.
> > >
> > > Would not something like so be the right thing?
> > >
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 2299a5cfbfb9..802551e0009b 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
> > >  	 * PF_KTHREAD should already be set at this point; regardless, make it
> > >  	 * look like a proper per-CPU kthread.
> > >  	 */
> > > -	idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
> > > +	idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
> >
> > I am concerned this will alter more than just the current task, which
> > would mean more modifications later.  There is a comment about it being
> > called 'more than once' and 'per cpu' so I am hesitant to change the
> > function itself.
> >
> > Although I am unsure of the call path.. fork_idle() -> init_idle() I
> > guess?
>
> There's only 2 ways to get into do_idle(), through cpu_startup_entry()
> and play_idle_precise(). The latter already frobs PF_IDLE since it is
> the forced idle path, this then leaves cpu_startup_entry() which is the
> regular idle path.
>
> All idle threads will end up calling into it, the boot CPU through the
> rest_init() and the SMP cpus through arch SMP bringup.
>
> IOW, this ensures all idle loops will have PF_IDLE set but not the
> pre-idle loop setup code these threads run.

Thanks for the information.  This does leave the init_idle() function in
the odd state of not setting PF_IDLE, but I guess that's okay?
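
To check my understanding of the ordering, here is a small userspace
model (not kernel code).  The model_* names and the MODEL_PF_* bit
values are made up for illustration and do not match the kernel's.  It
also assumes the part of the change not quoted above sets PF_IDLE on
'current' in cpu_startup_entry(); that is my reading of the
explanation, not something shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; NOT the kernel's PF_* definitions. */
#define MODEL_PF_IDLE           0x1u
#define MODEL_PF_KTHREAD        0x2u
#define MODEL_PF_NO_SETAFFINITY 0x4u

/* Hypothetical stand-in for struct task_struct. */
struct model_task {
	int pid;
	unsigned int flags;
};

static inline bool model_is_idle_task(const struct model_task *t)
{
	return (t->flags & MODEL_PF_IDLE) != 0;
}

/* Models init_idle() with the proposed hunk applied: no PF_IDLE here. */
static inline void model_init_idle(struct model_task *idle)
{
	idle->flags |= MODEL_PF_KTHREAD | MODEL_PF_NO_SETAFFINITY;
}

/*
 * Models cpu_startup_entry(), the regular idle path.  Assumption: this
 * is where PF_IDLE gets set on the calling task.  The real function
 * never returns; the model returns so the result can be inspected.
 */
static inline void model_cpu_startup_entry(struct model_task *current_task)
{
	assert(current_task != NULL);
	current_task->flags |= MODEL_PF_IDLE;
}

/*
 * Models the tail of rest_init(): pid 1 only gets PF_NO_SETAFFINITY,
 * and the calling (boot) task is the one that becomes idle.
 */
static inline void model_rest_init(struct model_task *boot,
				   struct model_task *init)
{
	init->flags |= MODEL_PF_NO_SETAFFINITY;
	model_cpu_startup_entry(boot);
}
```

In this model the boot task is not marked idle after model_init_idle()
and only becomes idle once model_rest_init() calls into the idle entry
point, while pid 1 never gets the idle flag.  If that matches the
intent, then init_idle() not setting PF_IDLE is just the pre-idle
setup state.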