On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote: > Initial booting is setting the task flag to idle (PF_IDLE) by the call > path sched_init() -> init_idle(). Having the task idle and calling > call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be > set. Subsequent calls to any cond_resched() will enable IRQs, > potentially earlier than the IRQ setup has completed. Recent changes > have caused just this scenario and IRQs have been enabled early. > > This causes a warning later in start_kernel() as interrupts are enabled > before they are fully set up. > > Fix this issue by clearing the PF_IDLE flag on return from sched_init() > and restore the flag in rest_init(). Although the boot task was marked > as idle since (at least) d80e4fda576d, I am not sure that it is wrong to > do so. The forced context-switch on idle task was introduced in the > tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee. > > Link: https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/ > Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@xxxxxxxxxxxxxx/ > Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task") > Cc: stable@xxxxxxxxxxxxxxx > Cc: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > Cc: Christophe Leroy <christophe.leroy@xxxxxxxxxx> > Cc: Andreas Schwab <schwab@xxxxxxxxxxxxxx> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Cc: Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > Cc: Juri Lelli <juri.lelli@xxxxxxxxxx> > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: "Mike Rapoport (IBM)" <rppt@xxxxxxxxxx> > Cc: Vlastimil Babka <vbabka@xxxxxxx> > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> > --- > init/main.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/init/main.c b/init/main.c > index ad920fac325c..f74772acf612 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void) > */ > rcu_read_lock(); > tsk = find_task_by_pid_ns(pid, &init_pid_ns); > - tsk->flags |= PF_NO_SETAFFINITY; > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE; > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id())); > rcu_read_unlock(); > > @@ -938,6 +938,8 @@ void start_kernel(void) > * time - but meanwhile we still have a functioning scheduler. > */ > sched_init(); > + /* Avoid early context switch, rest_init() restores PF_IDLE */ > + current->flags &= ~PF_IDLE; > > if (WARN(!irqs_disabled(), > "Interrupts were enabled *very* early, fixing it\n")) Hurmph... so since this is about IRQs, would it not make sense to have the | PF_IDLE near 'early_boot_irqs_disabled = false' ? Or, alternatively, make the tinyrcu thing check that variable?