* Paul E. McKenney <paulmck@xxxxxxxxxx> [230913 07:28]: > On Wed, Sep 13, 2023 at 01:01:39PM +0200, Peter Zijlstra wrote: > > On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote: > > > Initial booting is setting the task flag to idle (PF_IDLE) by the call > > > path sched_init() -> init_idle(). Having the task idle and calling > > > call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be > > > set. Subsequent calls to any cond_resched() will enable IRQs, > > > potentially earlier than the IRQ setup has completed. Recent changes > > > have caused just this scenario and IRQs have been enabled early. > > > > > > This causes a warning later in start_kernel() as interrupts are enabled > > > before they are fully set up. > > > > > > Fix this issue by clearing the PF_IDLE flag on return from sched_init() > > > and restore the flag in rest_init(). Although the boot task was marked > > > as idle since (at least) d80e4fda576d, I am not sure that it is wrong to > > > do so. The forced context-switch on idle task was introduced in the > > > tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee. > > > > > > Link: https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/ > > > Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@xxxxxxxxxxxxxx/ > > > Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task") > > > Cc: stable@xxxxxxxxxxxxxxx > > > Cc: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> > > > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > > > Cc: Christophe Leroy <christophe.leroy@xxxxxxxxxx> > > > Cc: Andreas Schwab <schwab@xxxxxxxxxxxxxx> > > > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > > > Cc: Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> > > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > > > Cc: Juri Lelli <juri.lelli@xxxxxxxxxx> > > > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx> > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > Cc: "Mike Rapoport (IBM)" <rppt@xxxxxxxxxx> > > > Cc: Vlastimil Babka <vbabka@xxxxxxx> > > > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> > > > --- > > > init/main.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/init/main.c b/init/main.c > > > index ad920fac325c..f74772acf612 100644 > > > --- a/init/main.c > > > +++ b/init/main.c > > > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void) > > > */ > > > rcu_read_lock(); > > > tsk = find_task_by_pid_ns(pid, &init_pid_ns); > > > - tsk->flags |= PF_NO_SETAFFINITY; > > > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE; > > > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id())); > > > rcu_read_unlock(); > > > > > > @@ -938,6 +938,8 @@ void start_kernel(void) > > > * time - but meanwhile we still have a functioning scheduler. > > > */ > > > sched_init(); > > > + /* Avoid early context switch, rest_init() restores PF_IDLE */ > > > + current->flags &= ~PF_IDLE; > > > > > > if (WARN(!irqs_disabled(), > > > "Interrupts were enabled *very* early, fixing it\n")) > > > > Hurmph... so since this is about IRQs, would it not make sense to have > > the | PF_IDLE near 'early_boot_irqs_disabled = false' ? I was thinking that this isn't an idle thread until the end of boot, so leave it set as not idle until the end. > > > > Or, alternatively, make the tinyrcu thing check that variable? > > We could do that, but do we really the decidedly non-idle early boot > sequence designated as idle? call_rcu() tiny is called more than this code, so unless there is another reason we want to check the IRQ status it seemed best to change the boot task flag. I mean, I think the is_idle_task() check is valid in every other context, right? > > What surprises me is that this is just now showing up. The ingredients > for this one have been in place for almost 10 years. > Am I just that lucky? Cheers, Liam