On 16 December 2024 13:20:56 GMT, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: >David reported a warning observed while loop testing kexec jump: > > Interrupts enabled after irqrouter_resume+0x0/0x50 > WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220 > kernel_kexec+0xf6/0x180 > __do_sys_reboot+0x206/0x250 > do_syscall_64+0x95/0x180 > >The corresponding interrupt flag trace: > > hardirqs last enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90 > hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90 > >That means __up_console_sem() was invoked with interrupts enabled. Further >instrumentation revealed that in the interrupt disabled section of kexec >jump one of the syscore_suspend() callbacks woke up a task, which set the >NEED_RESCHED flag. A later callback in the resume path invoked >cond_resched() which in turn led to the invocation of the scheduler: > > __cond_resched+0x21/0x60 > down_timeout+0x18/0x60 > acpi_os_wait_semaphore+0x4c/0x80 > acpi_ut_acquire_mutex+0x3d/0x100 > acpi_ns_get_node+0x27/0x60 > acpi_ns_evaluate+0x1cb/0x2d0 > acpi_rs_set_srs_method_data+0x156/0x190 > acpi_pci_link_set+0x11c/0x290 > irqrouter_resume+0x54/0x60 > syscore_resume+0x6a/0x200 > kernel_kexec+0x145/0x1c0 > __do_sys_reboot+0xeb/0x240 > do_syscall_64+0x95/0x180 > >This is a long standing problem, which probably got more visible with >the recent printk changes. Something does a task wakeup and the >scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and >invokes schedule() from a completely bogus context. The scheduler >enables interrupts after context switching, which causes the above >warning at the end. > >Quite some of the code paths in syscore_suspend()/resume() can result in >triggering a wakeup with the exactly same consequences. They might not >have done so yet, but as they share a lot of code with normal operations >it's just a question of time. > >The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling >models. Full preemption is not affected as cond_resched() is disabled and >the preemption check preemptible() takes the interrupt disabled flag into >account. > >Cure the problem by adding a corresponding check into cond_resched(). > >Reported-by: David Woodhouse <dwmw2@xxxxxxxxxxxxx> >Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> >Tested-by: David Woodhouse <dwmw2@xxxxxxxxxxxxx> >Cc: stable@xxxxxxxxxxxxxxx >Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@xxxxxxxxxxxxx >--- > kernel/sched/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >--- a/kernel/sched/core.c >+++ b/kernel/sched/core.c >@@ -7276,7 +7276,7 @@ void rt_mutex_setprio(struct task_struct > #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) > int __sched __cond_resched(void) > { >- if (should_resched(0)) { >+ if (should_resched(0) && !irqs_disabled()) { > preempt_schedule_common(); > return 1; > } Thank you. Slight preference for dwmw@xxxxxxxxxxxx as the Reported-by and Tested-by addresses if it's not too late. I'm assuming you will handle this and don't want me to round it up with the kexec bits.