On Fri, Dec 13, 2024 at 12:31:24PM +0100, Thomas Gleixner wrote:
> On Fri, Dec 13 2024 at 19:09, Ming Lei wrote:
> > On Fri, Dec 13, 2024 at 11:42:59AM +0100, Thomas Gleixner wrote:
> >> That's the control thread on CPU0. The hotplug thread on CPU1 is stuck
> >> here:
> >>
> >>   task:cpuhp/1 state:D stack:0 pid:24 tgid:24 ppid:2 flags:0x00004000
> >>   Call Trace:
> >>    <TASK>
> >>    __schedule+0x51f/0x1a80
> >>    schedule+0x3a/0x140
> >>    schedule_timeout+0x90/0x110
> >>    msleep+0x2b/0x40
> >>    blk_mq_hctx_notify_offline+0x160/0x3a0
> >>    cpuhp_invoke_callback+0x2a8/0x6c0
> >>    cpuhp_thread_fun+0x1ed/0x270
> >>    smpboot_thread_fn+0xda/0x1d0
> >>
> >> So something with those blk_mq fixes went sideways.
> >
> > The cpuhp callback is just waiting for inflight IOs to be completed when
> > the irq is still live.
> >
> > It looks same with the following report:
> >
> > https://lore.kernel.org/linux-scsi/F991D40F7D096653+20241203211857.0291ab1b@john-PC/
> >
> > Still triggered in case of kexec & qemu, which should be one qemu
> > problem.
>
> I'd rather say, that's a kexec problem. On the same instance a loop test
> of suspend to ram with pm_test=core just works fine. That's equivalent
> to the kexec scenario. It goes down to syscore_suspend() and skips the
> actual suspend low level magic. It then resumes with syscore_resume()
> and brings the machine back up.
>
> That runs for 2 hours now, while the kexec muck dies within 2
> minutes....
>
> And if you look at the difference of these implementations, you might
> notice that kexec just implemented some rudimentary version of the
> actual suspend logic. Based on let's hope it works that way.
>
> This is just insane and should be rewritten to actually reuse the suspend
> mechanism, which is way better tested than this kexec jump muck.

But kexec is supposed to align with reboot/shutdown rather than suspend,
and it calls ->shutdown() to notify drivers and devices.

Thanks,
Ming
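For reference, the pm_test=core loop Thomas describes can be driven from
the standard /sys/power interface. This is only a sketch of such a test
loop, not the exact script used; it assumes a kernel built with
CONFIG_PM_DEBUG (which exposes pm_test) and must run as root on a
machine that supports suspend-to-RAM:

```shell
# Limit the suspend path to the "core" test level: the kernel goes down
# to syscore_suspend(), skips the low-level platform suspend, and comes
# back via syscore_resume() -- the path kexec approximates.
echo core > /sys/power/pm_test

# Loop suspend-to-RAM cycles; with pm_test=core each cycle returns
# immediately after exercising the syscore suspend/resume path.
while true; do
    echo mem > /sys/power/state
    sleep 1
done
```

Because pm_test short-circuits before the platform-specific low-level
suspend, the loop isolates exactly the device/syscore quiesce logic that
kexec re-implements, which is what makes the comparison meaningful.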