On Fri, Dec 13 2024 at 19:48, Ming Lei wrote:
> On Fri, Dec 13, 2024 at 12:31:24PM +0100, Thomas Gleixner wrote:
>> I'd rather say, that's a kexec problem. On the same instance a loop test
>> of suspend to ram with pm_test=core just works fine. That's equivalent
>> to the kexec scenario. It goes down to syscore_suspend() and skips the
>> actual suspend low level magic. It then resumes with syscore_resume()
>> and brings the machine back up.
>>
>> That runs for 2 hours now, while the kexec muck dies within 2
>> minutes....
>>
>> And if you look at the difference of these implementations, you might
>> notice that kexec just implemented some rudimentary version of the
>> actual suspend logic. Based on let's hope it works that way.
>>
>> This is just insane and should be rewritten to actually reuse the suspend
>> mechanism, which is way better tested than this kexec jump muck.
>
> But kexec is supposed to align with reboot/shutdown, instead of suspend,
> and it is calling ->shutdown() for notifying driver & device.

That's only true for the case where the new kernel takes over.

In the case of KEXEC_JUMP=y and kexec_image->preserve_context == true, it
is supposed to align with suspend/resume, and if you look at the code
it actually mimics suspend/resume in the most dilettantish way.

It's a patently bad idea to clobber the kernel with kexec jump "fixes"
instead of using the well tested and established suspend/resume
machinery.

All it takes is to:

   1) disable the wakeup logic

   2) provide a mechanism to invoke machine_kexec() instead of the
      actual suspend mechanism.

No?

Thanks

        tglx