Hi Bart,

On Tue, Aug 23, 2022 at 8:10 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> On 8/22/22 23:41, Geert Uytterhoeven wrote:
> > A lock-up (magic sysrq does not work) during s2idle.
> > I tried bisecting it yesterday, but failed.
> > On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer
> > I get to v5.19, the less likely it is to happen. Apparently 100
> > successful s2idle cycles was not enough to declare a kernel good...
> >
> > Freezing ...
> > Filesystems sync: 0.001 seconds
> > Freezing user space processes ... (elapsed 0.001 seconds) done.
> > OOM killer disabled.
> > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > sd 0:0:0:0: [sda] Synchronizing SCSI cache
> > sd 0:0:0:0: [sda] Stopping disk
> >
> > ---> hangs here if it happens
> >
> > ravb e6800000.ethernet eth0: Link is Down
> > sd 0:0:0:0: [sda] Starting disk
> > Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached
> > PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186)
> > ata1: link resume succeeded after 1 retries
> > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > ata1.00: configured for UDMA/133
> > OOM killer enabled.
> > Restarting tasks ... done.
> > random: crng reseeded on system resumption
> > PM: suspend exit
> > ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
>
> I'm not sure that is enough information to find the root cause. How

Sorry for not making it clear I didn't expect this to be enough information.

> about enabling the tp_printk boot option and to enable tracing for
> suspend/resume operations, e.g. as follows?
>
> cd /sys/kernel/tracing &&
> echo 256 > /sys/kernel/tracing/buffer_size_kb &&
> echo nop > current_tracer &&
> echo > trace &&
> echo 1 > events/power/device_pm_callback_start/enable &&
> echo 1 > events/power/device_pm_callback_end/enable &&
> echo 1 > events/power/suspend_resume/enable &&
> echo 1 > tracing_on

Thanks, that generates lots of output (362 KiB/cycle)!
Unfortunately it also has an impact on the probability of lock-ups.
Combined with 'scsi: sd: Revert "Rework asynchronous resume support"',
s2idle now works almost always.
I did manage to trigger the lock-up once with tracing enabled:

device_pm_callback_end: gpio_rcar e6055400.gpio, err=0
device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc, noirq power domain [suspend]
device_pm_callback_end: gpio_rcar e6055800.gpio, err=0
device_pm_callback_start: renesas-cpg-mssr e6150000.clock-controller, parent: soc, noirq driver [suspend]
device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0
device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc, noirq driver [suspend]
device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0
suspend_resume: dpm_suspend_noirq[2] end
suspend_resume: machine_suspend[1] begin
suspend_resume: timekeeping_freeze[5] begin

---> hang

suspend_resume: timekeeping_freeze[0] end
suspend_resume: machine_suspend[1] end
suspend_resume: dpm_resume_noirq[16] begin
device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc, noirq driver [resume]
device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0
device_pm_callback_start: renesas-cpg-mssr e6150000.clock-controller, parent: soc, noirq driver [resume]
device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0
device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc, noirq power domain [resume]

Oops, timers...
At least it's not related to SCSI ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds