On 9/25/2024 4:37 PM, Bert Karwatzki wrote:
I managed to get the complete lockdep output via netconsole:

T1;systemd-shutdown[1]: All filesystems unmounted.
T1;systemd-shutdown[1]: Deactivating swaps.
T1;systemd-shutdown[1]: All swaps deactivated.
T1;systemd-shutdown[1]: Detaching loop devices.
T1;systemd-shutdown[1]: All loop devices detached.
T1;systemd-shutdown[1]: Stopping MD devices.
T1;systemd-shutdown[1]: All MD devices stopped.
T1;systemd-shutdown[1]: Detaching DM devices.
T1;systemd-shutdown[1]: All DM devices detached.
T1;systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
T1;systemd-shutdown[1]: Syncing filesystems and block devices.
T1;systemd-shutdown[1]: Rebooting.
T3113;psmouse serio1: Failed to disable mouse on isa0060/serio1#012 SUBSYSTEM=serio#012 DEVICE=+serio:serio1

Here I was curious if the failed psmouse message is related to the deadlock. I checked the locks and I had similar messages on an unaffected kernel (commit 6ec41c442e55), and I had a deadlock in linux-next-20240920 without this message.
Thanks for the info. This definitely appears to be the issue with asynchronous shutdown, which shouldn't happen anymore now that Greg has reverted the patches. I'm looking at this now.

The async shutdown makes each device wait for its children and consumers to shut down before shutting down itself, but it depends on the devices_kset list having those in the correct order. The "fix async shutdown hang" patch fixed a case where suppliers could end up later in this list than their consumers, causing a circular dependency (and a hang that looks like what you are seeing).

After that, Andrey Skvortsov reported seeing a hang where it appears that a parent device is later in the devices_kset list than a child device, which I didn't realize could happen... I know how to fix that, but I'm looking at the code more carefully now to try to understand exactly how that could happen before I resubmit a new async shutdown patch.
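To make the ordering assumption concrete, here is a toy userspace sketch. It is not the kernel code: struct toy_dep and toy_first_order_violation() are hypothetical names made up for illustration, and devices are reduced to their position in a list that stands in for devices_kset. It assumes shutdown walks that list from tail to head, so a child or consumer is only reached before its parent or supplier if it sits at a higher index.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy model, not the kernel's struct device.
 * Each edge says "dependent must shut down before provider".
 */
struct toy_dep {
	int dependent;	/* list index of the child or consumer */
	int provider;	/* list index of the parent or supplier */
};

/*
 * Assumption: shutdown walks the list from tail to head, so every
 * dependent must sit at a higher index than its provider.
 *
 * Returns the index into deps[] of the first edge that breaks this
 * rule, or -1 if the list order is consistent with all edges.
 */
static int toy_first_order_violation(const struct toy_dep *deps, size_t n)
{
	assert(deps != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (deps[i].dependent <= deps[i].provider)
			return (int)i;
	}
	return -1;
}
```

With edges {1, 0} and {2, 1} the function returns -1. With {1, 0} and {0, 2}, where a provider sits later in the list than its dependent, it returns 1, which corresponds to the kind of misordering described above.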