Hi all,
We are seeing a deadlock between reset_lock and pci_slot_mutex when
injecting an error with an Intel PEI error injection card. It was initially
reported with some older kernels, but it was also reproduced on 6.11-rc6.
Apparently, it requires the FW-first mode of AER handling to be enabled.
Not tainted 6.11.0-rc6-orig+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:2 state:D stack:0 pid:1003 tgid:1003 ppid:2 flags:0x00000008
Workqueue: events aer_recover_work_func
Call trace:
__switch_to+0xc4/0xe8
__schedule+0x280/0x748
schedule+0x3c/0xe0
schedule_preempt_disabled+0x2c/0x50
rwsem_down_write_slowpath+0x1ec/0x6f0
down_write+0xac/0xb8
pciehp_reset_slot+0x60/0x178 <-- ctrl->reset_lock
pci_reset_hotplug_slot+0x54/0x90
pci_slot_reset+0x138/0x1a8
pci_bus_error_reset+0x110/0x158 <-- pci_slot_mutex
aer_root_reset+0xbc/0x298
pcie_do_recovery+0x2a0/0x3b8
aer_recover_work_func+0x144/0x150
process_one_work+0x184/0x420
worker_thread+0x250/0x360
kthread+0xfc/0x110
ret_from_fork+0x10/0x20
INFO: task irq/78-pciehp:1497 blocked for more than 122 seconds.
Not tainted 6.11.0-rc6-orig+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:irq/78-pciehp state:D stack:0 pid:1497 tgid:1497 ppid:2 flags:0x00000008
Call trace:
__switch_to+0xc4/0xe8
__schedule+0x280/0x748
schedule+0x3c/0xe0
schedule_preempt_disabled+0x2c/0x50
__mutex_lock.constprop.0+0x28c/0x960
__mutex_lock_slowpath+0x1c/0x30
mutex_lock+0x6c/0x88
pci_dev_assign_slot+0x2c/0x88 <-- pci_slot_mutex
pci_setup_device+0xfc/0x6f0
pci_scan_single_device+0xd0/0x120
pci_scan_slot+0x6c/0x200
pciehp_configure_device+0x50/0x188
pciehp_enable_slot+0x1b0/0x290
pciehp_handle_presence_or_link_change+0xfc/0x208
pciehp_ist+0x214/0x260
irq_thread_fn+0x34/0xb8
irq_thread+0x160/0x250 <-- ctrl->reset_lock
kthread+0xfc/0x110
ret_from_fork+0x10/0x20
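To spell out the lock ordering as I read it from the annotations in the two traces above, here is a small userspace sketch in Python. It is only a model, not kernel code: the dictionary and the helper names (acquisition_order, ordered_pairs, find_inversions) are made up for illustration, and it treats ctrl->reset_lock as a plain lock, ignoring the reader/writer distinction of the rwsem.

```python
# Hypothetical userspace model of the two call traces; not kernel code.
from itertools import combinations

# Order in which each task acquires the locks, read off the annotated traces.
acquisition_order = {
    "kworker/0:2 (aer_recover_work_func)": ["pci_slot_mutex", "ctrl->reset_lock"],
    "irq/78-pciehp (pciehp_ist)": ["ctrl->reset_lock", "pci_slot_mutex"],
}


def ordered_pairs(locks):
    """Return every (held, wanted) pair implied by one acquisition sequence."""
    return {
        (locks[i], locks[j])
        for i in range(len(locks))
        for j in range(i + 1, len(locks))
    }


def find_inversions(orders):
    """Return (task_a, task_b, held, wanted) tuples where task_b takes the
    same two locks in the opposite order to task_a (an ABBA pattern)."""
    inversions = []
    for (task_a, locks_a), (task_b, locks_b) in combinations(orders.items(), 2):
        pairs_b = ordered_pairs(locks_b)
        for held, wanted in sorted(ordered_pairs(locks_a)):
            if (wanted, held) in pairs_b:
                inversions.append((task_a, task_b, held, wanted))
    return inversions


for task_a, task_b, held, wanted in find_inversions(acquisition_order):
    print(f"{task_a}: holds {held}, waits for {wanted}")
    print(f"{task_b}: holds {wanted}, waits for {held}")
```

With the ordering taken from the traces, the model reports a single inversion: each task holds the lock the other one is waiting for, which would be consistent with both of them sitting in state D.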
I noticed Ian May reported two deadlocks a while ago [1]. The first issue
got fixed, but I'm wondering whether the other one was patched and we're
simply seeing a new, yet similar, one?
[1] https://lore.kernel.org/linux-pci/20200615143250.438252-1-ian.may@xxxxxxxxxxxxx/
Cheers, Ilkka