Deadlock during PCIe hot remove and SPDK exit

Dear PCI maintainers:
   I'm hitting a deadlock that looks somewhat similar to a previously reported one, https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/#t, but my kernel (6.6.40) already includes the fix f5eff55.
   Here is my test procedure, running kernel 6.6.40 with SPDK v22.05:
   1. SPDK uses the vfio driver to take over two NVMe disks and runs some I/O on them.
   2. Pull out both NVMe disks.
   3. Try to kill -9 the SPDK process.
   The deadlock then occurs. So far I can reproduce it 100% of the time. I'm not an expert in PCI, but I did a brief analysis:
   The irq/148 thread holds the pci_rescan_remove_lock mutex and waits for SPDK to release the vfio device.
   The irq/149 thread holds the reset_lock of ctrl A and waits for pci_rescan_remove_lock.
   The SPDK process tries to release the vfio device, but waits for the reset_lock of ctrl A.
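
   To make the circular wait explicit, here is a minimal sketch (not kernel code) that models the three tasks as a wait-for graph and searches it for a cycle. The hold/wait assignments follow the annotations on the stack dumps below. The resource labels are only illustrative, and treating the open vfio device as something SPDK "holds" is a simplifying assumption on my part.

```python
# Illustrative model only: task names come from the stack dumps,
# resource labels are free-form strings, not kernel identifiers.
holds = {
    "irq/148-pciehp": ["pci_rescan_remove_lock"],
    "irq/149-pciehp": ["reset_lock (ctrl A)"],
    "SPDK": ["vfio device (open fd)"],  # assumption: modeled as a held resource
}
waits = {
    "irq/148-pciehp": "vfio device (open fd)",
    "irq/149-pciehp": "pci_rescan_remove_lock",
    "SPDK": "reset_lock (ctrl A)",
}


def find_cycle(holds, waits):
    """Return a list of tasks forming a wait cycle, or None if there is none."""
    # Map each resource to the task that currently holds it.
    owner = {res: task for task, resources in holds.items() for res in resources}
    for start in waits:
        path = [start]
        task = start
        while True:
            blocker = owner.get(waits.get(task))
            if blocker is None:
                break  # this chain ends at a free resource: no cycle from here
            if blocker in path:
                # Closed the loop: report the cycle, repeating its first task.
                return path[path.index(blocker):] + [blocker]
            path.append(blocker)
            task = blocker
    return None


cycle = find_cycle(holds, waits)
print(" -> ".join(cycle) if cycle else "no cycle")
```

   Each arrow in the printed chain reads "is blocked by". The chain returning to its starting task is what makes this a deadlock rather than a slow wait.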


irq/149-pciehp stack, cat /proc/514/stack
[<0>] pciehp_unconfigure_device+0x48/0x160 // wait for pci_rescan_remove_lock
[<0>] pciehp_disable_slot+0x6b/0x130       // hold reset_lock of ctrl A
[<0>] pciehp_handle_presence_or_link_change+0x7d/0x4d0
[<0>] pciehp_ist+0x236/0x260
[<0>] irq_thread_fn+0x1b/0x60
[<0>] irq_thread+0xed/0x190
[<0>] kthread+0xe4/0x110
[<0>] ret_from_fork+0x2d/0x50
[<0>] ret_from_fork_asm+0x11/0x20


irq/148-pciehp stack, cat /proc/513/stack
[<0>] vfio_unregister_group_dev+0x97/0xe0 [vfio]     //wait for SPDK to release the vfio device
[<0>] vfio_pci_core_unregister_device+0x19/0x80 [vfio_pci_core]
[<0>] vfio_pci_remove+0x15/0x20 [vfio_pci]
[<0>] pci_device_remove+0x39/0xb0
[<0>] device_release_driver_internal+0xad/0x120
[<0>] pci_stop_bus_device+0x5d/0x80
[<0>] pci_stop_and_remove_bus_device+0xe/0x20
[<0>] pciehp_unconfigure_device+0x91/0x160   //hold pci_rescan_remove_lock, release reset_lock of ctrl B 
[<0>] pciehp_disable_slot+0x6b/0x130
[<0>] pciehp_handle_presence_or_link_change+0x7d/0x4d0
[<0>] pciehp_ist+0x236/0x260             //hold reset_lock of ctrl B 
[<0>] irq_thread_fn+0x1b/0x60
[<0>] irq_thread+0xed/0x190
[<0>] kthread+0xe4/0x110
[<0>] ret_from_fork+0x2d/0x50
[<0>] ret_from_fork_asm+0x11/0x20


SPDK stack, cat /proc/166634/task/167181/stack
[<0>] down_write_nested+0x1b7/0x1c0            //wait for reset_lock of ctrl A.
[<0>] pciehp_reset_slot+0x58/0x160
[<0>] pci_reset_hotplug_slot+0x3b/0x60
[<0>] pci_reset_bus_function+0x3b/0xb0
[<0>] __pci_reset_function_locked+0x3e/0x60
[<0>] vfio_pci_core_disable+0x3ce/0x400 [vfio_pci_core]
[<0>] vfio_pci_core_close_device+0x67/0xc0 [vfio_pci_core]
[<0>] vfio_df_close+0x79/0xd0 [vfio]
[<0>] vfio_df_group_close+0x36/0x70 [vfio]
[<0>] vfio_device_fops_release+0x20/0x40 [vfio]
[<0>] __fput+0xec/0x290
[<0>] task_work_run+0x61/0x90
[<0>] do_exit+0x39c/0xc40
[<0>] do_group_exit+0x33/0xa0
[<0>] get_signal+0xd84/0xd90
[<0>] arch_do_signal_or_restart+0x2a/0x260
[<0>] exit_to_user_mode_prepare+0x1c7/0x240
[<0>] syscall_exit_to_user_mode+0x2a/0x60
[<0>] do_syscall_64+0x3e/0x90





