Re: Question about Hotplug and PME deadlock issue

On Wed, Feb 27, 2019 at 07:52:16PM +0800, Dongdong Liu wrote:
> We ran a test that triggers hotplug while performing a sysfs "remove"
> operation, and hit a deadlock.
> 
> pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
> echo 1 > 0000\:00\:0c.0/remove
> 
> PME and hotplug share an MSI/MSI-X vector.
> The sysfs "remove" operation calls the functions below.
> remove_store()
>   pci_stop_and_remove_bus_device_locked()
>     pci_lock_rescan_remove()
>     pci_stop_and_remove_bus_device()
>       ...
>       pcie_pme_remove()
>         synchronize_irq()   // waits for the hotplug IRQ handler
>     pci_unlock_rescan_remove();

This is (at least) a bug in the PME driver.  I'm not familiar with
that driver, so adding Keith Busch to cc.

The bug is that pcie_pme_remove() invokes pcie_pme_suspend(), which
calls synchronize_irq(), which waits for the hardirq handlers and
IRQ threads of all drivers sharing the IRQ to finish.

pcie_pme_remove() then calls free_irq(), which *again* waits for a
running hardirq handler and thread to finish.  Since 4.19, however, it
no longer waits for those of drivers sharing the IRQ; see commit
519cc8652b3a ("genirq: Synchronize only with single thread on
free_irq()").

I think the invocation of synchronize_irq() in pcie_pme_suspend() is
unnecessary when called from pcie_pme_remove().  It has been there since
Rafael added the driver in commit c7f486567c1d ("PCI PM: PCIe PME root
port service driver"), so I'm adding him to cc as well.
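To make the dependency cycle concrete, here is a minimal userspace model
using POSIX threads; it is not kernel code.  A mutex stands in for
pci_lock_rescan_remove(), one thread stands in for the PME IRQ thread and
another for the pciehp IRQ thread on the shared vector.  All names in it
(rescan_remove_lock, wait_idle(), simulate_remove()) are hypothetical, and
a short timeout replaces the real hang so the model terminates.  Waiting
for every handler on the line while holding the lock (what synchronize_irq()
does) times out, while waiting only for the PME thread (roughly what
free_irq() does since 4.19) completes.

```c
/* Userspace model (NOT kernel code) of the pcie_pme_remove() vs. pciehp
 * deadlock.  All identifiers are hypothetical stand-ins. */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Stands in for the lock taken by pci_lock_rescan_remove(). */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool pme_busy;      /* PME IRQ thread still running */
static atomic_bool hotplug_busy;  /* pciehp IRQ thread still running */

/* PME handler: never touches the rescan/remove lock. */
static void *pme_thread(void *arg)
{
	(void)arg;
	atomic_store(&pme_busy, false);
	return NULL;
}

/* Hotplug handler: like pciehp_configure_device(), it needs the lock. */
static void *hotplug_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&rescan_remove_lock);
	pthread_mutex_unlock(&rescan_remove_lock);
	atomic_store(&hotplug_busy, false);
	return NULL;
}

/* Poll in 1 ms steps for roughly timeout_ms; true if *busy cleared. */
static bool wait_idle(atomic_bool *busy, int timeout_ms)
{
	struct timespec tick = { 0, 1000000 };

	for (int i = 0; i < timeout_ms; i++) {
		if (!atomic_load(busy))
			return true;
		nanosleep(&tick, NULL);
	}
	return !atomic_load(busy);
}

/* Models the sysfs "remove" path.  wait_for_shared = true waits for every
 * handler on the shared vector while holding the lock; false waits only
 * for the PME thread.  Returns true if the wait finished (no deadlock). */
bool simulate_remove(bool wait_for_shared)
{
	pthread_t pme, hp;
	bool done;

	pthread_mutex_lock(&rescan_remove_lock);   /* remove_store() path */
	atomic_store(&pme_busy, true);
	atomic_store(&hotplug_busy, true);
	if (pthread_create(&pme, NULL, pme_thread, NULL) != 0 ||
	    pthread_create(&hp, NULL, hotplug_thread, NULL) != 0)
		abort();

	done = wait_idle(&pme_busy, 500);
	if (wait_for_shared)  /* hotplug thread is blocked on our lock */
		done = done && wait_idle(&hotplug_busy, 200);

	pthread_mutex_unlock(&rescan_remove_lock); /* pci_unlock_rescan_remove() */
	pthread_join(pme, NULL);
	pthread_join(hp, NULL);
	return done;
}
```

In this model simulate_remove(true) returns false (the real kernel would
hang there), and simulate_remove(false) returns true because the PME thread
never needs the rescan/remove lock.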

Thanks,
Lukas

> 
> Hotplug
> pciehp_ist()
>   pciehp_handle_presence_or_link_change()
>     pciehp_configure_device()
>       pci_lock_rescan_remove() // waits for pci_unlock_rescan_remove()
> 
> Hence the deadlock. Any idea how to solve this issue?
> 
> [ 2180.490703] INFO: task bash:10913 blocked for more than 120 seconds.
> [ 2180.503428]       Tainted: P           OE     4.19.5-1.1.29.aarch64 #1
> [ 2180.516496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2180.532176] bash            D    0 10913   6565 0x00000200
> 
>  # ps -ax |grep D
>   PID TTY      STAT   TIME COMMAND
> 10913 ttyAMA0  Ds+    0:00 -bash
> 12945 ?        Ss     0:00 /usr/sbin/sshd -D
> 14022 ?        D      0:00 [irq/745-pciehp]
> 15972 pts/1    S+     0:00 grep --color=auto D
> # cat /proc/14022/stack
> [<0>] __switch_to+0x94/0xd8
> [<0>] pci_lock_rescan_remove+0x20/0x28
> [<0>] pciehp_configure_device+0x30/0x140
> [<0>] pciehp_handle_presence_or_link_change+0x324/0x458
> [<0>] pciehp_ist+0x1dc/0x1e0
> [<0>] irq_thread_fn+0x30/0x90
> [<0>] irq_thread+0x140/0x210
> [<0>] kthread+0x134/0x138
> [<0>] ret_from_fork+0x10/0x1c
> [<0>] 0xffffffffffffffff
>  # cat /proc/10913/stack
> [<0>] __switch_to+0x94/0xd8
> [<0>] synchronize_irq+0x8c/0xc0
> [<0>] pcie_pme_suspend+0xa4/0x118
> [<0>] pcie_pme_remove+0x20/0x40
> [<0>] pcie_port_remove_service+0x3c/0x58
> [<0>] device_release_driver_internal+0x1b4/0x250
> [<0>] device_release_driver+0x28/0x38
> [<0>] bus_remove_device+0xd4/0x160
> [<0>] device_del+0x128/0x348
> [<0>] device_unregister+0x24/0x78
> [<0>] remove_iter+0x48/0x58
> [<0>] device_for_each_child+0x6c/0xb8
> [<0>] pcie_port_device_remove+0x2c/0x48
> [<0>] pcie_portdrv_remove+0x68/0x78
> [<0>] pci_device_remove+0x48/0x120
> [<0>] device_release_driver_internal+0x1b4/0x250
> [<0>] device_release_driver+0x28/0x38
> [<0>] pci_stop_bus_device+0x84/0xc0
> [<0>] pci_stop_and_remove_bus_device_locked+0x24/0x40
> [<0>] remove_store+0xa4/0xb8
> [<0>] dev_attr_store+0x44/0x60
> [<0>] sysfs_kf_write+0x58/0x80
> [<0>] kernfs_fop_write+0xd8/0x1e0
> [<0>] __vfs_write+0x60/0x190
> [<0>] vfs_write+0xac/0x1c0
> [<0>] ksys_write+0x6c/0xd8
> [<0>] __arm64_sys_write+0x24/0x30
> [<0>] el0_svc_common+0x78/0x100
> [<0>] el0_svc_handler+0x38/0x88
> [<0>] el0_svc+0x8/0xc
> [<0>] 0xffffffffffffffff
> 
> Thanks,
> Dongdong