On 3/27/19, Dongdong Liu <liudongdong3@xxxxxxxxxx> wrote:
> Hi Lukas
> Many thanks for your reply.
>
> On 2019/3/26 20:44, Lukas Wunner wrote:
>> On Tue, Mar 26, 2019 at 07:43:17PM +0800, Dongdong Liu wrote:
>>> Currently we have hit another deadlock issue in the hotplug driver.
>>> The call trace is below.
>>> The deadlock is triggered by a hotplug event during a sysfs "remove"
>>> operation.
>>> Any suggestions for fixing this deadlock?
>>
>> That's a known problem: deadlocks may occur if hotplug ports are
>> cascaded. I came up with a kludge to work around it but withdrew
>> the patch:
>> https://patchwork.ozlabs.org/patch/930403/
>
> It seems the causes of the two deadlock issues are not the same.
> This deadlock is triggered by a hotplug event during a sysfs "remove"
> operation.
> pciehp 0000:00:0c.0:pcie004: Slot(0-1): Card present
> pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
> echo 1 > 0000\:00\:0c.0/remove
>
> The sysfs "remove" side is:
> remove_store
>   pci_stop_and_remove_bus_device_locked
>     pci_lock_rescan_remove
>     pci_stop_and_remove_bus_device
>       ...
>         pciehp_remove
>           free_irq
>             kthread_stop # wait for hotplug IRQ handler
>     pci_unlock_rescan_remove
>

Can we swap the above two lines so that the code waits without holding the lock?

> The hotplug side is:
> pciehp_ist
>   pciehp_handle_presence_or_link_change
>     pciehp_configure_device
>       pci_lock_rescan_remove # wait for pci_unlock_rescan_remove()
>
>>
>> The real solution is to make the sections protected by
>> pci_lock_rescan_remove() smaller or eliminate them as far as
>> possible. So, no good solution available right now, sorry.
>>
> Thanks, any suggestion is appreciated.
>
> Thanks,
> Dongdong.
>
>> Thanks,
>>
>> Lukas
>>
>>>
>>> [ 4112.297250] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 4112.305069] bash D 0 6502 2207 0x00000200
>>> [ 4112.310544] Call trace:
>>> [ 4112.312981] __switch_to+0x94/0xe8
>>> [ 4112.316373] __schedule+0x270/0x8b0
>>> [ 4112.319852] schedule+0x2c/0x88
>>> [ 4112.322981] schedule_timeout+0x224/0x448
>>> [ 4112.326979] wait_for_common+0x198/0x2a0
>>> [ 4112.330892] wait_for_completion+0x28/0x38
>>> [ 4112.334979] kthread_stop+0x60/0x190
>>> [ 4112.338544] __free_irq+0x1c0/0x348
>>> [ 4112.342022] free_irq+0x40/0x88
>>> [ 4112.345153] pcie_shutdown_notification+0x54/0x80
>>> [ 4112.349847] pciehp_remove+0x30/0x50
>>> [ 4112.353413] pcie_port_remove_service+0x3c/0x58
>>> [ 4112.357932] device_release_driver_internal+0x1b4/0x250
>>> [ 4112.363146] device_release_driver+0x28/0x38
>>> [ 4112.367406] bus_remove_device+0xd4/0x160
>>> [ 4112.371405] device_del+0x128/0x348
>>> [ 4112.374880] device_unregister+0x24/0x78
>>> [ 4112.378792] remove_iter+0x48/0x58
>>> [ 4112.382183] device_for_each_child+0x6c/0xb8
>>> [ 4112.386443] pcie_port_device_remove+0x2c/0x48
>>> [ 4112.390876] pcie_portdrv_remove+0x5c/0x68
>>> [ 4112.394963] pci_device_remove+0x48/0xd8
>>> [ 4112.398874] device_release_driver_internal+0x1b4/0x250
>>> [ 4112.404088] device_release_driver+0x28/0x38
>>> [ 4112.408348] pci_stop_bus_device+0x84/0xb8
>>> [ 4112.412434] pci_stop_and_remove_bus_device_locked+0x24/0x40
>>> [ 4112.418083] remove_store+0xa4/0xb8
>>> [ 4112.421560] dev_attr_store+0x44/0x60
>>> [ 4112.425213] sysfs_kf_write+0x58/0x80
>>> [ 4112.428864] kernfs_fop_write+0xe8/0x1f0
>>> [ 4112.432776] __vfs_write+0x60/0x190
>>> [ 4112.436255] vfs_write+0xac/0x1c0
>>> [ 4112.439560] ksys_write+0x6c/0xd8
>>> [ 4112.442861] __arm64_sys_write+0x24/0x30
>>> [ 4112.446773] el0_svc_common+0xa0/0x180
>>> [ 4112.450511] el0_svc_handler+0x38/0x78
>>> [ 4112.454249] el0_svc+0x8/0xc
>>> [ 4112.457122] INFO: task irq/97-pciehp:17365 blocked for more than 120
>>> seconds.
>>> [ 4112.464248] Tainted: P W OE
>>> 4.19.25-vhulk1901.1.0.h111.aarch64+ #2
>>> [ 4112.471980] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 4112.479798] irq/97-pciehp D 0 17365 2 0x00000228
>>> [ 4112.485273] Call trace:
>>> [ 4112.487710] __switch_to+0x94/0xe8
>>> [ 4112.491098] __schedule+0x270/0x8b0
>>> [ 4112.494575] schedule+0x2c/0x88
>>> [ 4112.497706] schedule_preempt_disabled+0x14/0x20
>>> [ 4112.502313] __mutex_lock.isra.1+0x1fc/0x540
>>> [ 4112.506572] __mutex_lock_slowpath+0x24/0x30
>>> [ 4112.510833] mutex_lock+0x80/0xa8
>>> [ 4112.514138] pci_lock_rescan_remove+0x20/0x28
>>> [ 4112.518485] pciehp_configure_device+0x30/0x140
>>> [ 4112.523005] pciehp_handle_presence_or_link_change+0x35c/0x4b0
>>> [ 4112.528826] pciehp_ist+0x1cc/0x1d0
>>> [ 4112.532305] irq_thread_fn+0x30/0x80
>>> [ 4112.535870] irq_thread+0x128/0x200
>>> [ 4112.539349] kthread+0x134/0x138
>>> [ 4112.542563] ret_from_fork+0x10/0x18
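
The lock ordering in the two call chains quoted above can be modeled in userspace. What follows is only a minimal sketch with POSIX threads, not kernel code: rescan_remove_lock, probe_thread, hotplug_thread and both helper functions are hypothetical names, and pthread_join() stands in for kthread_stop(). It illustrates why waiting for a thread while holding a lock that the thread needs cannot make progress, and why waiting after the unlock can. It does not show that such a reordering would be safe in pciehp.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's rescan/remove mutex. */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Probe version of the hotplug thread: it uses trylock instead of lock so
 * that the demonstration reports the conflict instead of hanging.
 */
static void *probe_thread(void *arg)
{
    int *rc = arg;

    *rc = pthread_mutex_trylock(&rescan_remove_lock);
    if (*rc == 0)
        pthread_mutex_unlock(&rescan_remove_lock);
    return NULL;
}

/* Blocking version of the hotplug thread: takes the lock, does its work. */
static void *hotplug_thread(void *arg)
{
    bool *configured = arg;

    pthread_mutex_lock(&rescan_remove_lock);
    *configured = true;  /* models pciehp_configure_device() */
    pthread_mutex_unlock(&rescan_remove_lock);
    return NULL;
}

/*
 * Reported ordering: wait for the thread while still holding the lock.
 * Returns true if the thread found the lock busy, i.e. a blocking lock
 * attempt at that point would never have returned.
 */
bool wait_under_lock_would_block(void)
{
    pthread_t worker;
    int rc = 0;
    int err;

    pthread_mutex_lock(&rescan_remove_lock);
    err = pthread_create(&worker, NULL, probe_thread, &rc);
    assert(err == 0);
    err = pthread_join(worker, NULL);  /* models kthread_stop() */
    assert(err == 0);
    pthread_mutex_unlock(&rescan_remove_lock);
    return rc == EBUSY;
}

/*
 * Swapped ordering: drop the lock first, then wait for the thread.
 * Returns true if the thread was able to take the lock and finish.
 */
bool unlock_then_wait_completes(void)
{
    pthread_t worker;
    bool configured = false;
    int err;

    pthread_mutex_lock(&rescan_remove_lock);
    err = pthread_create(&worker, NULL, hotplug_thread, &configured);
    assert(err == 0);
    pthread_mutex_unlock(&rescan_remove_lock);
    err = pthread_join(worker, NULL);  /* waits without holding the lock */
    assert(err == 0);
    return configured;
}
```

Built with -pthread, both helpers return true when called from main(). In the kernel the unlocked window would still have to be shown free of races with concurrent rescan or remove operations, which this model does not address.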