Lukas Wunner <lukas@xxxxxxxxx> 於 2024年9月26日 週四 下午9:23寫道: > > On Thu, Sep 26, 2024 at 08:59:09PM +0800, Chia-Lin Kao (AceLan) wrote: > > Remove unnecessary pci_walk_bus() call in pciehp_resume_noirq(). This > > fixes a system hang that occurs when resuming after a Thunderbolt dock > > with attached thunderbolt storage is unplugged during system suspend. > > > > The PCI core already handles setting the disconnected state for devices > > under a port during suspend/resume. > > Please explain in the commit message where the PCI core does that. Hi Lukas, I found my patch doesn't work. Let me explain the new findings below. > > > The redundant bus walk was > > interfering with proper hardware state detection during resume, causing > > a system hang when hot-unplugging daisy-chained Thunderbolt devices. > > Please explain what "proper hardware state detection" means. > > Did you get a hung task stacktrace? If so, please include it in the > commit message. > > If you're getting a system hang, it means there's a deadlock > involving pci_bus_sem. I don't quite see how that could happen, > so a more verbose explanation would be appreciated. I have no good answer for you now. After enabling some debugging options and debugging lock options, I still didn't get any message. It just hangs there while resuming, and nothing could be logged. Here is my setup ubuntu@localhost:~$ lspci -tv -[0000:00]-+-00.0 Intel Corporation Device 6400 +-02.0 Intel Corporation Lunar Lake [Intel Graphics] +-04.0 Intel Corporation Device 641d +-05.0 Intel Corporation Device 645d +-07.0-[01-38]-- +-07.2-[39-70]----00.0-[3a-70]--+-00.0-[3b]-- | +-01.0-[3c-4d]-- | +-02.0-[4e-5f]----00.0-[4f-50]----01.0-[50]----00.0 Phison Electronics Corporation E12 NVMe Controlle r | +-03.0-[60-6f]-- | \-04.0-[70]-- This is Dell WD22TB dock 39:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03) Subsystem: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0000] Kernel driver in use: pcieport This is the TBT storage connects to the dock 50:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E12 NVMe Controller [1987:5012] (rev 01) Subsystem: Phison Electronics Corporation E12 NVMe Controller [1987:5012] Kernel driver in use: nvme Kernel modules: nvme While resuming, the dock(8086:0b26) resumes first and I found if dock device run through below 2 functions in pciehp_resume_noirq() if (pciehp_device_replaced(ctrl)) { pci_walk_bus(ctrl->pcie->port->subordinate,pci_dev_set_disconnected, NULL); pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); } The system hangs when storage(1987:5012) also calls the 2 functions. Only one of the 2 devices can enter the if statement, dock or storage, or the system hangs. To test it more, I found that mask out pciehp_request() eases the issue. It may be because it calls irq_wake_thread() and is stuck somewhere. So, I try to get rid of the irq_wake_thread() by replacing pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); with atomic_or(PCI_EXP_SLTSTA_PDC, &ctrl->pending_events); In this case, only dock(8086:0b26) will be called in pciehp_resume_noirq(). And the tbt storage will be released after pci_power_up() fails. Do you think this is a feasible solution? diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index ff458e692fed..56bf23d55c41 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -332,7 +332,7 @@ static int pciehp_resume_noirq(struct pcie_device *dev) ctrl_dbg(ctrl, "device replaced during system sleep\n"); pci_walk_bus(ctrl->pcie->port->subordinate, pci_dev_set_disconnected, NULL); - pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); + atomic_or(PCI_EXP_SLTSTA_PDC, &ctrl->pending_events); } } > > Thanks, > > Lukas > > > > Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") > > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@xxxxxxxxxxxxx> > > --- > > drivers/pci/hotplug/pciehp_core.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c > > index ff458e692fed..c1c3f7e2bc43 100644 > > --- a/drivers/pci/hotplug/pciehp_core.c > > +++ b/drivers/pci/hotplug/pciehp_core.c > > @@ -330,8 +330,6 @@ static int pciehp_resume_noirq(struct pcie_device *dev) > > */ > > if (pciehp_device_replaced(ctrl)) { > > ctrl_dbg(ctrl, "device replaced during system sleep\n"); > > - pci_walk_bus(ctrl->pcie->port->subordinate, > > - pci_dev_set_disconnected, NULL); > > pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > > } > > } > > -- > > 2.43.0