On Thu, Mar 13, 2025 at 04:23:30PM +0200, Ilpo Järvinen wrote: > pciehp_reset_slot() disables PDCE (Presence Detect Changed Enable) and > DLLSCE (Data Link Layer State Changed Enable) for the duration of reset > and clears the related status bits PDC and DLLSC from the Slot Status > register after the reset to avoid hotplug incorrectly assuming the card > was removed. > > However, hotplug shares interrupt with PME and BW notifications both of > which can make pciehp_isr() to run despite PDCE and DLLSCE bits being > off. pciehp_isr() then picks PDC or DLLSC bits from the Slot Status > register due to the events that occur during reset and caches them into > ->pending_events. Later, the IRQ thread in pciehp_ist() will process > the ->pending_events and will assume the Link went Down due to a card > change (in pciehp_handle_presence_or_link_change()). > > Change pciehp_reset_slot() to also clear HPIE (Hot-Plug Interrupt > Enable) as pciehp_isr() will first check HPIE to see if the interrupt > is not for it. Then synchronize with the IRQ handling to ensure no > events are pending, before invoking the reset. After dwelling on this for a while, I'm thinking that it may re-introduce the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock"): Looking at the second and third stack trace in its commit message, down_write(reset_lock) in pciehp_reset_slot() is basically equivalent to synchronize_irq() and we're holding device_lock() at that point, hindering progress of pciehp_ist(). So I think I have guided you in the wrong direction and I apologize for that. However it seems to me that this should be solvable with the small patch below. Am I missing something? @Joel Mathew Thomas, could you give the below patch a spin and see if it helps? Thanks! -- >8 -- diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index bb5a8d9f03ad..99a2ac13a3d1 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -688,6 +688,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) return IRQ_HANDLED; } + /* Ignore events masked by pciehp_reset_slot(). */ + events &= ctrl->slot_ctrl; + if (!events) + return IRQ_HANDLED; + /* Save pending events for consumption by IRQ thread. */ atomic_or(events, &ctrl->pending_events); return IRQ_WAKE_THREAD;