On Sat, Mar 14, 2020 at 02:19:44PM +0000, Hoyer, David wrote: > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -637,6 +637,8 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) > events = atomic_xchg(&ctrl->pending_events, 0); > if (!events) { > pci_config_pm_runtime_put(pdev); > + ctrl->ist_running = false; > + wake_up(&ctrl->requester); > return IRQ_NONE; > } Thanks David for the report and sorry for the breakage. The above LGTM, please submit it as a proper patch and feel free to add my Reviewed-by. Please add the same two lines before the "return ret" a little further up in the function. If it's too cumbersome for you to submit a proper patch I can do it for you. > We've instrumented the code and we do see that pciehp_ist() runs > twice, once exiting with IRQ_HANDLED and then again with IRQ_NONE. > We believe that is due to the timing differences. Adding debug in > here changes the timings enough that the hang goes away, so we are > having troubles proving this 100% at the moment. But just based on > code inspection, if pciehp_ist() exits with the IRQ_NONE case, then > nothing will ever set ist_running=false until a subsequent hotplug > event happens that causes the IRQ_HANDLED case to run. (We were > able to prove that will cause things to "unhang" and progress at > that point - if you're hung and you remove a drive, the slot status > change will then unstick things.) The question is, why is pciehp_ist() run once more. Most likely because another event is signaled from the slot. Try adding a printk() at the top of pciehp_ist() which emits ctrl->pending_events to understand what's going on. Thanks, Lukas