Re: [PATCH 1/4] PCI/hotplug: Disable HPIE over reset

Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> · Mon, 17 Mar 2025 20:08:12 +0200 (EET)

On Sat, 15 Mar 2025, Lukas Wunner wrote:

> On Thu, Mar 13, 2025 at 04:23:30PM +0200, Ilpo Järvinen wrote:
> > pciehp_reset_slot() disables PDCE (Presence Detect Changed Enable) and
> > DLLSCE (Data Link Layer State Changed Enable) for the duration of reset
> > and clears the related status bits PDC and DLLSC from the Slot Status
> > register after the reset to avoid hotplug incorrectly assuming the card
> > was removed.
> > 
> > However, hotplug shares interrupt with PME and BW notifications both of
> > which can make pciehp_isr() to run despite PDCE and DLLSCE bits being
> > off. pciehp_isr() then picks PDC or DLLSC bits from the Slot Status
> > register due to the events that occur during reset and caches them into
> > ->pending_events. Later, the IRQ thread in pciehp_ist() will process
> > the ->pending_events and will assume the Link went Down due to a card
> > change (in pciehp_handle_presence_or_link_change()).
> > 
> > Change pciehp_reset_slot() to also clear HPIE (Hot-Plug Interrupt
> > Enable) as pciehp_isr() will first check HPIE to see if the interrupt
> > is not for it. Then synchronize with the IRQ handling to ensure no
> > events are pending, before invoking the reset.
> 
> After dwelling on this for a while, I'm thinking that it may re-introduce
> the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock
> between reset_lock and device_lock"):
> 
> Looking at the second and third stack trace in its commit message,
> down_write(reset_lock) in pciehp_reset_slot() is basically equivalent
> to synchronize_irq() and we're holding device_lock() at that point,
> hindering progress of pciehp_ist().

I found this description somewhat confusing, but what I can see, now that
you mention it, is that if pciehp_reset_slot() calls synchronize_irq()
while its caller holds device_lock(), draining the pending events can end
up trying to acquire device_lock() again. ->reset_lock seems irrelevant to
that problem.

Thus, pciehp_reset_slot() cannot ever rely on completing the processing of 
all pending events before it invokes the reset as long as any of its 
callers is holding device_lock().
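
To make the dependency concrete, here is a minimal single-threaded userspace
model of it. This is only a sketch, not kernel code: device_lock_model,
ist_model_can_progress() and reset_model_drain_would_complete() are
hypothetical names, and a C11 atomic_flag stands in for device_lock().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for device_lock() of the affected device. */
static atomic_flag device_lock_model = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(void)
{
	return !atomic_flag_test_and_set(&device_lock_model);
}

static void model_unlock(void)
{
	atomic_flag_clear(&device_lock_model);
}

/*
 * Models the IRQ thread handling a pending event that needs device_lock().
 * Returns true if it can make progress, false if it would have to block.
 */
static bool ist_model_can_progress(void)
{
	if (!model_trylock())
		return false;
	model_unlock();
	return true;
}

/*
 * Models a reset path that is entered with device_lock() held and then
 * waits for the IRQ thread. Returns true if that wait could ever finish.
 */
static bool reset_model_drain_would_complete(void)
{
	bool done;

	if (!model_trylock())		/* the caller takes device_lock() */
		return false;
	done = ist_model_can_progress();	/* what the drain waits for */
	model_unlock();
	return done;
}
```

In this model reset_model_drain_would_complete() returns false, because the
event handler can never take the lock the waiter is holding, while
ist_model_can_progress() on its own returns true.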

It's a bit sad, because removing most of the reset_lock complexity would
have been a nice simplification of the locking; effectively, it would have
reverted f5eff5591b8f too.

> So I think I have guided you in the wrong direction and I apologize
> for that.
> 
> However it seems to me that this should be solvable with the small
> patch below.  Am I missing something?
> 
> @Joel Mathew Thomas, could you give the below patch a spin and see
> if it helps?
> 
> Thanks!
> 
> -- >8 --
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index bb5a8d9f03ad..99a2ac13a3d1 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -688,6 +688,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  		return IRQ_HANDLED;
>  	}
>  
> +	/* Ignore events masked by pciehp_reset_slot(). */
> +	events &= ctrl->slot_ctrl;
> +	if (!events)
> +		return IRQ_HANDLED;
> +
>  	/* Save pending events for consumption by IRQ thread. */
>  	atomic_or(events, &ctrl->pending_events);
>  	return IRQ_WAKE_THREAD;

Yes, this should work, I think.
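
A small userspace sketch of what the mask in the hunk above computes. The
register bit values are written from memory of
include/uapi/linux/pci_regs.h and should be double-checked;
filter_events() is a hypothetical helper, not a pciehp function.

```c
#include <stdint.h>

/* Slot Control enable bits (assumed values, please verify). */
#define SLTCTL_PDCE	0x0008	/* Presence Detect Changed Enable */
#define SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable */
#define SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */

/* Slot Status event bits (assumed values, please verify). */
#define SLTSTA_PDC	0x0008	/* Presence Detect Changed */
#define SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */

/* Same operation as "events &= ctrl->slot_ctrl" in the proposed hunk. */
static uint16_t filter_events(uint16_t events, uint16_t slot_ctrl)
{
	return events & slot_ctrl;
}
```

With PDCE cleared, filter_events(SLTSTA_PDC, SLTCTL_HPIE) is 0, and with
PDCE set the PDC event passes through, as intended. The sketch relies on
each status bit sharing its position with its enable bit. If the values
above are right, that holds for PDC/PDCE but not for DLLSC/DLLSCE (bit 8
vs. bit 12), so a plain AND would drop DLLSC even when DLLSCE is set. That
may be worth checking.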

I'm not entirely sure, though, how reading ->slot_ctrl here is synchronized
with pciehp_reset_slot() invoking the reset. What guarantees that
pciehp_isr() sees the updated ->slot_ctrl once pciehp_reset_slot() has
proceeded to invoke the reset? (In general, I'm very hesitant to assume
that a lockless, barrierless reader is race-free; I might just be paranoid
about it.)
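
To illustrate the kind of ordering I have in mind, below is a userspace
sketch using C11 atomics. It is not a proposal for the driver: struct
ctrl_model, model_mask_for_reset() and model_isr() are hypothetical, and
the release/acquire pair is an assumption about what an explicit ordering
could look like, not what pciehp does today.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two controller fields involved. */
struct ctrl_model {
	_Atomic uint16_t slot_ctrl;		/* cached Slot Control value */
	_Atomic uint32_t pending_events;	/* events for the IRQ thread */
};

/* Reset path: publish the masked value before the reset is invoked. */
static void model_mask_for_reset(struct ctrl_model *c, uint16_t mask)
{
	uint16_t v = atomic_load_explicit(&c->slot_ctrl,
					  memory_order_relaxed);

	atomic_store_explicit(&c->slot_ctrl, (uint16_t)(v & ~mask),
			      memory_order_release);
}

/*
 * ISR side: filter the status bits through the cached enable bits.
 * Returns 1 if the IRQ thread should be woken, 0 if nothing is left.
 */
static int model_isr(struct ctrl_model *c, uint16_t status)
{
	uint16_t events = status &
		atomic_load_explicit(&c->slot_ctrl, memory_order_acquire);

	if (!events)
		return 0;

	atomic_fetch_or(&c->pending_events, events);
	return 1;
}
```

Even with such a pairing, an ISR instance that loaded ->slot_ctrl just
before the store would still queue the event, so the ordering alone does
not close the window I'm worried about.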

-- 
 i.