Re: [PATCH] PCI: pciehp: Make sure pciehp_isr clears interrupt events

Stuart Hayes <stuart.w.hayes@xxxxxxxxx> · Wed, 13 Nov 2019 15:58:51 -0600

On 11/12/19 10:59 PM, Lukas Wunner wrote:
> On Tue, Nov 12, 2019 at 04:59:38PM -0500, Stuart Hayes wrote:
>> The pciehp interrupt handler pciehp_isr() will read the slot status
>> register and then write back to it to clear just the bits that caused the
>> interrupt. If a different interrupt event bit gets set between the read and
>> the write, pciehp_isr() will exit without having cleared all of the
>> interrupt event bits, so we will never get another hotplug interrupt from
>> that device.
> 
> The IRQ is masked when it occurs and unmasked after it's been handled.
> See the invocation of mask_ack_irq() in handle_edge_irq() and
> handle_level_irq() in kernel/irq/chip.c.
> 
> If the IRQ has fired in-between, the IRQ chip is expected to invoke
> the IRQ handler again.  There is some support for IRQ chips not
> capable of replaying interrupts in kernel/irq/resend.c, but in general,
> if you do not get another hotplug interrupt, the hardware is broken.
> What kind of IRQ chip are you using and what kind of chip is the
> hotplug port built into?
> 
> I'm not opposed to quirks to support such broken hardware but the
> commit message shouldn't purport that it's normal behavior and the
> quirk should only be executed where necessary and be made explicit
> in the code to be a quirk.
> 
> Thanks,
> 
> Lukas

Thank you for the feedback!

The hotplug port I'm seeing the issue with is an AMD "Starship/Matisse GPP
Bridge" (1022/1483), which uses an MSI interrupt (PCI-MSI chip).

I don't think that the masking and unmasking will make any difference in this
case, because this pciehp port does not support MSI per-vector masking.

The PCI spec says:

"If the Port is enabled for edge-triggered interrupt signaling using MSI or MSI-X,
an interrupt message must be sent every time the logical AND of the following
conditions transitions from FALSE to TRUE:

• The associated vector is unmasked (not applicable if MSI does not support PVM).

• The Hot-Plug Interrupt Enable bit in the Slot Control register is set to 1b.

• At least one hot-plug event status bit in the Slot Status register and its
associated enable bit in the Slot Control register are both set to 1b."

Because the AMD port does not support PVM (per vector masking), the first
condition will always be true.

Because the hot-plug interrupt enable bit isn't cleared on each interrupt, the
second condition is true.

And because individual event enable bits in the slot control register aren't
cleared on each interrupt, I interpret this to mean that an interrupt message
will be sent every time that the event status bits in the slot status register
transition from all zeros to at least one event status bit being 1.  Once one
of those event status bits is 1, the logical AND of the three conditions above
will not transition from FALSE to TRUE again until all of the (enabled) event
status bits in the slot status register all go to zero, which is what my patch
is intended to ensure.

(I noticed too late that I have a compile warning with the ctrl_warn() call, so
I'll have to make a V2 of the patch for that, at least.)

Thanks,
Stuart