Re: [PATCH 0/3] PCI: designware: Fixing MSI handling flow

On Tue, 2018-11-13 at 22:57 +0000, Marc Zyngier wrote:
> It recently came to light that the Designware PCIe driver is rather
> broken in the way it handles MSI[1]:
> 
> - It masks interrupt by disabling them, meaning that MSIs generated
>   during the masked window are simply lost. Oops.
> 
> - Acking of the currently pending MSI is done outside of the interrupt
>   flow, getting moved around randomly and ultimately breaking the
>   driver. Not great.
> 
> This series attempts to address this by switching to using the MASK
> register for masking interrupts (!), and move the ack into the
> appropriate callback, giving it a fixed place in the MSI handling
> flow.
> 
> Note that this is only compile-tested on my arm64 laptop, as I'm
> travelling and do not have the required HW to test it anyway. I'd
> welcome both review and testing by the interested parties (dwc
> maintainer and users affected by existing bugs).
> 

I've started to test this series after porting all the patches needed
to make IMX7d work from 4.16.8 to 4.20.0-rc2.

Took a little while to figure out that the pcieport driver has a new
config entry to enable, or one gets no interrupts.  I'm not sure if
this is entirely correct behavior.

The new domain stuff does not appear to integrate into the existing irq
framework perfectly.  My interrupt has changed from MSI #1 to MSI
#524288.  Not the most user friendly number.

292:          0          0   PCI-MSI   0 Edge      PCIe PME, aerdrv
293:          1          0   PCI-MSI 524288 Edge      impinj-rfid-modem
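The number is probably not arbitrary.  As far as I can tell, the generic
PCI-MSI domain builds the hwirq from the MSI entry number, the device's
bus/devfn, and the PCI domain number.  The sketch below is a userspace
model of that encoding, not the kernel code itself: the helper names are
made up, and it assumes the endpoint sits at 01:00.0 in PCI domain 0,
which the output above does not actually show.

```c
#include <stdint.h>

/* Modeled on the kernel's PCI_DEVID(): bus in bits 15:8, devfn in bits 7:0. */
uint32_t toy_pci_devid(uint8_t bus, uint8_t devfn)
{
	return ((uint32_t)bus << 8) | devfn;
}

/*
 * Hypothetical helper modeling the PCI-MSI hwirq layout:
 *   bits 10:0   MSI entry number
 *   bits 26:11  PCI_DEVID(bus, devfn)
 *   bits 27+    PCI domain number
 */
uint64_t toy_pci_msi_hwirq(uint32_t domain, uint8_t bus, uint8_t devfn,
			   uint16_t entry)
{
	return (uint64_t)entry |
	       ((uint64_t)toy_pci_devid(bus, devfn) << 11) |
	       ((uint64_t)domain << 27);
}
```

Under those assumptions, toy_pci_msi_hwirq(0, 1, 0, 0) is 1 << 19 =
524288, and the root port at 00:00.0 gets 0, which matches both lines
above.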

Previously the dwc controller would show up as the owner of GPCv2 IRQ
122.  It doesn't any more.  Seems like the kernel info for it is wrong.

/sys/kernel/irq/65/actions:(null)
/sys/kernel/irq/65/chip_name:GPCv2
/sys/kernel/irq/65/hwirq:122
/sys/kernel/irq/65/per_cpu_count:0,0
/sys/kernel/irq/65/type:edge

Should be level and the count should be 1,0.  The debugfs interface is
more accurate:

# cat /sys/kernel/debug/irq/irqs/65
handler:  dw_chained_msi_isr
device:   (null)
status:   0x00010c00
            _IRQ_NOPROBE
            _IRQ_NOREQUEST
            _IRQ_NOTHREAD
dstate:   0x03400204
            IRQ_TYPE_LEVEL_HIGH
            IRQD_ACTIVATED
            IRQD_IRQ_STARTED
            IRQD_SINGLE_TARGET

Still doesn't know what device it's for.

Now I can finally test it!

Confirmed interrupt race is still there in stock kernel.

Confirmed after my patch I didn't see the race.  Didn't check why the
broken enable/disable as mask didn't appear to cause a new race, but
something must be wrong somewhere.

Tried your 1st patch.  As I mentioned before in a reply to Gustavo,
just changing the enable to mask results in the MSI never getting
enabled in the first place.  Nothing else writes to the enable
register...
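A toy model of the three registers shows both problems.  This is an
illustration only: the struct and function names are invented, and the
assumed hardware behavior (an incoming MSI only latches into the status
register when its enable bit is set, and the mask bit only gates
delivery) is inferred from this thread, not taken from the databook.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one DWC MSI controller register group. */
struct toy_dw_msi {
	uint32_t enable;	/* assumed: gates latching into status */
	uint32_t mask;		/* assumed: gates delivery only */
	uint32_t status;	/* pending bits */
};

/* An endpoint sends MSI vector 'bit'. */
void toy_msi_write(struct toy_dw_msi *r, unsigned int bit)
{
	if (r->enable & (1u << bit))
		r->status |= 1u << bit;
	/* else: the MSI is dropped, nothing remembers it */
}

/* Is the controller asserting its interrupt to the parent? */
bool toy_irq_line(const struct toy_dw_msi *r)
{
	return (r->status & ~r->mask) != 0;
}
```

With enable left at 0, toy_msi_write() never sets a status bit no
matter what the mask register holds, which is the "never getting
enabled" case.  With enable set and the mask bit set, the MSI stays
pending in status and is delivered once the mask bit is cleared, rather
than being lost.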

As a workaround, I added an irq_enable method to dw_pcie_msi_irq_chip
that just chains to the parent, and then a hacky irq_enable in
dw_pci_msi_bottom_irq_chip that manipulates the enable register.

Now it works again.  Race still present.  I don't see the
dw_pci_msi_bottom_(un)mask methods ever get called.  I seem to recall
that they are called as a substitute if enable/disable are not present,
but haven't confirmed that, which would explain why they are not called
after I added enable.
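For reference, the fallback as I understand it from irq_enable() in
kernel/irq/chip.c looks roughly like the model below.  This is a
simplified userspace sketch with invented names (toy_chip,
toy_irq_enable, the counters); the real function also tracks
disabled/masked state, which is left out here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two irq_chip callbacks of interest. */
struct toy_chip {
	void (*irq_enable)(void *data);
	void (*irq_unmask)(void *data);
};

/* Counters so a caller can see which callback ran. */
struct toy_counts {
	int enable_calls;
	int unmask_calls;
};

void toy_enable(void *data)
{
	((struct toy_counts *)data)->enable_calls++;
}

void toy_unmask(void *data)
{
	((struct toy_counts *)data)->unmask_calls++;
}

/*
 * Model of the dispatch: prefer the chip's irq_enable; only fall
 * back to irq_unmask when irq_enable is not provided.
 */
void toy_irq_enable(const struct toy_chip *chip, void *data)
{
	if (chip->irq_enable)
		chip->irq_enable(data);
	else if (chip->irq_unmask)
		chip->irq_unmask(data);
}
```

If the core does work this way, a chip that provides irq_enable will
never see irq_unmask called on the enable path, which would match what
was observed after adding the enable method.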

Next tried your next two patches.  No longer see lost interrupts, as
the status is cleared before the handler is called.
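Why the ordering matters can be shown with a toy model.  The names here
are invented and the single pending bit is a simplification of the real
status register; it is only meant to illustrate the race, not the
driver's code.

```c
#include <stdbool.h>

/* Hypothetical model: one pending bit and a count of handler runs. */
struct toy_msi {
	bool status;
	int handled;
};

/* The device sends an MSI. */
void toy_msi_arrive(struct toy_msi *m)
{
	m->status = true;
}

/*
 * One pass through the chained handler.  If 'arrives_during_handler'
 * is set, a second MSI shows up while the device handler is running.
 */
void toy_service(struct toy_msi *m, bool ack_first,
		 bool arrives_during_handler)
{
	if (!m->status)
		return;

	if (ack_first)
		m->status = false;	/* ack before the handler */

	m->handled++;			/* device handler runs */
	if (arrives_during_handler)
		toy_msi_arrive(m);

	if (!ack_first)
		m->status = false;	/* late ack wipes the new MSI */
}
```

With ack_first set, an MSI that arrives during the handler leaves
status set and is picked up by the next pass.  With the ack after the
handler, that second MSI is cleared without ever being handled.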

From what I see the clear of the status bit is effectively at the same
point in the irq path as the way I cleared it in my patch.  There's
just a longer call chain to get to it in the ack method.  The ack
method is a better place for it (there is no such method in 4.16), but
I don't think it changes anything.  Is there some reason
dw_pci_bottom_ack would not be called?

Since I don't see the (un)mask methods ever get called, I'm not sure if
they are correct or not.  I also still have unanswered questions about
the behavior on unmask.  I can see possible flaws, depending on how
this works.
