On 1/4/19 3:45 PM, Trent Piepho wrote:
On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
From: Stephen Warren <swarren@xxxxxxxxxx>
The current code does this when handling MSI IRQs:
a) Process the IRQ.
b) Clear the latched IRQ status.
If a new IRQ occurs any time after (a) has read the IRQ status for the
last time and before (b), it will be lost. For example, this occurs in
There was a patch series to fix this bug, sent in October, stretching
to December.
See "PCI: dwc: Fix interrupt race in when handling MSI"
And "PCI: designware: Move interrupt acking into the proper callback"
The result was more code changes, which in the end produce the same
order of operations as is done here to fix the race, but that series
uses the IRQ framework correctly.
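To illustrate why the order of operations matters, here is a minimal
user-space sketch of the race. It is only a toy model, not the DWC driver
code: the struct, the function names, and the single status bit are all
hypothetical, and the "device" is assumed to raise one extra MSI while the
first one is being processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a latched MSI status register (not the real DWC layout). */
struct sim_msi {
    uint32_t status;     /* latched status bits */
    unsigned handled;    /* how many times the handler ran */
    bool raise_pending;  /* device raises the same MSI once more during handling */
};

/* Device side: latch a status bit. */
void device_raise(struct sim_msi *m, unsigned bit)
{
    assert(bit < 32);
    m->status |= UINT32_C(1) << bit;
}

/* Driver side: clear the latched status bit. */
void ack(struct sim_msi *m, unsigned bit)
{
    assert(bit < 32);
    m->status &= ~(UINT32_C(1) << bit);
}

/* Driver side: handle the IRQ; a second event arrives before we return. */
void process(struct sim_msi *m, unsigned bit)
{
    m->handled++;
    if (m->raise_pending) {
        m->raise_pending = false;
        device_raise(m, bit);
    }
}

/* Racy order: (a) process, then (b) clear. The clear wipes the new event. */
unsigned run_process_then_clear(void)
{
    struct sim_msi m = { .raise_pending = true };

    device_raise(&m, 0);
    while (m.status & 1u) {
        process(&m, 0);
        ack(&m, 0);
    }
    return m.handled;
}

/* Fixed order: clear first, then process. The new event stays latched. */
unsigned run_clear_then_process(void)
{
    struct sim_msi m = { .raise_pending = true };

    device_raise(&m, 0);
    while (m.status & 1u) {
        ack(&m, 0);
        process(&m, 0);
    }
    return m.handled;
}
```

In this model the device raises two events in both runs. With the
process-then-clear order the handler runs only once, because the clear
discards the second latched event. With the clear-then-process order the
handler runs twice.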
I do not think this problem was introduced until 4.14, so the NVIDIA
4.9 kernel must have additional patches if this is observed there. I'm
sure it was not present in 4.12, as we used that and did not see the
issue until updating to 4.16.
Yes, I took the DWC driver from our 4.14 kernel and dropped it into the
4.9 kernel, so that's why the issue shows up in 4.9 for us.
The current fix (for 4.21?) depends on significant restructuring that
was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
allocation to IRQ domains hierarchical API".
I think my original patch ("Fix interrupt race ..") or your patch here,
which is basically the same, should be ported to 4.14 stable. But
there are some other opinions on that. It's not clear to me whether the
hierarchical domain changes would be backported to the stable series. It
does not seem to me to be a good idea to do so.
Hmm. Well, I guess I'll go with the patch I posted in our downstream
kernels, since back-porting a bunch of not-yet-available restructuring
to our ancient kernels doesn't sound pleasant :-) But I'll go and take a
quick look at the other patches you mentioned just in case. Thanks!