Re: [PATCH] PCI: dwc: Fix interrupt race in when handling MSI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2018-11-06 at 14:53 +0000, Lorenzo Pieralisi wrote:
> 
> 
> Not sure why (5) is not used in your lists, I assume because you want
> to highlight the race condition with the jump from 4 to 6 (or maybe
> you do not like number 5 :), just curious).

It's because I removed a step and I'm used to rst docs where it
renumbers automatically.

> 
> > The observant will notice that point 4 present the opportunity for the
> > SoC's interrupt controller to lose the interrupt in the same manner as
> > the bug in this driver.  The driver for that interrupt controller will
> > be written to properly deal with this.  In some cases the hardware
> > supports an EOI concept, where the 2nd IRQ is masked and internally
> > queued in the hardware, to be re-raised at EOI in step 7.  In other
> > cases the IRQ will be unmasked and re-raised at step 4, but the kernel
> > will see the handler is INPROGRESS and not re-invoke it, and instead set
> > a PENDING flag, which causes the handler to re-run at step 7.
> > 
> > [1] 
> > 
> > Fixes: 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled, not before")
> 
> I have two questions:
> 
> - Commit 8c934095fa2f has been in the kernel for a year and no
>   regression was reported. It was assumed to fix a problem so before
>   reverting it I want to make sure we are not breaking anything else.

There have been reports.  Vignesh reported this some time ago.  Also
see https://lkml.org/lkml/2018/11/2/55, the author indicated to me that
it was the caused by this problem after I pointed him to my patch.  And
I've reported it now too.  

We only updated to a kernel with the regression a few months ago.  I
think imx6/7 systems don't update their kernels that quickly.  Usually
not running a mainstream distro with automatic updates.  And there are
probably a lot of people using the NXP vendor kernel, which has a ton
of unmerged patches, and so they don't see this regression yet.

> - Your reasoning seems correct but I would pick Marc's brain on this
>   because I want to understand if what this patch does is what IRQ core
>   expects it to do, especially in relation to the IRQ chaining you
>   are mentioning.

I've traced this through the ARM arch code and used our custom PCI
device to generate interrupts in all the race windows.  It does work as
I say, though I can't say this it how it's supposed to work.




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux