RE: Re: "oneshot" interrupt causes another interrupt to be fired erroneously in Haswell system

> [+cc Thomas, IRQ maintainer]
> 
> On Thu, Oct 31, 2019 at 03:53:50AM +0000, Kar Hin Ong wrote:
> > Hi,
> >
> > I have an Intel Haswell system running Linux kernel v4.14 with the
> > preempt_rt patch. The system contains 2 IOAPICs: IOAPIC 1 is on the
> > PCH, while IOAPIC 2 is on the CPU.
> >
> > I observed that whenever a PCI device fires an interrupt (INTx) to
> > Pin 20 of IOAPIC 2 (GSI 44), the kernel receives 2 interrupts:
> >    1. Interrupt from Pin 20 of IOAPIC 2  -> Expected
> >    2. Interrupt from Pin 19 of IOAPIC 1  -> UNEXPECTED, erroneously
> >       triggered
> >
> > The unexpected interrupt ends up unhandled. When this happens more
> > than 99,000 times, the kernel disables the interrupt line (Pin 19 of
> > IOAPIC 1), causing the device that requested it to malfunction.
> >
> > I also managed to reproduce this issue on RHEL 8 and Ubuntu 19.04
> > (without the preempt_rt patch) after adding "threadirqs" to the
> > kernel command line.
> >
> > After digging further, I noticed that the issue occurs whenever an
> > interrupt pin on IOAPIC 2 is masked:
> >  - Masking Pin 20 of IOAPIC 2 triggers Pin 19 of IOAPIC 1
> >  - Masking Pin 22 of IOAPIC 2 triggers Pin 18 of IOAPIC 1
> >
> > I also noticed that the kernel explicitly masks a specific interrupt
> > pin before executing its handler if the interrupt is configured as
> > "oneshot" (i.e. threaded). See
> > https://elixir.bootlin.com/linux/v4.14/source/kernel/irq/chip.c#L695
> > This explains why it only happens on the preempt_rt kernel and on
> > desktop Linux booted with the "threadirqs" flag: these configurations
> > force the interrupt handler to be threaded.
> >
> > The Intel Xeon Processor E5/E7 v3 Product Family External Design
> > Specification (EDS), Volume One: Architecture, section 13.1 (Legacy
> > PCI Interrupt Handling), states: "If the I/OxAPIC entry is masked
> > (via the 'mask' bit in the corresponding Redirection Table Entry),
> > then the corresponding PCI Express interrupt(s) is forwarded to the
> > legacy PCH"
> >
> > My interpretation is: when the kernel receives a "oneshot" interrupt,
> > it masks the line before it starts handling it (or sending the EOI
> > signal). At this moment, if the interrupt line is still asserted, the
> > interrupt signal is routed to the IOAPIC in the PCH, causing another
> > interrupt to be fired erroneously.
> >
> > I would like to know whether my interpretation makes sense. If so,
> > does the "oneshot" algorithm need to be updated to support Haswell
> > systems?
> 
> Just to make sure this hasn't already been fixed, can you reproduce the problem
> on a current kernel, e.g., v5.3 or v5.4-rc5?
The problem is reproducible on Ubuntu 19.10 (with kernel version 5.3.0-19-generic) as well.
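
To make the interpretation quoted above easier to discuss, here is a toy
model of the sequence in plain C. It is only a sketch and not kernel code:
the names (toy_system, line_asserted, oneshot_cycle, forwarded_pch_pin) are
hypothetical, the 24-pin table size is an assumption, and the routing table
contains only the two pin pairs observed in the report. It assumes the
device keeps the level-triggered line asserted while the pin is masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PINS 24 /* assumed pin count per IOAPIC, for illustration only */

/* Pin pairs observed in the report: masked IOAPIC 2 pin -> IOAPIC 1 pin. */
struct fwd_route { int cpu_pin; int pch_pin; };

static const struct fwd_route observed_routes[] = {
    { 20, 19 },
    { 22, 18 },
};

struct toy_system {
    bool     cpu_pin_masked[TOY_PINS]; /* mask bit of each IOAPIC 2 entry */
    unsigned cpu_irq_count[TOY_PINS];  /* interrupts seen via IOAPIC 2 */
    unsigned pch_irq_count[TOY_PINS];  /* interrupts seen via IOAPIC 1 */
};

/* Return the IOAPIC 1 pin a masked IOAPIC 2 pin forwards to, or -1 if
 * the pair was not observed. */
static int forwarded_pch_pin(int cpu_pin)
{
    size_t n = sizeof(observed_routes) / sizeof(observed_routes[0]);
    for (size_t i = 0; i < n; i++) {
        if (observed_routes[i].cpu_pin == cpu_pin)
            return observed_routes[i].pch_pin;
    }
    return -1;
}

/* Model one sample of an asserted level-triggered line on IOAPIC 2. */
static void line_asserted(struct toy_system *s, int cpu_pin)
{
    assert(cpu_pin >= 0 && cpu_pin < TOY_PINS);

    if (!s->cpu_pin_masked[cpu_pin]) {
        s->cpu_irq_count[cpu_pin]++; /* normal delivery */
        return;
    }

    /* Entry is masked: per the EDS quote, forward to the legacy PCH. */
    int pch_pin = forwarded_pch_pin(cpu_pin);
    if (pch_pin >= 0)
        s->pch_irq_count[pch_pin]++;
}

/* Model one "oneshot" cycle as interpreted in the report. */
static void oneshot_cycle(struct toy_system *s, int cpu_pin)
{
    line_asserted(s, cpu_pin);           /* 1. expected interrupt */
    s->cpu_pin_masked[cpu_pin] = true;   /* 2. mask before threaded handler */
    line_asserted(s, cpu_pin);           /* 3. line still asserted */
    s->cpu_pin_masked[cpu_pin] = false;  /* 4. handler done, unmask */
}
```

Calling oneshot_cycle() on a zero-initialized toy_system for pin 20 leaves
one count on IOAPIC 2 pin 20 and one on IOAPIC 1 pin 19, which matches the
pair of interrupts described in the report.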

> 
> Bjorn



