> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: Saturday, August 28, 2021 1:44 PM
>
> >> I tried the kernel parameter "intremap=nosid,no_x2apic_optout,nopost" but
> >> it didn't help. Only "intremap=off" can work round the no interrupt issue.
> >>
> >> When the no interrupt issue happens, irq 209's effective_affinity_list is 5.
> >> I modified modify_irte() to print the irte->low, irte->high, and I also printed
> >> the irte_index for irq 209, and they were all normal to me, and they were
> >> exactly the same in the bad case and the good case -- it looks like, with
> >> "intremap=on maxcpus=8", MSI-X on CPU5 can't work for the NIC device
> >> (MSI-X on CPU5 works for other devices like a NVMe controller) , and somehow
> >> "onlining and then offlining CPU 8~31" can "fix" the issue, which is really weird.
>
> Just for the record: maxcpus=N is a dangerous boot option as it leaves
> the non brought up CPUs in a state where they can be hit by MCE
> broadcasting without being able to act on it. Which means you're
> operating the system out of spec.

I didn't know about this. Thanks for the reply!

> According to your debug output the interrupt in question belongs to the
> INTEL-IR-3 interrupt domain, which means it hangs of IOMMU3, aka DMAR
> unit 3.
>
> To which DMAR/remap unit are the other unaffected devices connected to?
>
> tglx

With maxcpus=8, on CPU 5, the NIC receives no interrupt, but an NVMe interrupt
("INTEL-IR-6") on the same CPU works, and two "IOAT" interrupts ("INTEL-IR-7")
also work.

Apart from the NIC, the only IRQs connected to the faulty IOMMU3 are irq 33
and irq 34:

root@lsg-gen7-a:~# cat /sys/kernel/debug/irq/irqs/33
handler:  handle_fasteoi_irq
device:   (null)
status:   0x00004100
istate:   0x00000000
ddepth:   1
wdepth:   0
dstate:   0x3503a000
            IRQD_LEVEL
            IRQD_IRQ_DISABLED
            IRQD_IRQ_MASKED
            IRQD_SINGLE_TARGET
            IRQD_MOVE_PCNTXT
            IRQD_AFFINITY_ON_ACTIVATE
            IRQD_CAN_RESERVE
            IRQD_HANDLE_ENFORCE_IRQCTX
node:     1
affinity: 0-103
effectiv: 0
pending:
domain:  IO-APIC-18
 hwirq:   0x0
 chip:    IR-IO-APIC
  flags:   0x10
             IRQCHIP_SKIP_SET_WAKE
 parent:
    domain:  INTEL-IR-3
     hwirq:   0x0
     chip:    INTEL-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x21
         chip:    APIC
          flags:   0x0
         Vector:     0
         Target:     0
         move_in_progress: 0
         is_managed:       0
         can_reserve:      1
         has_reserved:     1
         cleanup_pending:  0

root@lsg-gen7-a:~# cat /sys/kernel/debug/irq/irqs/34
handler:  handle_edge_irq
device:   0000:d7:00.0
status:   0x00004000
istate:   0x00000000
ddepth:   0
wdepth:   0
dstate:   0x37408200
            IRQD_ACTIVATED
            IRQD_IRQ_STARTED
            IRQD_SINGLE_TARGET
            IRQD_MOVE_PCNTXT
            IRQD_AFFINITY_ON_ACTIVATE
            IRQD_CAN_RESERVE
            IRQD_DEFAULT_TRIGGER_SET
            IRQD_HANDLE_ENFORCE_IRQCTX
node:     1
affinity: 0-7
effectiv: 1
pending:
domain:  INTEL-IR-MSI-3-3
 hwirq:   0x6b80000
 chip:    IR-PCI-MSI
  flags:   0x30
             IRQCHIP_SKIP_SET_WAKE
             IRQCHIP_ONESHOT_SAFE
 parent:
    domain:  INTEL-IR-3
     hwirq:   0x10000
     chip:    INTEL-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x22
         chip:    APIC
          flags:   0x0
         Vector:    34
         Target:     1
         move_in_progress: 0
         is_managed:       0
         can_reserve:      1
         has_reserved:     0
         cleanup_pending:  0

root@lsg-gen7-a:~# lspci |grep d7:00.0
d7:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07)

irq 33 doesn't appear in /proc/interrupts. irq 34 is listed in
/proc/interrupts, but it also receives no interrupts.

So it looks like IOMMU3 is somehow not working at all with maxcpus=8.
"Onlining and then offlining CPU 8~31" can somehow "fix" it. :-)

I'm not sure if this is a kernel issue or a firmware issue.

Thanks,
Dexuan
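
P.S. For anyone who wants to repeat the "which IRQs hang off which remap unit" check without reading each debugfs file by hand, here is a rough Python sketch. It is only an illustration, not something that was run as part of this report: the helper names (domain_chain, remap_unit, group_by_remap_unit, load_dumps) are made up for this example, and it assumes the per-IRQ dump contains "domain:" lines with an "INTEL-IR-<N>" entry, as in the two dumps above.

```python
import re
from pathlib import Path

# Matches the "domain:  <name>" lines of a per-IRQ debugfs dump.
DOMAIN_RE = re.compile(r"^[ \t]*domain:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
# Matches the remapping domain itself (INTEL-IR-3), not INTEL-IR-MSI-3-3.
UNIT_RE = re.compile(r"^INTEL-IR-(\d+)$")


def domain_chain(dump):
    """Return the domain names of one dump, outermost domain first."""
    return DOMAIN_RE.findall(dump)


def remap_unit(dump):
    """Return N for the first INTEL-IR-N domain in the dump, or None."""
    for name in domain_chain(dump):
        match = UNIT_RE.match(name)
        if match:
            return int(match.group(1))
    return None


def group_by_remap_unit(dumps):
    """Map remap unit number -> sorted IRQ numbers.

    `dumps` maps an IRQ number (as a string) to the text of its dump.
    IRQs without an INTEL-IR-N domain are left out.
    """
    groups = {}
    for irq, text in dumps.items():
        unit = remap_unit(text)
        if unit is not None:
            groups.setdefault(unit, []).append(int(irq))
    return {unit: sorted(irqs) for unit, irqs in groups.items()}


def load_dumps(root="/sys/kernel/debug/irq/irqs"):
    """Read every numeric entry under `root` (needs root and debugfs)."""
    return {
        path.name: path.read_text()
        for path in Path(root).iterdir()
        if path.name.isdigit()
    }
```

Run as root on a kernel that exposes /sys/kernel/debug/irq/irqs, group_by_remap_unit(load_dumps()) should give one list of IRQ numbers per remap unit, which can then be compared against the counters in /proc/interrupts.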