Hi I can reproduce the lost MSI interrupt issue on 5.6-rc6 which includes the "Plug non-maskable MSI affinity race" patch. I can see this on a couple platforms, I'm running a script that first generates a lot of usb traffic, and then in a busyloop sets irq affinity and turns off and on cpus: for i in 1 3 5 7; do echo "1" > /sys/devices/system/cpu/cpu$i/online done echo "A" > "/proc/irq/*/smp_affinity" echo "A" > "/proc/irq/*/smp_affinity" echo "F" > "/proc/irq/*/smp_affinity" for i in 1 3 5 7; do echo "0" > /sys/devices/system/cpu/cpu$i/online done I added some very simple debugging but I don't really know what to look for. xhci interrupts (122) just stop after a setting msi affinity, it survived many similar msi_set_affinity() calls before this. I'm not that familiar with the inner workings of this, but I'll be happy to help out with adding debugging and testing patches. Details: cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 26 0 0 0 0 0 0 0 IO-APIC 2-edge timer 1: 0 0 0 0 0 7 0 0 IO-APIC 1-edge i8042 4: 0 4 59941 0 0 0 0 0 IO-APIC 4-edge ttyS0 8: 0 0 0 0 0 0 1 0 IO-APIC 8-edge rtc0 9: 0 40 8 0 0 0 0 0 IO-APIC 9-fasteoi acpi 16: 0 0 0 0 0 0 0 0 IO-APIC 16-fasteoi i801_smbus 120: 0 0 293 0 0 0 0 0 PCI-MSI 32768-edge i915 121: 728 0 0 58 0 0 0 0 PCI-MSI 520192-edge enp0s31f6 122: 63575 2271 0 1957 7262 0 0 0 PCI-MSI 327680-edge xhci_hcd 123: 0 0 0 0 0 0 0 0 PCI-MSI 514048-edge snd_hda_intel:card0 NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts trace snippet: <idle>-0 [001] d.h. 129.676900: xhci_irq: xhci irq <idle>-0 [001] d.h. 129.677507: xhci_irq: xhci irq <idle>-0 [001] d.h. 129.677556: xhci_irq: xhci irq <idle>-0 [001] d.h. 129.677647: xhci_irq: xhci irq <...>-14 [001] d..1 129.679802: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 2 -> 6 <idle>-0 [003] d.h. 129.682639: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.682769: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.682908: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.683552: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.683677: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.683819: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.689017: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.689140: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.689307: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.689984: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.690107: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.690278: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.695541: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.695674: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.695839: xhci_irq: xhci irq <idle>-0 [003] d.H. 129.696667: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.696797: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.696973: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.702288: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.702380: xhci_irq: xhci irq <idle>-0 [003] d.h. 129.702493: xhci_irq: xhci irq migration/3-24 [003] d..1 129.703150: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 6 -> 0 kworker/0:0-5 [000] d.h. 131.328790: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 kworker/0:0-5 [000] d.h. 133.312704: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 kworker/0:0-5 [000] d.h. 135.360786: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 <idle>-0 [000] d.h. 137.344694: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 kworker/0:0-5 [000] d.h. 139.128679: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 kworker/0:0-5 [000] d.h. 141.312686: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 kworker/0:0-5 [000] d.h. 143.360703: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 kworker/0:0-5 [000] d.h. 145.344791: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -92,6 +92,10 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force) cfg->vector == old_cfg.vector || old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR || cfg->dest_apicid == old_cfg.dest_apicid) { + trace_printk("direct update msi %u, vector %u -> %u, apicid: %u -> %u\n", + irqd->irq, + old_cfg.vector, cfg->vector, + old_cfg.dest_apicid, cfg->dest_apicid); irq_msi_update_msg(irqd, cfg); return ret; } @@ -134,7 +138,10 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force) */ if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector]))) this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED); - + trace_printk("twostep update msi, irq %u, vector %u -> %u, apicid: %u -> %u\n", + irqd->irq, + old_cfg.vector, cfg->vector, + old_cfg.dest_apicid, cfg->dest_apicid); /* Redirect it to the new vector on the local CPU temporarily */ old_cfg.vector = cfg->vector; irq_msi_update_msg(irqd, &old_cfg); Thanks -Mathias