Hi all,

I'm running a single-CPU Linux VM on Hyper-V. The Linux kernel is v5.9-rc7 and
I have CONFIG_NR_CPUS=256.

The Hyper-V host (version 17763-10.0-1-0.1457) provides a guest firmware which
always reports 128 Local APIC entries in the ACPI MADT table. Only the first
Local APIC entry has "Processor Enabled" set to 1, since this Linux VM is
configured with only 1 CPU. This means that, in the Linux kernel,
cpu_present_mask and cpu_online_mask contain only 1 CPU (i.e. CPU0), while
cpu_possible_mask contains 128 CPUs and nr_cpu_ids is 128.

I pass through an MSI-X-capable PCI device to the Linux VM (which has only 1
virtual CPU), and the code below does *not* report any error (i.e.
pci_alloc_irq_vectors_affinity() returns 2 and request_irq() returns 0), but
it does not work: the second MSI-X interrupt never fires, while the first one
works fine.

	int nr_irqs = 2;
	int i, nvec, irq, err;

	nvec = pci_alloc_irq_vectors_affinity(pdev, nr_irqs, nr_irqs,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      NULL);

	for (i = 0; i < nvec; i++) {
		irq = pci_irq_vector(pdev, i);
		err = request_irq(irq, test_intr, 0, "test_intr",
				  &intr_cxt[i]);
	}

It turns out that pci_alloc_irq_vectors_affinity() -> ... ->
irq_create_affinity_masks() allocates an improper affinity for the second
interrupt. The printk() below shows that the second interrupt's affinity is
1-64, but only CPU0 is present in the system! As a result, request_irq() ->
... -> irq_startup() -> __irq_startup_managed() later returns
IRQ_STARTUP_ABORT because cpumask_any_and(aff, cpu_online_mask) is empty
(i.e. >= nr_cpu_ids), and irq_startup() *silently* fails (i.e. "return 0;"),
since __irq_startup() is only called for IRQ_STARTUP_MANAGED and
IRQ_STARTUP_NORMAL.

--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -484,6 +484,9 @@ struct irq_affinity_desc *
 	for (i = affd->pre_vectors; i < nvecs - affd->post_vectors; i++)
 		masks[i].is_managed = 1;

+	for (i = 0; i < nvecs; i++)
+		printk("i=%d, affi = %*pbl\n", i,
+		       cpumask_pr_args(&masks[i].mask));
 	return masks;
 }

[   43.770477] i=0, affi = 0,65-127
[   43.770484] i=1, affi = 1-64

Though the issue here happens to a Linux VM on Hyper-V, I think the same issue
can also happen on a physical machine, if the physical machine also uses a lot
of static MADT entries, of which only the entries of the present CPUs are
marked "Processor Enabled == 1".

I think pci_alloc_irq_vectors_affinity() -> __pci_enable_msix_range() ->
irq_calc_affinity_vectors() -> cpumask_weight(cpu_possible_mask) should use
cpu_present_mask rather than cpu_possible_mask, so that
irq_calc_affinity_vectors() would return 1 and __pci_enable_msix_range() would
immediately return -ENOSPC, avoiding a *silent* failure. However, git log
shows that this 2018 commit intentionally changed cpu_present_mask to
cpu_possible_mask:

	84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")

so I'm not sure whether (and how) we should address the *silent* failure.

BTW, here I use a single-CPU VM to simplify the discussion. Actually, if the
VM has n CPUs, then with the above usage of pci_alloc_irq_vectors_affinity()
(which might seem incorrect, but my point is that it's really not good to have
a silent failure, which makes it a lot harder to figure out what goes wrong),
it looks like only the first n MSI-X interrupts can work, and the (n+1)'th
MSI-X interrupt can not work due to the improper affinity allocated for it.
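FWIW, a check like the following can at least make the situation visible from
the driver side. This is just an untested sketch (pdev/nvec are the variables
from my example above); it uses pci_irq_get_affinity() to read back each
vector's affinity mask and compares it against cpu_online_mask:

	int i;

	for (i = 0; i < nvec; i++) {
		const struct cpumask *aff = pci_irq_get_affinity(pdev, i);

		/* A managed vector whose mask has no online CPU never fires. */
		if (aff && !cpumask_intersects(aff, cpu_online_mask))
			dev_warn(&pdev->dev,
				 "vector %d: no online CPU in affinity mask (%*pbl)\n",
				 i, cpumask_pr_args(aff));
	}

With the masks shown above, such a check would warn for vector 1 ("1-64") on
the 1-CPU VM, even though request_irq() for that vector returns 0.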
According to my tests, if we need n+1 MSI-X interrupts in such a VM that has n
CPUs, it looks like we have 2 options (the second should be better):

1. Do not use the PCI_IRQ_AFFINITY flag, i.e.

	pci_alloc_irq_vectors_affinity(pdev, n+1, n+1, PCI_IRQ_MSIX, NULL);

2. Use the PCI_IRQ_AFFINITY flag, and pass a struct irq_affinity affd that
tells the API we don't care about the first interrupt's affinity:

	struct irq_affinity affd = {
		.pre_vectors = 1,
		...
	};

	pci_alloc_irq_vectors_affinity(pdev, n+1, n+1,
				       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &affd);

PS, irq_create_affinity_masks() is complicated. Let me know if you're
interested in how it allocates the invalid affinity "1-64" for the second
MSI-X interrupt.

PS2, the latest Hyper-V provides only one ACPI MADT entry to a 1-CPU VM, so
the issue described above cannot be reproduced there.

Thanks,
-- Dexuan