> From: Lorenzo Pieralisi <lpieralisi@xxxxxxxxxx>
> Sent: Friday, November 11, 2022 1:56 AM
> > ...
> > + * For the single-MSI and MSI-X cases, it's OK for hv_compose_msi_req_get_cpu()
> > + * to always return the same dummy vCPU, because a second call to
> > + * hv_compose_msi_msg() contains the "real" vCPU, causing Hyper-V to choose a
> > + * new pCPU for the interrupt. But for the multi-MSI case, the second call to
> > + * hv_compose_msi_msg() exits without sending a message to the vPCI VSP, so the
>
> Why ? Can't you fix _that_ ? Why can't the initial call to
> hv_compose_msi_msg() determine the _real_ target vCPU ?

Unfortunately I can't fix this, because of the way x86's irq management code works. This is out of the control of the pci-hyperv driver.

hv_compose_msi_msg() uses irq_data_get_effective_affinity_mask() to get the "effective" mask. On x86, when the irq is initialized, irq_data_update_effective_affinity() is called from apic_update_irq_cfg() -- please refer to the debug code below.

When hv_compose_msi_msg() is invoked the first time, it's from pci_alloc_irq_vectors(), and the x86 irq code always passes CPU0 to pci-hyperv. Please see the "cpumask_first(cpu_online_mask)" in vector_assign_managed_shutdown() below.

When hv_compose_msi_msg() is invoked the second time, it's from request_irq(), and the x86 irq code passes the real effective CPU to pci-hyperv.
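To make this concrete, here is a condensed sketch -- not the real driver code; example_get_target_vp() is a made-up name, and the hv_cpu_number_to_vp_number() conversion reflects my understanding of how the driver maps a CPU to a VP index -- of how the composition path turns the effective affinity mask into a target virtual processor:

#include <linux/cpumask.h>
#include <linux/irq.h>
#include <asm/mshyperv.h>

/*
 * Condensed sketch only: derive the target virtual processor from the
 * irq's effective affinity mask. The mask is whatever the x86 vector
 * code last programmed via irq_data_update_effective_affinity() --
 * CPU0 at pci_alloc_irq_vectors() time, the real target CPU at
 * request_irq() time.
 */
static u32 example_get_target_vp(struct irq_data *data)
{
	const struct cpumask *dest;
	int cpu;

	/* Effective affinity as programmed by apic_update_irq_cfg() */
	dest = irq_data_get_effective_affinity_mask(data);

	/* Pick one online CPU from the mask (CPU0 on the first call) */
	cpu = cpumask_first_and(dest, cpu_online_mask);

	/* Convert the Linux CPU number to a Hyper-V VP index */
	return hv_cpu_number_to_vp_number(cpu);
}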
I added the below debug code and pasted the trimmed output below.

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -179,6 +179,7 @@ static void vector_assign_managed_shutdown(struct irq_data *irqd)
 	unsigned int cpu = cpumask_first(cpu_online_mask);
 
 	apic_update_irq_cfg(irqd, MANAGED_IRQ_SHUTDOWN_VECTOR, cpu);
+	WARN(irqd->irq >= 24, "cdx: vector_assign_managed_shutdown: irq=%d, cpu=%d\n", irqd->irq, cpu);
 }
 
 static int reserve_managed_vector(struct irq_data *irqd)
@@ -251,6 +252,7 @@ assign_vector_locked(struct irq_data *irqd, const struct cpumask *dest)
 		return vector;
 	apic_update_vector(irqd, vector, cpu);
 	apic_update_irq_cfg(irqd, vector, cpu);
+	WARN(irqd->irq >= 24, "cdx: assign_vector_locked: irq=%d, cpu=%d\n", irqd->irq, cpu);
 	return 0;
 }
@@ -328,6 +330,7 @@ assign_managed_vector(struct irq_data *irqd, const struct cpumask *dest)
 		return vector;
 	apic_update_vector(irqd, vector, cpu);
 	apic_update_irq_cfg(irqd, vector, cpu);
+	WARN(irqd->irq >= 24, "cdx: assign_managed_vector: irq=%d, cpu=%d\n", irqd->irq, cpu);
 	return 0;
 }
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1803,6 +1803,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	u32 size;
 	int ret;
 
+	WARN(1, "cdx: hv_compose_msi_msg: irq=%d\n", data->irq);
 	/* Reuse the previous allocation */
 	if (data->chip_data) {
 		int_desc = data->chip_data;

cdx: vector_assign_managed_shutdown: irq=24, cpu=0
WARNING: CPU: 3 PID: 87 at arch/x86/kernel/apic/vector.c:182 vector_assign_managed_shutdown+0x53/0x60
RIP: 0010:vector_assign_managed_shutdown+0x53/0x60
 reserve_irq_vector_locked+0x41/0xa0
 x86_vector_alloc_irqs+0x298/0x460
 irq_domain_alloc_irqs_hierarchy+0x1b/0x50
 irq_domain_alloc_irqs_parent+0x17/0x30
 msi_domain_alloc+0x83/0x150
 irq_domain_alloc_irqs_hierarchy+0x1b/0x50
 __irq_domain_alloc_irqs+0xdf/0x320
 __msi_domain_alloc_irqs+0x103/0x3e0
 msi_domain_alloc_irqs_descs_locked+0x3e/0x90
 pci_msi_setup_msi_irqs+0x2d/0x40
 __pci_enable_msix_range+0x2fd/0x420
 pci_alloc_irq_vectors_affinity+0xb0/0x110
 nvme_reset_work+0x1cf/0x1170
 process_one_work+0x21f/0x3f0
 worker_thread+0x4a/0x3c0
 kthread+0xff/0x130
 ret_from_fork+0x22/0x30

cdx: vector_assign_managed_shutdown: irq=24, cpu=0
WARNING: CPU: 3 PID: 87 at arch/x86/kernel/apic/vector.c:182 vector_assign_managed_shutdown+0x53/0x60
RIP: 0010:vector_assign_managed_shutdown+0x53/0x60
 x86_vector_activate+0x149/0x1e0
 __irq_domain_activate_irq+0x58/0x90
 __irq_domain_activate_irq+0x38/0x90
 irq_domain_activate_irq+0x2d/0x50
 __msi_domain_alloc_irqs+0x1bb/0x3e0
 msi_domain_alloc_irqs_descs_locked+0x3e/0x90
 pci_msi_setup_msi_irqs+0x2d/0x40
 __pci_enable_msix_range+0x2fd/0x420
 pci_alloc_irq_vectors_affinity+0xb0/0x110
 nvme_reset_work+0x1cf/0x1170
 process_one_work+0x21f/0x3f0
 worker_thread+0x4a/0x3c0
 kthread+0xff/0x130
 ret_from_fork+0x22/0x30

cdx: hv_compose_msi_msg: irq=24
WARNING: CPU: 3 PID: 87 at drivers/pci/controller/pci-hyperv.c:1806 hv_compose_msi_msg+0x3f/0x5d0 [pci_hyperv]
RIP: 0010:hv_compose_msi_msg+0x3f/0x5d0 [pci_hyperv]
 irq_chip_compose_msi_msg+0x32/0x50
 msi_domain_activate+0x45/0xa0
 __irq_domain_activate_irq+0x58/0x90
 irq_domain_activate_irq+0x2d/0x50
 __msi_domain_alloc_irqs+0x1bb/0x3e0
 msi_domain_alloc_irqs_descs_locked+0x3e/0x90
 pci_msi_setup_msi_irqs+0x2d/0x40
 __pci_enable_msix_range+0x2fd/0x420
 pci_alloc_irq_vectors_affinity+0xb0/0x110
 nvme_reset_work+0x1cf/0x1170
 process_one_work+0x21f/0x3f0
 worker_thread+0x4a/0x3c0
 kthread+0xff/0x130
 ret_from_fork+0x22/0x30

cdx: assign_vector_locked: irq=24, cpu=3
WARNING: CPU: 0 PID: 87 at arch/x86/kernel/apic/vector.c:255 assign_vector_locked+0x160/0x170
RIP: 0010:assign_vector_locked+0x160/0x170
 assign_irq_vector_any_locked+0x6a/0x150
 x86_vector_activate+0x93/0x1e0
 __irq_domain_activate_irq+0x58/0x90
 __irq_domain_activate_irq+0x38/0x90
 irq_domain_activate_irq+0x2d/0x50
 irq_activate+0x29/0x30
 __setup_irq+0x2e5/0x780
 request_threaded_irq+0x112/0x180
 pci_request_irq+0xa3/0xf0
 queue_request_irq+0x74/0x80
 nvme_reset_work+0x408/0x1170
 process_one_work+0x21f/0x3f0
 worker_thread+0x4a/0x3c0
 kthread+0xff/0x130
 ret_from_fork+0x22/0x30

cdx: hv_compose_msi_msg: irq=24
WARNING: CPU: 0 PID: 87 at drivers/pci/controller/pci-hyperv.c:1806 hv_compose_msi_msg+0x3f/0x5d0 [pci_hyperv]
RIP: 0010:hv_compose_msi_msg+0x3f/0x5d0 [pci_hyperv]
 irq_chip_compose_msi_msg+0x32/0x50
 msi_domain_activate+0x45/0xa0
 __irq_domain_activate_irq+0x58/0x90
 irq_domain_activate_irq+0x2d/0x50
 irq_activate+0x29/0x30
 __setup_irq+0x2e5/0x780
 request_threaded_irq+0x112/0x180
 pci_request_irq+0xa3/0xf0
 queue_request_irq+0x74/0x80
 nvme_reset_work+0x408/0x1170
 process_one_work+0x21f/0x3f0
 worker_thread+0x4a/0x3c0
 kthread+0xff/0x130
 ret_from_fork+0x22/0x30
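As the traces show, the initial composition can only ever see the dummy CPU. For the multi-MSI case the dummy vCPU therefore has to be round-robin'ed, as the comment quoted in the next hunk says. Conceptually that boils down to something like the sketch below -- the helper name and the locking details are illustrative, not necessarily the exact patch:

#include <linux/cpumask.h>
#include <linux/spinlock.h>

/*
 * Illustrative sketch only: round-robin a "dummy" vCPU across the
 * online CPUs so that multi-MSI interrupts are not all parked on the
 * same pCPU.
 */
static u32 example_multi_msi_get_dummy_cpu(void)
{
	static DEFINE_SPINLOCK(lock);
	static int cpu_next = -1;	/* -1 means "start from the first online CPU" */
	unsigned long flags;
	u32 cpu;

	spin_lock_irqsave(&lock, flags);

	/* Advance to the next online CPU, wrapping around at the end */
	cpu_next = cpumask_next(cpu_next, cpu_online_mask);
	if (cpu_next >= nr_cpu_ids)
		cpu_next = cpumask_first(cpu_online_mask);
	cpu = cpu_next;

	spin_unlock_irqrestore(&lock, flags);

	return cpu;
}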
> > + * original dummy vCPU is used. This dummy vCPU must be round-robin'ed so that
> > + * the pCPUs are spread out. All interrupts for a multi-MSI device end up using
> > + * the same pCPU, even though the vCPUs will be spread out by later calls
> > + * to hv_irq_unmask(), but that is the best we can do now.
> > + *
> > + * With current Hyper-V, the HVCALL_RETARGET_INTERRUPT hypercall does *not*
>
> "current" Hyper-V means nothing, remove it or provide versioning
> information. Imagine yourself reading this comment some time
> in the future.

Good point. @Wei, can you please help fix this? I think we can change "With current Hyper-V" to "With Hyper-V in Nov 2022".

BTW, it's hard to provide the exact versioning info, because technically there are many versions of Hyper-V...

> I can't claim to understand how this MSI vCPU to pCPU mapping is made to
> work in current code but I can't ack this patch sorry, if you feel like
> it is good to merge it it is your and Hyper-V maintainers call, feel
> free to go ahead - I can review PCI hyper-V changes that affect PCI
> and IRQs core APIs, I don't know Hyper-V internals.
>
> Lorenzo

I understand. Thanks!

I discussed the issue with the Hyper-V team. I believe I understand it, and this patch is the right thing to do.

@Wei, Bjorn spotted two typos in the commit message, and Lorenzo suggested a change to the "current" wording above. Can you please help fix these and merge the patch? Or, let me know if it'd be easier for me to send a v4.

Thanks,
Dexuan