On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
> Hi,
>
> On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> > Per device domains provide the real domain size to the core code. This
> > allows range checking on insertion of MSI descriptors and also paves the
> > way for dynamic index allocations which are required e.g. for IMS. This
> > avoids external mechanisms like bitmaps on the device side and just
> > utilizes the core internal MSI descriptor storage for it.
> >
> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > ---
>
> This patch results in various s390 qemu test failures.
> There is a warning backtrace
>
> [ 12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
>
> followed by
>
> [ 12.684333] virtio_net: probe of virtio0 failed with error -34
>
> and Ethernet interfaces don't instantiate.
>
> When trying to instantiate virtio-pci and booting from it, I see
> the same warning backtrace followed by
>
> [ 9.943123] virtio_blk: probe of virtio0 failed with error -34
>
> and a crash.
>
> A typical backtrace is
>
> [ 12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
> [ 12.675108] Modules linked in:
> [ 12.675346] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G N 6.1.0-03225-g764822972d64 #1
> [ 12.675512] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 12.675648] Krnl PSW : 0704c00180000000 00000000001ec4c6 (msi_ctrl_valid+0x2e/0xb0)
> [ 12.675853] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 12.675987] Krnl GPRS: 00000000435318a9 0000000000000000 00000000035510a0 0000000000000000
> [ 12.676069] 0000000000000000 000000000000ffff 0000000000000000 0000037fffb1b6c0
> [ 12.676151] 0000000000000000 0000037fffb1b658 0000000000000000 0000037fffb1b658
> [ 12.676232] 0000000002ae4100 00000000035510a0 0000037fffb1b568 0000037fffb1b538
> [ 12.677127] Krnl Code: 00000000001ec4b8: 58303000 l %r3,0(%r3)
> [ 12.677127] 00000000001ec4bc: ec3c000f017f clij %r3,1,12,00000000001ec4da
> [ 12.677127] #00000000001ec4c2: af000000 mc 0,0
> [ 12.677127] >00000000001ec4c6: a7280000 lhi %r2,0
> [ 12.677127] 00000000001ec4ca: b9840022 llgcr %r2,%r2
> [ 12.677127] 00000000001ec4ce: ebbff0a00004 lmg %r11,%r15,160(%r15)
> [ 12.677127] 00000000001ec4d4: c0f400714f1a brcl 15,0000000001016308
> [ 12.677127] 00000000001ec4da: b9160033 llgfr %r3,%r3
> [ 12.677743] Call Trace:
> [ 12.677835] [<00000000001ec4c6>] msi_ctrl_valid+0x2e/0xb0
> [ 12.677943] [<00000000001ec58a>] msi_domain_free_descs+0x42/0x120
> [ 12.678024] [<00000000001ecaf0>] msi_domain_free_msi_descs_range+0x38/0x48
> [ 12.678103] [<00000000009db7ae>] __pci_enable_msix_range+0x44e/0x710
> [ 12.678186] [<00000000009d9da4>] pci_alloc_irq_vectors_affinity+0xa4/0x120
> [ 12.678268] [<00000000009f5888>] vp_request_msix_vectors+0xb8/0x208
> [ 12.678348] [<00000000009f5f24>] vp_find_vqs_msix+0x254/0x2f0
> [ 12.678428] [<00000000009f6016>] vp_find_vqs+0x56/0x1f8
> [ 12.678508] [<00000000009f4e4e>] vp_modern_find_vqs+0x3e/0x90
> [ 12.678587] [<0000000000ad8c14>] virtnet_find_vqs+0x244/0x3e8
> [ 12.678669] [<0000000000ad9268>] virtnet_probe+0x4b0/0xca8
> [ 12.678748] [<00000000009ed6b4>] virtio_dev_probe+0x1ec/0x418
> [ 12.678826] [<0000000000a3c246>] really_probe+0xd6/0x480
> [ 12.678906] [<0000000000a3c7a0>] driver_probe_device+0x40/0xf0
> [ 12.678985] [<0000000000a3d0e4>] __driver_attach+0xbc/0x228
> [ 12.679065] [<0000000000a396c0>] bus_for_each_dev+0x80/0xb8
> [ 12.679143] [<0000000000a3b38e>] bus_add_driver+0x1d6/0x260
> [ 12.679222] [<0000000000a3dc10>] driver_register+0xa8/0x170
> [ 12.679312] [<00000000017b8848>] virtio_net_driver_init+0x88/0xc0
>
> This worked fine in v6.1 and earlier kernels. Bisect log attached.
>
> Guenter

Yes, we were about to report the same issue. Currently, PCI support in
linux-next is broken for ConnectX-based NICs, NVMe devices, etc. Matthew
Rosato bisected this to the above-mentioned commit on Monday and was, I
believe, still investigating the details.

As far as I'm aware, he has so far tracked this down to code calling
msi_domain_get_hwsize(), which in turn calls msi_get_device_domain().
That returns NULL, so msi_domain_get_hwsize() returns 0. I think this is
related to the fact that we currently don't use IRQ domains.

Thanks,
Niklas