Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()

On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
> Hi,
> 
> On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> > Per device domains provide the real domain size to the core code. This
> > allows range checking on insertion of MSI descriptors and also paves the
> > way for dynamic index allocations which are required e.g. for IMS. This
> > avoids external mechanisms like bitmaps on the device side and just
> > utilizes the core internal MSI descriptor storage for it.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > ---
> 
> This patch results in various s390 qemu test failures.
> There is a warning backtrace
> 
> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
> 
> followed by
> 
> [   12.684333] virtio_net: probe of virtio0 failed with error -34
> 
> and Ethernet interfaces don't instantiate.
> 
> When trying to instantiate virtio-pci and booting from it, I see
> the same warning backtrace followed by
> 
> [    9.943123] virtio_blk: probe of virtio0 failed with error -34
> 
> and a crash.
> 
> A typical backtrace is
> 
> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
> [   12.675108] Modules linked in:
> [   12.675346] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.1.0-03225-g764822972d64 #1
> [   12.675512] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [   12.675648] Krnl PSW : 0704c00180000000 00000000001ec4c6 (msi_ctrl_valid+0x2e/0xb0)
> [   12.675853]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   12.675987] Krnl GPRS: 00000000435318a9 0000000000000000 00000000035510a0 0000000000000000
> [   12.676069]            0000000000000000 000000000000ffff 0000000000000000 0000037fffb1b6c0
> [   12.676151]            0000000000000000 0000037fffb1b658 0000000000000000 0000037fffb1b658
> [   12.676232]            0000000002ae4100 00000000035510a0 0000037fffb1b568 0000037fffb1b538
> [   12.677127] Krnl Code: 00000000001ec4b8: 58303000		l	%r3,0(%r3)
> [   12.677127]            00000000001ec4bc: ec3c000f017f	clij	%r3,1,12,00000000001ec4da
> [   12.677127]           #00000000001ec4c2: af000000		mc	0,0
> [   12.677127]           >00000000001ec4c6: a7280000		lhi	%r2,0
> [   12.677127]            00000000001ec4ca: b9840022		llgcr	%r2,%r2
> [   12.677127]            00000000001ec4ce: ebbff0a00004	lmg	%r11,%r15,160(%r15)
> [   12.677127]            00000000001ec4d4: c0f400714f1a	brcl	15,0000000001016308
> [   12.677127]            00000000001ec4da: b9160033		llgfr	%r3,%r3
> [   12.677743] Call Trace:
> [   12.677835]  [<00000000001ec4c6>] msi_ctrl_valid+0x2e/0xb0
> [   12.677943]  [<00000000001ec58a>] msi_domain_free_descs+0x42/0x120
> [   12.678024]  [<00000000001ecaf0>] msi_domain_free_msi_descs_range+0x38/0x48
> [   12.678103]  [<00000000009db7ae>] __pci_enable_msix_range+0x44e/0x710
> [   12.678186]  [<00000000009d9da4>] pci_alloc_irq_vectors_affinity+0xa4/0x120
> [   12.678268]  [<00000000009f5888>] vp_request_msix_vectors+0xb8/0x208
> [   12.678348]  [<00000000009f5f24>] vp_find_vqs_msix+0x254/0x2f0
> [   12.678428]  [<00000000009f6016>] vp_find_vqs+0x56/0x1f8
> [   12.678508]  [<00000000009f4e4e>] vp_modern_find_vqs+0x3e/0x90
> [   12.678587]  [<0000000000ad8c14>] virtnet_find_vqs+0x244/0x3e8
> [   12.678669]  [<0000000000ad9268>] virtnet_probe+0x4b0/0xca8
> [   12.678748]  [<00000000009ed6b4>] virtio_dev_probe+0x1ec/0x418
> [   12.678826]  [<0000000000a3c246>] really_probe+0xd6/0x480
> [   12.678906]  [<0000000000a3c7a0>] driver_probe_device+0x40/0xf0
> [   12.678985]  [<0000000000a3d0e4>] __driver_attach+0xbc/0x228
> [   12.679065]  [<0000000000a396c0>] bus_for_each_dev+0x80/0xb8
> [   12.679143]  [<0000000000a3b38e>] bus_add_driver+0x1d6/0x260
> [   12.679222]  [<0000000000a3dc10>] driver_register+0xa8/0x170
> [   12.679312]  [<00000000017b8848>] virtio_net_driver_init+0x88/0xc0
> 
> This worked fine in v6.1 and earlier kernels. Bisect log attached.
> 
> Guenter

Yes, we were about to report the same issue. Currently in linux-next,
PCI support is broken for ConnectX-based NICs, NVMe devices, etc. Matthew
Rosato bisected this to the above-mentioned commit on Monday and was, I
believe, still investigating the details.


As far as I'm aware, he has so far tracked this down to code calling
msi_domain_get_hwsize(), which in turn calls msi_get_device_domain(),
which then returns NULL, leading to msi_domain_get_hwsize() returning 0.
I think this is related to the fact that we currently don't use IRQ
domains.
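
To illustrate the suspected chain, here is a minimal userspace sketch. It
is not the kernel code: every model_* name and both structs are
hypothetical stand-ins, and the exact check in kernel/irq/msi.c may
differ. The only assumption taken from the analysis above is that a
missing domain yields a hardware size of 0. With a size of 0 any index
fails a range check, which would fit the -34 (-ERANGE on Linux) seen in
the probe failures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct model_domain {
	unsigned int hwsize;		/* number of usable MSI indices */
};

struct model_device {
	struct model_domain *domain;	/* NULL: no per-device IRQ domain */
};

/* Models the lookup that returns NULL when no domain is set up. */
static struct model_domain *model_get_device_domain(const struct model_device *dev)
{
	assert(dev != NULL);
	return dev->domain;
}

/* Models the size query: no domain means a reported size of 0. */
static unsigned int model_get_hwsize(const struct model_device *dev)
{
	struct model_domain *domain = model_get_device_domain(dev);

	return domain ? domain->hwsize : 0;
}

/* Models a range check on [first, last] against the reported size. */
static int model_check_range(const struct model_device *dev,
			     unsigned int first, unsigned int last)
{
	unsigned int hwsize = model_get_hwsize(dev);

	if (first > last || first >= hwsize || last >= hwsize)
		return -ERANGE;
	return 0;
}
```

In this model a device without a domain fails even for index 0, while a
device whose domain reports a real size only fails for indices at or
beyond that size.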

Thanks,
Niklas



