https://bugzilla.kernel.org/show_bug.cgi?id=219010

            Bug ID: 219010
           Summary: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash
                    because of "Collect hot-reset devices to local buffer"
           Product: Virtualization
           Version: unspecified
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: kvm
          Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
          Reporter: zaltys@xxxxxxxxx
        Regression: No

One of my virtual machines using PCI device passthrough (vfio) stopped working on openSUSE Tumbleweed since kernel 6.9.7. Qemu 9.0.1 complains:

  qemu-system-x86_64: vfio: hot reset info failed: No space left on device
  qemu-system-x86_64: GLib: ../glib/gmem.c:177: failed to allocate 18446744068411217972 bytes

and then coredumps. The qemu backtrace shows vfio_pci_get_pci_hot_reset_info() as the last qemu function called.

Reverting kernel 6.9.7 commit 9313244c26f3792daa86f3a18cc3bd5ad60310e0 (upstream f6944d4a0b87c16bc34ae589169e1ded3d4db08e) - "vfio/pci: Collect hot-reset devices to local buffer" fixes the problem. As I understand it, that commit was backported to 6.9.7 from the 6.10 tree.

Upon more thorough analysis I pinpointed that the crash is caused by one specific passed-through device: the sound card of the Asus B650 Creator motherboard. The VM starts on 6.9.7 if I remove this sound card from it.

I think the important bit is this card being a VF of a device which does not report support for FLR:

  15:00.0 | iommu group 28 | Phoenix PCIe Dummy Function     <-- not passed to VM, no driver, reset method: pm bus
  15:00.2 | iommu group 29 | Encryption controller (PSP/CCP) <-- ccp driver
  15:00.3 | iommu group 30 | USB controller                  <-- xhci_hcd driver
  15:00.4 | iommu group 31 | USB controller                  <-- xhci_hcd driver
  15:00.6 | iommu group 32 | HD Audio Controller             <-- sound card passed to VM

After reverting the above-mentioned commit, qemu complains:

  vfio: Cannot reset device 0000:15:00.6, depends on group 28 which is not owned

exactly as it did before 6.9.7, and the VM starts with that sound card passed through.
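For anyone wanting to compare their own topology, a listing like the one above can be gathered from sysfs. The following is only a minimal sketch: the helper name `describe_functions` and its output shape are made up for illustration, and it assumes a kernel that exposes the `iommu_group` and `driver` links and the `reset_method` attribute under `/sys/bus/pci/devices/<address>/` (the `reset_method` file is missing on older kernels, in which case `None` is returned for it).

```python
import os


def _link_name(dev_dir, entry):
    """Return the final path component a sysfs symlink points to, or None."""
    path = os.path.join(dev_dir, entry)
    if not os.path.islink(path):
        return None
    return os.path.basename(os.readlink(path))


def _read(dev_dir, entry):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(os.path.join(dev_dir, entry)) as fh:
            return fh.read().strip()
    except OSError:
        return None


def describe_functions(slot, sysfs_root="/sys/bus/pci/devices"):
    """List (address, iommu_group, reset_methods, driver) for each function in a slot.

    `slot` is a domain:bus:device prefix such as "0000:15:00".
    `sysfs_root` is a parameter only so the function can be pointed at a test tree.
    """
    rows = []
    for name in sorted(os.listdir(sysfs_root)):
        if not name.startswith(slot + "."):
            continue  # a different slot, skip it
        dev_dir = os.path.join(sysfs_root, name)
        rows.append((
            name,
            _link_name(dev_dir, "iommu_group"),
            _read(dev_dir, "reset_method"),
            _link_name(dev_dir, "driver"),
        ))
    return rows
```

Calling `describe_functions("0000:15:00")` on the affected host should return one tuple per function of the slot discussed here; the exact values depend on the running kernel and on which drivers are bound.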

This might be an unsupported configuration, but qemu crashing on 6.9.7 also suggests that the kernel might be breaking userspace by handling (or mishandling) this case differently, especially in a minor version update.
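
A side note on the allocation size in the GLib message: it sits just below 2^64, which is what a negative quantity looks like once it is passed as an unsigned 64-bit size. The sketch below is only arithmetic on the reported number; it does not establish where in qemu the bad size originates, and the helper name `as_signed_64` is made up for illustration.

```python
REPORTED_SIZE = 18446744068411217972  # byte count from the GLib error message


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a two's-complement signed integer."""
    return value - (1 << 64) if value >= (1 << 63) else value


signed = as_signed_64(REPORTED_SIZE)
print(signed)  # -5298333644
```

So the requested size is consistent with a negative length computed after the failed ioctl reaching the allocator, rather than with a genuine multi-exabyte request.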