On Mon, Dec 23, 2024 at 07:37:46AM +0000, Athul Krishna wrote:
> Can confirm. Reverting f9e54c3a2f5b from v6.13-rc1 fixed the problem.

I suppose Alex will have more thoughts on this, probably after the
holidays. Before that, one quick question.

>
> -------- Original Message --------
> On 23/12/24 04:06, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> > Forwarding since not everybody follows bugzilla. Apparently bisected
> > to f9e54c3a2f5b ("vfio/pci: implement huge_fault support").
> >
> > Athul, f9e54c3a2f5b appears to revert cleanly from v6.13-rc1. Can you
> > verify that reverting it is enough to avoid these artifacts?
> >
> > #regzbot introduced: f9e54c3a2f5b ("vfio/pci: implement huge_fault support")
> >
> > ----- Forwarded message from bugzilla-daemon@xxxxxxxxxx -----
> >
> > Date: Sat, 21 Dec 2024 10:10:02 +0000
> > From: bugzilla-daemon@xxxxxxxxxx
> > To: bjorn@xxxxxxxxxxxxxxxxxxxxxxx
> > Subject: [Bug 219619] New: vfio-pci: screen graphics artifacts after 6.12 kernel upgrade
> > Message-ID: <bug-219619-41252@xxxxxxxxxxxxxxxxxxxxxxxxx/>
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=219619
> >
> > Bug ID: 219619
> > Summary: vfio-pci: screen graphics artifacts after 6.12 kernel
> >          upgrade
> > Product: Drivers
> > Version: 2.5
> > Hardware: AMD
> > OS: Linux
> > Status: NEW
> > Severity: normal
> > Priority: P3
> > Component: PCI
> > Assignee: drivers_pci@xxxxxxxxxxxxxxxxxxxx
> > Reporter: athul.krishna.kr@xxxxxxxxxxxxxx
> > Regression: No
> >
> > Created attachment 307382
> > --> https://bugzilla.kernel.org/attachment.cgi?id=307382&action=edit
> > dmesg

vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
pcieport 0000:00:01.1: AER: Multiple Uncorrectable (Non-Fatal) error message received from 0000:03:00.1
vfio-pci 0000:03:00.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
vfio-pci 0000:03:00.0: device [1002:73ef] error status/mask=00100000/00000000
vfio-pci 0000:03:00.0: [20] UnsupReq (First)
vfio-pci 0000:03:00.0: AER: TLP Header: 60001004 000000ff 0000007d fe7eb000
vfio-pci 0000:03:00.1: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
vfio-pci 0000:03:00.1: device [1002:ab28] error status/mask=00100000/00000000
vfio-pci 0000:03:00.1: [20] UnsupReq (First)
vfio-pci 0000:03:00.1: AER: TLP Header: 60001004 000000ff 0000007d fe7eb000
vfio-pci 0000:03:00.1: AER: Error of this Agent is reported first
pcieport 0000:02:00.0: AER: broadcast error_detected message
pcieport 0000:02:00.0: AER: broadcast mmio_enabled message
pcieport 0000:02:00.0: AER: broadcast resume message
pcieport 0000:02:00.0: AER: device recovery successful
pcieport 0000:02:00.0: AER: broadcast error_detected message
pcieport 0000:02:00.0: AER: broadcast mmio_enabled message
pcieport 0000:02:00.0: AER: broadcast resume message
pcieport 0000:02:00.0: AER: device recovery successful

> >
> > Device: Asus Zephyrus GA402RJ
> > CPU: Ryzen 7 6800HS
> > GPU: RX 6700S
> > Kernel: 6.13.0-rc3-g8faabc041a00
> >
> > Problem:
> > Launching games or gpu bench-marking tools in qemu windows 11 vm will cause
> > screen artifacts, ultimately qemu will pause with unrecoverable error.

Is there more information on what setup can reproduce it? For example,
does it only happen with Windows guests? Does the GPU vendor/model matter?

> >
> > Commit:
> > f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101 is the first bad commit
> > commit f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101
> > Author: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Date: Mon Aug 26 16:43:53 2024 -0400
> >
> > vfio/pci: implement huge_fault support

Personally, I have no clue yet how this commit could cause it. I was
initially worried about implicit cache mode changes on the mappings, but I
don't think any such change was involved in this specific commit.

This commit mainly does two things: (1) it allows 2M/1G mappings for BARs
instead of always using small 4K ones, and (2) it always faults lazily
rather than "installing everything in the first fault". Maybe one of the
two has an impact somehow.

IIUC the basic paths were covered and should hopefully work, so I wonder
what is special about this case. That might be related to the questions
above about which setups reproduce it.

Thanks,

--
Peter Xu
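
For readers who want a concrete picture of the two behaviours described above (huge mappings for BARs, and lazy faulting), here is a rough user-space model in Python. It is only a sketch and not the kernel's vfio-pci code: the function names, the example addresses, and the exact eligibility rule (the aligned virtual range must fit inside the mapped range, and the backing physical address must be aligned to the same size) are assumptions made for illustration.

```python
# Illustrative user-space model only; this is NOT the kernel's vfio-pci code.
# All names and addresses below are hypothetical.
PAGE_SIZES = (1 << 30, 1 << 21, 1 << 12)  # 1G, 2M, 4K, tried largest first


def pick_mapping(fault_addr, vma_start, vma_end, phys_base):
    """Return (size, virt, phys) for the largest naturally aligned mapping
    that covers fault_addr and stays inside [vma_start, vma_end)."""
    if not vma_start <= fault_addr < vma_end:
        raise ValueError("fault address outside the mapped range")
    for size in PAGE_SIZES:
        virt = fault_addr & ~(size - 1)        # round down to the mapping size
        phys = phys_base + (virt - vma_start)  # physical address backing virt
        inside = virt >= vma_start and virt + size <= vma_end
        if inside and phys % size == 0:
            return size, virt, phys
    raise ValueError("range is not even 4K aligned")


def fault_lazily(installed, fault_addr, vma_start, vma_end, phys_base):
    """Install only the single mapping needed for this fault (lazy model),
    instead of populating the whole range on the first fault."""
    size, virt, phys = pick_mapping(fault_addr, vma_start, vma_end, phys_base)
    installed[virt] = (size, phys)
    return size
```

In this model, a 256M BAR whose virtual and physical bases are both 2M aligned gets a 2M mapping for a fault in its interior, while the same BAR with a physical base that is only 4K aligned falls back to 4K. After one fault, `installed` holds exactly one entry, which is the lazy part.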