The bug is triggered by a Windows kernel running as a KVM guest.
I cannot reproduce it with the Ubuntu 24.10 install ISO and the nouveau driver.
The Windows 11 23H2 install ISO reproduces it reliably.

Two [0] more [1] kernel logs below.
Decoding worked only on the first; I spent too long trying to fix it.

On Tuesday, December 17th, 2024 at 08:57, Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:

> On 16/12/24 20:40, Ranguvar wrote:
>
> > On Monday, December 16th, 2024 at 16:50, Sean Christopherson seanjc@xxxxxxxxxx wrote:
> >
> > > On Mon, Dec 16, 2024, Juri Lelli wrote:
> > >
> > > > On 14/12/24 19:52, Peter Zijlstra wrote:
> > > >
> > > > > On Sat, Dec 14, 2024 at 06:32:57AM +0000, Ranguvar wrote:
> > > > >
> > > > > > I have in kernel cmdline `iommu=pt isolcpus=1-7,17-23 rcu_nocbs=1-7,17-23 nohz_full=1-7,17-23`. Removing iommu=pt does not produce a change, and
> > > > > > dropping the core isolation freezes the host on VM startup.
> > >
> > > As in, dropping all of isolcpus, rcu_nocbs, and nohz_full? Or just dropping
> > > isolcpus?
> >
> > Thanks for looking.
> > I had dropped all three, but not altered the VM guest config, which is:
> >
> > <cputune>
> > <vcpupin vcpu='0' cpuset='2'/>
> > <vcpupin vcpu='1' cpuset='18'/>
> > ...
> > <vcpupin vcpu='11' cpuset='23'/>
> > <emulatorpin cpuset='1,17'/>
> > <iothreadpin iothread='1' cpuset='1,17'/>
> > <vcpusched vcpus='0' scheduler='fifo' priority='95'/>
> > ...
> > <iothreadsched iothreads='1' scheduler='fifo' priority='50'/>
>
> Are you disabling/enabling/configuring RT throttling (sched_rt_{runtime,
> period}_us) in your configuration?

I don't touch these.

[ranguvar@khufu ~]$ cat /proc/sys/kernel/sched_rt_period_us
1000000
[ranguvar@khufu ~]$ cat /proc/sys/kernel/sched_rt_runtime_us
950000

I also removed myself from the realtime group (used by PipeWire), but the breakage is still the same.

> > </cputune>
> >
> > CPU mode is host-passthrough, cache mode is passthrough.
> >
> > The 24GB VRAM did cause trouble when setting up resizeable BAR months ago as well.
> > It necessitated a special qemu config:
> > <qemu:commandline>
> > <qemu:arg value='-fw_cfg'/>
> > <qemu:arg value='opt/ovmf/PciMmio64Mb,string=65536'/>
> > </qemu:commandline>

I removed this config block as it appears unnecessary now. No impact on this issue.
I also tried changing the size of the BAR from 32GB to 256MB manually before running the guest.

lspci:
Region 1: Memory at 7000000000 (64-bit, prefetchable) [size=32G]
Region 3: Memory at 7800000000 (64-bit, prefetchable) [size=32M]

After unbinding vfio_pci, writing '8' to resource1_resize, and rebinding:
Region 1: Memory at 1040000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 1050000000 (64-bit, prefetchable) [size=32M]

No impact.

[0]: https://ranguvar.io/pub/paste/linux-6.12-vm-regression/dmesg-6.11.0-rc1-1-git-00057-gbd9bbc96e835-20241216-decoded.log
[1]: https://ranguvar.io/pub/paste/linux-6.12-vm-regression/dmesg-6.11.0-rc1-1-git-00057-gbd9bbc96e835-20241217.log
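
For reference, the value written to resource1_resize is a bit position rather than a size: per the kernel's sysfs PCI ABI, bit 0 means 1 MB, bit 1 means 2 MB, and so on, so '8' selects 2^8 MB = 256 MB and the original 32G BAR corresponds to 15. The following is a minimal shell sketch of that arithmetic and of the unbind/resize/rebind sequence described above. The function names are hypothetical, and the resize function assumes the device is already set up to bind to vfio-pci (for example through driver_override) and that it is run as root.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not taken from the thread.
# Function bodies use ( ... ) so their variables stay local to a subshell.

# Convert a resourceN_resize bit position to a BAR size in MB (bit 0 = 1 MB).
rebar_size_mb() (
    echo $(( 1 << $1 ))
)

# Convert a power-of-two BAR size in MB back to the bit position to write.
rebar_value_for_mb() (
    mb=$1
    bit=0
    while [ "$mb" -gt 1 ]; do
        mb=$(( mb / 2 ))
        bit=$(( bit + 1 ))
    done
    echo "$bit"
)

# Unbind a PCI device from its current driver, resize BAR 1, and bind it
# to vfio-pci. Assumption: vfio-pci will accept the device on bind.
# $1 = PCI address (domain:bus:dev.fn), $2 = bit position to write.
resize_bar1() (
    dev=$1
    value=$2
    sysdev=/sys/bus/pci/devices/$dev
    echo "$dev" > "$sysdev/driver/unbind"
    echo "$value" > "$sysdev/resource1_resize"
    echo "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
)
```

With these definitions, `rebar_value_for_mb 256` yields the 8 used above, and a call such as `resize_bar1 <pci-address> 8` would perform the three sysfs writes; the PCI address is left as a placeholder because it is not given in this thread.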