On 21.07.22 at 17:51, Maxim Levitsky wrote:
> On Thu, 2022-07-21 at 14:49 +0200, Fabian Ebner wrote:
>> Hi,
>> since about half a year ago, we're getting user reports about guest
>> reboot issues with KVM/QEMU[0].
>>
>> The most common scenario is a Windows Server VM (2012R2/2016/2019,
>> UEFI/OVMF and SeaBIOS) getting stuck during the screen with the Windows
>> logo and the spinning circles after a reboot was triggered from within
>> the guest. Quitting the kvm process and booting with a fresh instance
>> works. The issue seems to become more likely, the longer the kvm
>> instance runs.
>>
>> We did not get such reports while we were providing Linux 5.4 and QEMU
>> 5.2.0, but we do with Linux 5.11/5.13/5.15 and QEMU 6.x.
>>
>> I'm just wondering if anybody has seen this issue before or might have a
>> hunch what it's about? Any tips on what to look out for when debugging
>> are also greatly appreciated!
>>
>> We do have debug access to a user's test VM and the VM state was saved
>> before a problematic reboot, but I can't modify the host system there.
>> AFAICT QEMU just executes guest code as usual, but I'm really not sure
>> what to look out for.
>>
>> That VM has CPU type host, and a colleague did have a similar enough CPU
>> to load the VM state, but for him, the reboot went through normally. On
>> the user's system, it triggers consistently after loading the VM state
>> and rebooting.
>>
>> So unfortunately, we didn't manage to reproduce the issue locally yet.
>> With two other images provided by users, we ran into a boot loop, where
>> QEMU resets the CPUs and does a few KVM_RUNs before the exit reason is
>> KVM_EXIT_SHUTDOWN (which to my understanding indicates a triple fault)
>> and then it repeats. It's not clear if the issues are related.
>
>
> Does the guest have HyperV enabled in it (that is nested virtualization?)
>

For all three machines described above, Get-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V indicates that Hyper-V is disabled.

> Intel or AMD?
>

We do have reports for both Intel and AMD.

> Does the VM uses secure boot / SMM?
>

The customer VM that can reliably trigger the issue after loading the state and rebooting uses SeaBIOS. For the other two VMs, Confirm-SecureBootUEFI returns "False".

SMM might be a lead! We did disable SMM in the past, because apparently there were problems with it (I didn't dig into which; that was before I worked here), and the timing of enabling it and the reports coming in would match. I guess (some) guest OSes don't expect it to be suddenly turned on?

However, there is a report from a user with two clusters running QEMU 5.2, one with kernel 5.4 without the issue and one with kernel 5.11 with the issue (Windows VM with spinning circles). So that's confusing :/

We do use some additional options if the OS type is "Windows" in our high-level configuration, including Hyper-V enlightenments:

    -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt'
    -no-hpet
    -rtc 'driftfix=slew,base=localtime'
    -global 'kvm-pit.lost_tick_policy=discard'

But one user reported running into the issue even with OS type "other", i.e. when the above options are not present and the CPU flags should be just '+kvm_pv_eoi,+kvm_pv_unhalt'. There are also reports with CPU types other than 'host', including 'kvm64' (where we automatically set the flags +lahf_lm,+sep).

Thank you and Best Regards,
Fiona

P.S. Please don't mind the (from your perspective sudden) name change. I'm still the same person and don't intend to change it again :)

> Best regards,
> Maxim Levitsky
>
>>
>> There are also a few reports about non-Windows VMs, mostly Ubuntu 20.04
>> with UEFI/OVMF, but again, it's not clear if the issues are related.
>>
>> [0]: https://forum.proxmox.com/threads/100744/
>> (the forum thread is a bit chaotic unfortunately).
>>
>> Best Regards,
>> Fabi