On 7/26/22 16:57, Stoiko Ivanov wrote:
> Hi,
>
> Proxmox[0] recently switched to the 5.15 kernel series (based on the one for Ubuntu 22.04), which includes this commit. While it's working well on most installations, a few users have reported that some of their guests shut down with `KVM: entry failed, hardware error 0x80000021` being logged under certain conditions and environments[1]:
>
> * The issue is not deterministically reproducible and only shows up eventually under certain loads (e.g. we have only one system in our office that exhibits the issue, and only when repeatedly installing Windows 2k22: roughly one out of 10 installs causes the guest to crash)
> * While most reports refer to (newer) Windows guests, some users run into the issue with Linux VMs as well
> * The affected systems span quite a wide range: our affected machine is an old IvyBridge Xeon with an outdated BIOS (an equivalent system with the latest available BIOS is not affected), but we have reports for all kinds of Intel CPUs (up to an i5-12400). It seems AMD CPUs are not affected.
>
> Disabling tdp_mmu seems to mitigate the issue, but I still thought you might want to know that in some cases tdp_mmu causes problems. Or do you perhaps have an idea of how to fix the issue without explicitly disabling tdp_mmu?
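For readers who want to check whether the tdp_mmu mitigation mentioned above is active on a host, here is a minimal Python sketch. It assumes the `kvm` module exposes the setting at `/sys/module/kvm/parameters/tdp_mmu` and reports it as `Y` or `N`; the path, the constant, and the helper name `tdp_mmu_enabled` are illustrative assumptions, not taken from the original message.

```python
from pathlib import Path

# Assumed sysfs location of the kvm module's tdp_mmu parameter (not from the mail).
TDP_MMU_PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")


def tdp_mmu_enabled(param_path=TDP_MMU_PARAM):
    """Return True or False for an enabled/disabled TDP MMU, or None if unknown.

    None covers the cases where the file does not exist (kvm not loaded, or a
    kernel without this parameter) or where its content is not recognised.
    """
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    return None
```

Calling `tdp_mmu_enabled()` on an affected host should return `False` once the parameter has been turned off, for example through a module option such as `options kvm tdp_mmu=N` in a modprobe configuration file (again an assumption about the host's setup).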
If you don't need secure boot, you can try disabling SMM. It should not be related to the TDP MMU, but the logs (thanks!) point at an SMM entry (RIP=0x8000, CS base=0x7ffc2000).
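To make the reading of the logs a little more concrete, the following Python sketch decodes the reported error value and the register state. It assumes the value is a VMX exit reason as laid out in the Intel SDM (bit 31 flags a VM-entry failure, the low 16 bits hold the basic exit reason), and that on SMM entry the CS base equals SMBASE and RIP is 0x8000. The function and constant names are hypothetical and the table of reasons is deliberately partial.

```python
# Layout assumed from the Intel SDM's description of the VMX exit reason field.
VM_ENTRY_FAILURE_BIT = 1 << 31
BASIC_REASON_MASK = 0xFFFF

# Partial table: only the VM-entry failure reasons.
BASIC_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_hardware_error(code):
    """Split a 'hardware error' value into the failure flag and basic reason."""
    basic = code & BASIC_REASON_MASK
    return {
        "vm_entry_failure": bool(code & VM_ENTRY_FAILURE_BIT),
        "basic_reason": basic,
        "description": BASIC_REASONS.get(basic, "unknown"),
    }


def smm_entry_linear_address(rip, cs_base):
    """Linear address of the next instruction, i.e. CS base + RIP."""
    return cs_base + rip


info = decode_hardware_error(0x80000021)
print(info["vm_entry_failure"], info["basic_reason"])  # True 33
print(hex(smm_entry_linear_address(0x8000, 0x7FFC2000)))  # 0x7ffca000
```

Under these assumptions, 0x80000021 decodes to a failed VM entry with basic reason 33 (invalid guest state), and RIP=0x8000 with a CS base of 0x7ffc2000 is consistent with execution starting at the SMI handler entry point for SMBASE 0x7ffc2000. With QEMU, SMM can typically be turned off through the machine property `smm=off`.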
This is likely to be fixed by https://lore.kernel.org/kvm/20220621150902.46126-1-mlevitsk@xxxxxxxxxx/.
Paolo