On Tue, Nov 28, 2023, Maciej S. Szmigiero wrote:
> On 28.11.2023 17:48, Sean Christopherson wrote:
> > On Mon, Nov 27, 2023, Maciej S. Szmigiero wrote:
> > > On 27.11.2023 18:24, Sean Christopherson wrote:
> > > > On Thu, Nov 23, 2023, Maciej S. Szmigiero wrote:
> > > > > From: "Maciej S. Szmigiero" <maciej.szmigiero@xxxxxxxxxx>
> > > > >
> > > > > Since commit b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
> > > > > the kernel unconditionally clears the XSAVES CPU feature bit on Zen1/2 CPUs.
> > > > >
> > > > > Since KVM CPU caps are initialized from the kernel boot CPU features, this
> > > > > makes the XSAVES feature also unavailable for KVM guests in this case, even
> > > > > though they might want to decide on their own whether they are affected by
> > > > > this erratum.
> > > > >
> > > > > Allow KVM guests to make such a decision by setting the XSAVES KVM CPU
> > > > > capability bit based on the actual CPU capability.
> > > >
> > > > This is not generally safe, as the guest can make such a decision if and only
> > > > if the Family/Model/Stepping information is reasonably accurate.
> > >
> > > If one lies to the guest about the CPU it is running on, then obviously
> > > things may work non-optimally.
> >
> > But this isn't about running optimally, it's about functional correctness.
> > And "lying" to the guest about F/M/S is extremely common.
> >
> > > > > This fixes booting Hyper-V enabled Windows Server 2016 VMs with more than
> > > > > one vCPU on Zen1/2 CPUs.
> > > >
> > > > How/why does the lack of XSAVES break a multi-vCPU setup?  Is Windows
> > > > blindly doing XSAVES based on FMS?
> > >
> > > The hypercall from L2 Windows to L1 Hyper-V asking to boot the first AP
> > > returns HV_STATUS_CPUID_XSAVE_FEATURE_VALIDATION_ERROR.
> >
> > If it's just about CPUID enumeration, then userspace can simply stuff the
> > XSAVES feature flag.  This is not something that belongs in KVM, because this
> > is safe if and only if F/M/S is accurate and the guest is actually aware of
> > the erratum (or will not actually use XSAVES for other reasons), neither of
> > which KVM can guarantee.
>
> In other words, your suggestion is that QEMU (or another VMM), not KVM,
> should be the one setting the XSAVES CPUID bit back, correct?
>
> I don't think this would work with the current KVM code, since it seems
> to make various decisions depending on the presence of the XSAVES bit in
> KVM caps rather than the guest CPUID, and on boot_cpu_has(XSAVES) - one
> such code block was even modified by this patch.
>
> The comment above that code even says that it is not possible to
> actually disable XSAVES without disabling all of its other variants on
> SVM, so it has to be enabled if the CPU supports it in order to switch
> the XSS MSR at guest entry/exit (in this case that looks harmless, since
> Zen1/2 supposedly don't support any supervisor extended states).
>
> So it looks like we would need changes to *both* KVM and QEMU to
> restore the XSAVES support this way.

I'm not suggesting we restore XSAVES support, I'm suggesting that _if_ someone
wants to hack their setup to let the guest use broken hardware, then they
should do that in userspace or in a private kernel, not in upstream KVM.
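
For context, "stuffing the XSAVES feature flag" from userspace would roughly
amount to the VMM editing the CPUID table it programs via KVM_SET_CPUID2.
The sketch below is only an illustration of that idea, not code from the
thread or from QEMU; the function name stuff_xsaves_bit and the entry count
are made up here.  It forces CPUID.(EAX=0Dh,ECX=1):EAX[3] back on, and says
nothing about whether the guest can then safely execute XSAVES on the
affected parts, which is exactly the concern raised above.

/*
 * Illustrative sketch only: force the XSAVES enumeration bit
 * (CPUID leaf 0xD, sub-leaf 1, EAX bit 3) back on in the CPUID
 * table a userspace VMM hands to a vCPU.  Error handling and the
 * usual /dev/kvm and vCPU fd setup are omitted.
 */
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MAX_CPUID_ENTRIES 256	/* arbitrary size for this sketch */

static void stuff_xsaves_bit(int kvm_fd, int vcpu_fd)
{
	struct kvm_cpuid2 *cpuid;
	unsigned int i;

	cpuid = calloc(1, sizeof(*cpuid) +
			  MAX_CPUID_ENTRIES * sizeof(cpuid->entries[0]));
	cpuid->nent = MAX_CPUID_ENTRIES;

	/* Start from what KVM reports as supported... */
	ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid);

	/* ...then set the XSAVES bit back in leaf 0xD, sub-leaf 1. */
	for (i = 0; i < cpuid->nent; i++) {
		if (cpuid->entries[i].function == 0xd &&
		    cpuid->entries[i].index == 1)
			cpuid->entries[i].eax |= (1u << 3);
	}

	ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);
	free(cpuid);
}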