On Sat, Mar 8, 2025 at 12:04 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Mar 06, 2025, Paolo Bonzini wrote:
> I still absolutely detest carrying dedicated code
> for SEV and TDX state management. It's bad enough that figuring out WTF actually
> happens basically requires encyclopedic knowledge of massive specs.
>
> I tried to figure out a way to share code, but everything I can come up with that
> doesn't fake vCPU state makes the non-TDX code a mess. :-(

The only thing worse is requiring encyclopedic knowledge of both the
specs and KVM. :) And yeah, we do require some knowledge of parts of
KVM that *shouldn't* matter for protected-state guests, but it
shouldn't be worse than needed.

There's different microcode/firmware for VMX/SVM/SEV-ES+/TDX, the
chance of sharing code is lower and lower as more stuff is added
there---as is the case for SEV-ES/SNP and TDX. Which is why state
management code for TDX is anyway doing its own thing most of the
time---there's no point in sharing a little bit which is not even the
hardest.

> > just so that the common code does the right thing for pkru/xcr0/xss,
>
> FWIW, it's not just to that KVM does the right thing for those values, it's a
> defense in depth mechanism so that *when*, not if, KVM screws up, the odds of the
> bug being fatal to KVM and/or the guest are reduced.

I would say the other way round is true too. Not relying too much on
fake values in vcpu->arch can be more robust.

> Without actual sanity check and safeguards in the low level helpers, we absolutely
> are playing a game of whack-a-mole.
>
> E.g. see commit 9b42d1e8e4fe ("KVM: x86: Play nice with protected guests in
> complete_hypercall_exit()").
>
> At a glance, kvm_hv_hypercall() is still broken, because is_protmode() will return
> false incorrectly.

So the fixes are needed anyway and we're playing the game anyway. :(

> > And while the change for XSS (and possibly other MSRs) is actually correct,
> > it should be justified for both SEV-ES/SNP and TDX rather than sneaked into
> > the TDX patches.
> >
> > While there could be other flows that consume guest state, they're
> > just as bound to do the wrong thing if vcpu->arch is only guaranteed
> > to be somehow plausible (think anything that for whatever reason uses
> > cpu_role).
>
> But the MMU code is *already* broken. kvm_init_mmu() => vcpu_to_role_regs(). It
> "works" because the fubar role is never truly consumed. I'm sure there are more
> examples.

Yes, and there should be at least a WARN_ON_ONCE when it is accessed,
even if we don't completely cull the initialization of cpu_role...
Loading the XSAVE state isn't any different.

I'm okay with placing some values in cr0/cr4 or even xcr0/xss, but do
not wish to use them more than the absolute minimum necessary. And I
would rather not set more than the bare minimum needed in CR4... why
set CR4.PKE for example, if KVM anyway has no business using the guest
PKRU.

Paolo

> > There's no way the existing flows for !guest_state_protected should run _at
> > all_ when the register state is not there. If they do, it's a bug and fixing
> > them is the right thing to do (it may feel like whack-a-mole but isn't)
>
> Eh, it's still whack-a-mole, there just happen to be a finite number of moles :-)
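
To make the "WARN_ON_ONCE when it is accessed" idea above concrete, here is a rough userspace sketch, not actual KVM code. Everything in it is a hypothetical stand-in: struct toy_vcpu, toy_warn_once() and toy_is_protmode() only imitate vcpu->arch.guest_state_protected, WARN_ON_ONCE() and is_protmode(), and the single global "warned" flag is a simplification of the kernel's per-call-site behaviour.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a vCPU; NOT the real struct kvm_vcpu layout. */
struct toy_vcpu {
    bool guest_state_protected;  /* true for SEV-ES/SNP or TDX style guests */
    unsigned long cr0;           /* for protected guests: at best a placeholder */
    unsigned int stale_reads;    /* guarded reads that hit protected state */
};

#define TOY_CR0_PE 0x1UL         /* CR0.PE is bit 0 */

/*
 * Userspace imitation of WARN_ON_ONCE(): print on the first hit only and
 * return the condition. Assumption: one global flag instead of one flag
 * per call site.
 */
static bool toy_warn_once(bool cond, const char *what)
{
    static bool warned;

    if (cond && !warned) {
        warned = true;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/*
 * Guarded accessor: a read of CR0 for a protected guest is flagged and
 * counted rather than passing silently. The value returned in that case
 * still comes from the placeholder, so callers must not trust it.
 */
static bool toy_is_protmode(struct toy_vcpu *vcpu)
{
    if (toy_warn_once(vcpu->guest_state_protected,
                      "CR0 consumed for a protected guest"))
        vcpu->stale_reads++;

    return (vcpu->cr0 & TOY_CR0_PE) != 0;
}
```

With a guard like this, a flow such as the kvm_hv_hypercall() case mentioned above would at least be reported the first time it consumes the placeholder, instead of quietly acting on a wrong answer.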