On Thu, Mar 06, 2025, Paolo Bonzini wrote:
> On Thu, 6 Mar 2025, 21:44 Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > Allowing the use of kvm_load_host_xsave_state() is really ugly, especially
> > > since the corresponding code is so simple:
> > >
> > >         if (cpu_feature_enabled(X86_FEATURE_PKU) && vcpu->arch.pkru != 0)
> > >                 wrpkru(vcpu->arch.host_pkru);
> >
> > It's clearly not "so simple", because this code is buggy.
> >
> > The justification for using kvm_load_host_xsave_state() is that either KVM gets
> > the TDX state model correct and the existing flows Just Work, or we handle all
> > that state as one-offs and at best replicate concepts and flows, and at worst
> > have bugs that are unique to TDX, e.g. because we get the "simple" code wrong,
> > we miss flows that subtly consume state, etc.
>
> A typo doesn't change the fact that kvm_load_host_xsave_state is
> optimized with knowledge of the guest CR0 and CR4; faking the values
> so that the same field means both "exit value" and "guest value",

I can't argue against that, but I still absolutely detest carrying dedicated code
for SEV and TDX state management.  It's bad enough that figuring out WTF actually
happens basically requires encyclopedic knowledge of massive specs.

I tried to figure out a way to share code, but everything I can come up with that
doesn't fake vCPU state makes the non-TDX code a mess.  :-(

> just so that the common code does the right thing for pkru/xcr0/xss,

FWIW, it's not just so that KVM does the right thing for those values, it's a
defense in depth mechanism so that *when*, not if, KVM screws up, the odds of the
bug being fatal to KVM and/or the guest are reduced.

> is
> unmaintainable and conceptually just wrong.

I don't necessarily disagree, but what we have today isn't maintainable either.
Without actual sanity checks and safeguards in the low level helpers, we absolutely
are playing a game of whack-a-mole.  E.g.
see commit 9b42d1e8e4fe ("KVM: x86: Play nice with protected guests in
complete_hypercall_exit()").  At a glance, kvm_hv_hypercall() is still broken,
because is_protmode() will return false incorrectly.

> And while the change for XSS (and possibly other MSRs) is actually correct,
> it should be justified for both SEV-ES/SNP and TDX rather than sneaked into
> the TDX patches.
>
> While there could be other flows that consume guest state, they're
> just as bound to do the wrong thing if vcpu->arch is only guaranteed
> to be somehow plausible (think anything that for whatever reason uses
> cpu_role).

But the MMU code is *already* broken.  kvm_init_mmu() => vcpu_to_role_regs().  It
"works" because the fubar role is never truly consumed.  I'm sure there are more
examples.

> There's no way the existing flows for !guest_state_protected should run _at
> all_ when the register state is not there. If they do, it's a bug and fixing
> them is the right thing to do (it may feel like whack-a-mole but isn't)

Eh, it's still whack-a-mole, there just happen to be a finite number of moles.  :-)