On Fri, May 07, 2021, Tom Lendacky wrote: > On 5/7/21 11:59 AM, Sean Christopherson wrote: > > Allow userspace to set CR0, CR4, CR8, and EFER via KVM_SET_SREGS for > > protected guests, e.g. for SEV-ES guests with an encrypted VMSA. KVM > > tracks the aforementioned registers by trapping guest writes, and also > > exposes the values to userspace via KVM_GET_SREGS. Skipping the regs > > in KVM_SET_SREGS prevents userspace from updating KVM's CPU model to > > match the known hardware state. > > This is very similar to the original patch I had proposed that you were > against :) I hope/think my position was that it should be unnecessary for KVM to need to know the guest's CR0/4/0 and EFER values, i.e. even the trapping is unnecessary. I was going to say I had a change of heart, as EFER.LMA in particular could still be required to identify 64-bit mode, but that's wrong; EFER.LMA only gets us long mode, the full is_64_bit_mode() needs access to cs.L, which AFAICT isn't provided by #VMGEXIT or trapping. Unless I'm missing something, that means that VMGEXIT(VMMCALL) is broken since KVM will incorrectly crush (or preserve) bits 63:32 of GPRs. I'm guessing no one has reported a bug because either (a) no one has tested a hypercall that requires bits 63:32 in a GPR or (b) the guest just happens to be in 64-bit mode when KVM_SEV_LAUNCH_UPDATE_VMSA is invoked and so the segment registers are frozen to make it appear as if the guest is perpetually in 64-bit mode. I see that sev_es_validate_vmgexit() checks ghcb_cpl_is_valid(), but isn't that either pointless or indicative of a much, much bigger problem? If VMGEXIT is restricted to CPL0, then the check is pointless. If VMGEXIT isn't restricted to CPL0, then KVM has a big gaping hole that allows a malicious/broken guest userspace to crash the VM simply by executing VMGEXIT. Since valid_bitmap is cleared during VMGEXIT handling, I don't think guest userspace can attack/corrupt the guest kernel by doing a replay attack, but it does all but guarantee a VMGEXIT at CPL>0 will be fatal since the required valid bits won't be set. Sadly, the APM doesn't describe the VMGEXIT behavior, nor does any of the SEV-ES documentation I have. I assume VMGEXIT is recognized at CPL>0 since it morphs to VMMCALL when SEV-ES isn't active. I.e. either the ghcb_cpl_is_valid() check should be nuked, or more likely KVM should do something like this (and then the guest needs to be updated to set the CPL on every VMGEXIT): diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index a9d8d6aafdb8..bb7251e4a3e2 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2058,7 +2058,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm) vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb); vcpu->arch.regs[VCPU_REGS_RSI] = ghcb_get_rsi_if_valid(ghcb); - svm->vmcb->save.cpl = ghcb_get_cpl_if_valid(ghcb); + svm->vmcb->save.cpl = 0; if (ghcb_xcr0_is_valid(ghcb)) { vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb); @@ -2088,6 +2088,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm) if (ghcb->ghcb_usage) goto vmgexit_err; + /* Ignore VMGEXIT at CPL>0 */ + if (!ghcb_cpl_is_valid(ghcb) || ghcb_get_cpl_if_valid(ghcb)) + return 1; + /* * Retrieve the exit code now even though is may not be marked valid * as it could help with debugging. @@ -2142,8 +2146,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm) } break; case SVM_EXIT_VMMCALL: - if (!ghcb_rax_is_valid(ghcb) || - !ghcb_cpl_is_valid(ghcb)) + if (!ghcb_rax_is_valid(ghcb)) goto vmgexit_err; break; case SVM_EXIT_RDTSCP: > I'm assuming it's meant to make live migration a bit easier? Peter, I forget, were these changes necessary for your work, or was the sole root cause the emulated MMIO bug in our backport? If KVM chugs along happily without these patches, I'd love to pivot and yank out all of the CR0/4/8 and EFER trapping/tracking, and then make KVM_GET_SREGS a nop as well.