On 5/10/21 11:10 AM, Sean Christopherson wrote:
> On Fri, May 07, 2021, Tom Lendacky wrote:
>> On 5/7/21 11:59 AM, Sean Christopherson wrote:
>>> Allow userspace to set CR0, CR4, CR8, and EFER via KVM_SET_SREGS for
>>> protected guests, e.g. for SEV-ES guests with an encrypted VMSA. KVM
>>> tracks the aforementioned registers by trapping guest writes, and also
>>> exposes the values to userspace via KVM_GET_SREGS. Skipping the regs
>>> in KVM_SET_SREGS prevents userspace from updating KVM's CPU model to
>>> match the known hardware state.
>>
>> This is very similar to the original patch I had proposed that you were
>> against :)
>
> I hope/think my position was that it should be unnecessary for KVM to need to
> know the guest's CR0/4/8 and EFER values, i.e. even the trapping is unnecessary.
> I was going to say I had a change of heart, as EFER.LMA in particular could
> still be required to identify 64-bit mode, but that's wrong; EFER.LMA only gets
> us long mode, the full is_64_bit_mode() needs access to cs.L, which AFAICT isn't
> provided by #VMGEXIT or trapping.

Right, that one is missing. If you take a VMGEXIT that uses the GHCB, then
I think you can assume we're in 64-bit mode.

>
> Unless I'm missing something, that means that VMGEXIT(VMMCALL) is broken since
> KVM will incorrectly crush (or preserve) bits 63:32 of GPRs. I'm guessing no
> one has reported a bug because either (a) no one has tested a hypercall that
> requires bits 63:32 in a GPR or (b) the guest just happens to be in 64-bit mode
> when KVM_SEV_LAUNCH_UPDATE_VMSA is invoked and so the segment registers are
> frozen to make it appear as if the guest is perpetually in 64-bit mode.

I don't think it's (b) since the LAUNCH_UPDATE_VMSA is done against
reset-state vCPUs.

>
> I see that sev_es_validate_vmgexit() checks ghcb_cpl_is_valid(), but isn't that
> either pointless or indicative of a much, much bigger problem? If VMGEXIT is

It is needed for the VMMCALL exit.

> restricted to CPL0, then the check is pointless. If VMGEXIT isn't restricted to
> CPL0, then KVM has a big gaping hole that allows a malicious/broken guest
> userspace to crash the VM simply by executing VMGEXIT. Since valid_bitmap is
> cleared during VMGEXIT handling, I don't think guest userspace can attack/corrupt
> the guest kernel by doing a replay attack, but it does all but guarantee a
> VMGEXIT at CPL>0 will be fatal since the required valid bits won't be set.

Right, so I think some cleanup is needed there, both for the guest and the
hypervisor:

- For the guest, we could just clear the valid bitmask before leaving the
  #VC handler/releasing the GHCB. Userspace can't update the GHCB, so any
  VMGEXIT from userspace would just look like a no-op with the below
  change to KVM.

- For KVM, instead of returning -EINVAL from sev_es_validate_vmgexit(), we
  return the #GP action through the GHCB and continue running the guest.

>
> Sadly, the APM doesn't describe the VMGEXIT behavior, nor does any of the SEV-ES
> documentation I have. I assume VMGEXIT is recognized at CPL>0 since it morphs
> to VMMCALL when SEV-ES isn't active.

Correct.

>
> I.e. either the ghcb_cpl_is_valid() check should be nuked, or more likely KVM

The ghcb_cpl_is_valid() is still needed to see whether the VMMCALL was from
userspace or not (a VMMCALL will generate a #VC).

So maybe something like this instead (this is against the sev-es.c to
sev.c rename):

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 432d937f8f1e..bf821a4eacf9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -270,6 +270,7 @@ static __always_inline void sev_es_put_ghcb(struct ghcb_state *state)
 		data->backup_ghcb_active = false;
 		state->ghcb = NULL;
 	} else {
+		vc_ghcb_invalidate(ghcb);
 		data->ghcb_active = false;
 	}
 }
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 17adc1e79136..3b40fd9dc895 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2564,7 +2564,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 }
 
-static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
+static bool sev_es_validate_vmgexit(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu;
 	struct ghcb *ghcb;
@@ -2670,7 +2670,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		goto vmgexit_err;
 	}
 
-	return 0;
+	return true;
 
 vmgexit_err:
 	vcpu = &svm->vcpu;
@@ -2684,13 +2684,16 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		dump_ghcb(svm);
 	}
 
-	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
-	vcpu->run->internal.ndata = 2;
-	vcpu->run->internal.data[0] = exit_code;
-	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
+	/* Clear the valid entries fields */
+	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 
-	return -EINVAL;
+	ghcb_set_sw_exit_info_1(ghcb, 1);
+	ghcb_set_sw_exit_info_2(ghcb,
+				X86_TRAP_GP |
+				SVM_EVTINJ_TYPE_EXEPT |
+				SVM_EVTINJ_VALID);
+
+	return false;
 }
 
 static void pre_sev_es_run(struct vcpu_svm *svm)
@@ -3360,9 +3363,8 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 	exit_code = ghcb_get_sw_exit_code(ghcb);
 
-	ret = sev_es_validate_vmgexit(svm);
-	if (ret)
-		return ret;
+	if (!sev_es_validate_vmgexit(svm))
+		return 1;
 
 	sev_es_sync_from_ghcb(svm);
 	ghcb_set_sw_exit_info_1(ghcb, 0);

Thoughts?

Thanks,
Tom

> should do something like this (and then the guest needs to be updated to set the
> CPL on every VMGEXIT):
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a9d8d6aafdb8..bb7251e4a3e2 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2058,7 +2058,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>  	vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb);
>  	vcpu->arch.regs[VCPU_REGS_RSI] = ghcb_get_rsi_if_valid(ghcb);
>
> -	svm->vmcb->save.cpl = ghcb_get_cpl_if_valid(ghcb);
> +	svm->vmcb->save.cpl = 0;
>
>  	if (ghcb_xcr0_is_valid(ghcb)) {
>  		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> @@ -2088,6 +2088,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  	if (ghcb->ghcb_usage)
>  		goto vmgexit_err;
>
> +	/* Ignore VMGEXIT at CPL>0 */
> +	if (!ghcb_cpl_is_valid(ghcb) || ghcb_get_cpl_if_valid(ghcb))
> +		return 1;
> +
>  	/*
>  	 * Retrieve the exit code now even though is may not be marked valid
>  	 * as it could help with debugging.
> @@ -2142,8 +2146,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  		}
>  		break;
>  	case SVM_EXIT_VMMCALL:
> -		if (!ghcb_rax_is_valid(ghcb) ||
> -		    !ghcb_cpl_is_valid(ghcb))
> +		if (!ghcb_rax_is_valid(ghcb))
>  			goto vmgexit_err;
>  		break;
>  	case SVM_EXIT_RDTSCP:
>
>> I'm assuming it's meant to make live migration a bit easier?
>
> Peter, I forget, were these changes necessary for your work, or was the sole root
> cause the emulated MMIO bug in our backport?
>
> If KVM chugs along happily without these patches, I'd love to pivot and yank out
> all of the CR0/4/8 and EFER trapping/tracking, and then make KVM_GET_SREGS a nop
> as well.
>
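
As a rough illustration of the error path in the proposed KVM change to sev_es_validate_vmgexit(), the sketch below models what the hypervisor would hand back through the GHCB: the valid bitmap is cleared, sw_exit_info_1 is set to 1, and sw_exit_info_2 carries a #GP in event-injection format. The three constants are the values I believe the Linux x86 headers define. struct toy_ghcb and toy_report_gp() are hypothetical stand-ins for this example only, not kernel code, and the bitmap size is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Values as (to my knowledge) defined in the Linux x86 headers. */
#define X86_TRAP_GP		13u		/* #GP vector */
#define SVM_EVTINJ_TYPE_EXEPT	(3u << 8)	/* event type: exception */
#define SVM_EVTINJ_VALID	(1u << 31)	/* event is valid */

/* Hypothetical, heavily reduced stand-in for struct ghcb. */
struct toy_ghcb {
	uint8_t  valid_bitmap[16];	/* size assumed for illustration */
	uint64_t sw_exit_info_1;
	uint64_t sw_exit_info_2;
};

/*
 * Hypothetical helper mirroring the error path in the proposed diff:
 * rather than failing the VMGEXIT back to userspace, report a #GP to
 * the guest through the GHCB and let the vCPU keep running.
 */
static void toy_report_gp(struct toy_ghcb *ghcb)
{
	/* Clear the valid entries so stale state cannot be replayed. */
	memset(ghcb->valid_bitmap, 0, sizeof(ghcb->valid_bitmap));

	/* 1 in sw_exit_info_1: sw_exit_info_2 describes an exception. */
	ghcb->sw_exit_info_1 = 1;
	ghcb->sw_exit_info_2 = X86_TRAP_GP |
			       SVM_EVTINJ_TYPE_EXEPT |
			       SVM_EVTINJ_VALID;
}
```

With these values sw_exit_info_2 works out to 0x8000030d: vector 13 in the low byte, type 3 in bits 10:8, and the valid flag in bit 31.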