Re: [PATCH 2/2] KVM: x86: Allow userspace to update tracked sregs for protected guests

Tom Lendacky <thomas.lendacky@xxxxxxx> · Mon, 10 May 2021 13:07:07 -0500

On 5/10/21 11:10 AM, Sean Christopherson wrote:
> On Fri, May 07, 2021, Tom Lendacky wrote:
>> On 5/7/21 11:59 AM, Sean Christopherson wrote:
>>> Allow userspace to set CR0, CR4, CR8, and EFER via KVM_SET_SREGS for
>>> protected guests, e.g. for SEV-ES guests with an encrypted VMSA.  KVM
>>> tracks the aforementioned registers by trapping guest writes, and also
>>> exposes the values to userspace via KVM_GET_SREGS.  Skipping the regs
>>> in KVM_SET_SREGS prevents userspace from updating KVM's CPU model to
>>> match the known hardware state.
>>
>> This is very similar to the original patch I had proposed that you were
>> against :)
> 
> I hope/think my position was that it should be unnecessary for KVM to need to
> know the guest's CR0/4/0 and EFER values, i.e. even the trapping is unnecessary.
> I was going to say I had a change of heart, as EFER.LMA in particular could
> still be required to identify 64-bit mode, but that's wrong; EFER.LMA only gets
> us long mode, the full is_64_bit_mode() needs access to cs.L, which AFAICT isn't
> provided by #VMGEXIT or trapping.

Right, that one is missing. If you take a VMGEXIT that uses the GHCB, then
I think you can assume we're in 64-bit mode.

> 
> Unless I'm missing something, that means that VMGEXIT(VMMCALL) is broken since
> KVM will incorrectly crush (or preserve) bits 63:32 of GPRs.  I'm guessing no
> one has reported a bug because either (a) no one has tested a hypercall that
> requires bits 63:32 in a GPR or (b) the guest just happens to be in 64-bit mode
> when KVM_SEV_LAUNCH_UPDATE_VMSA is invoked and so the segment registers are
> frozen to make it appear as if the guest is perpetually in 64-bit mode.

I don't think it's (b) since the LAUNCH_UPDATE_VMSA is done against reset-
state vCPUs.

> 
> I see that sev_es_validate_vmgexit() checks ghcb_cpl_is_valid(), but isn't that
> either pointless or indicative of a much, much bigger problem?  If VMGEXIT is

It is needed for the VMMCALL exit.

> restricted to CPL0, then the check is pointless.  If VMGEXIT isn't restricted to
> CPL0, then KVM has a big gaping hole that allows a malicious/broken guest
> userspace to crash the VM simply by executing VMGEXIT.  Since valid_bitmap is
> cleared during VMGEXIT handling, I don't think guest userspace can attack/corrupt
> the guest kernel by doing a replay attack, but it does all but guarantee a
> VMGEXIT at CPL>0 will be fatal since the required valid bits won't be set.

Right, so I think some cleanup is needed there, both for the guest and the
hypervisor:

- For the guest, we could just clear the valid bitmask before leaving the
  #VC handler/releasing the GHCB. Userspace can't update the GHCB, so any
  VMGEXIT from userspace would just look like a no-op with the below
  change to KVM.

- For KVM, instead of returning -EINVAL from sev_es_validate_vmgexit(), we
  return the #GP action through the GHCB and continue running the guest.

> 
> Sadly, the APM doesn't describe the VMGEXIT behavior, nor does any of the SEV-ES
> documentation I have.  I assume VMGEXIT is recognized at CPL>0 since it morphs
> to VMMCALL when SEV-ES isn't active.

Correct.

> 
> I.e. either the ghcb_cpl_is_valid() check should be nuked, or more likely KVM

The ghcb_cpl_is_valid() is still needed to see whether the VMMCALL was
from userspace or not (a VMMCALL will generate a #VC). So maybe something
like this instead (this is against the sev-es.c to sev.c rename):

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 432d937f8f1e..bf821a4eacf9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -270,6 +270,7 @@ static __always_inline void sev_es_put_ghcb(struct ghcb_state *state)
 		data->backup_ghcb_active = false;
 		state->ghcb = NULL;
 	} else {
+		vc_ghcb_invalidate(ghcb);
 		data->ghcb_active = false;
 	}
 }
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 17adc1e79136..3b40fd9dc895 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2564,7 +2564,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 }
 
-static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
+static bool sev_es_validate_vmgexit(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu;
 	struct ghcb *ghcb;
@@ -2670,7 +2670,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		goto vmgexit_err;
 	}
 
-	return 0;
+	return true;
 
 vmgexit_err:
 	vcpu = &svm->vcpu;
@@ -2684,13 +2684,16 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		dump_ghcb(svm);
 	}
 
-	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
-	vcpu->run->internal.ndata = 2;
-	vcpu->run->internal.data[0] = exit_code;
-	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
+	/* Clear the valid entries fields */
+	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 
-	return -EINVAL;
+	ghcb_set_sw_exit_info_1(ghcb, 1);
+	ghcb_set_sw_exit_info_2(ghcb,
+				X86_TRAP_GP |
+				SVM_EVTINJ_TYPE_EXEPT |
+				SVM_EVTINJ_VALID);
+
+	return false;
 }
 
 static void pre_sev_es_run(struct vcpu_svm *svm)
@@ -3360,9 +3363,8 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 	exit_code = ghcb_get_sw_exit_code(ghcb);
 
-	ret = sev_es_validate_vmgexit(svm);
-	if (ret)
-		return ret;
+	if (!sev_es_validate_vmgexit(svm))
+		return 1;
 
 	sev_es_sync_from_ghcb(svm);
 	ghcb_set_sw_exit_info_1(ghcb, 0);

Thoughts?

Thanks,
Tom

> should do something like this (and then the guest needs to be updated to set the
> CPL on every VMGEXIT):
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a9d8d6aafdb8..bb7251e4a3e2 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2058,7 +2058,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>         vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb);
>         vcpu->arch.regs[VCPU_REGS_RSI] = ghcb_get_rsi_if_valid(ghcb);
> 
> -       svm->vmcb->save.cpl = ghcb_get_cpl_if_valid(ghcb);
> +       svm->vmcb->save.cpl = 0;
> 
>         if (ghcb_xcr0_is_valid(ghcb)) {
>                 vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> @@ -2088,6 +2088,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>         if (ghcb->ghcb_usage)
>                 goto vmgexit_err;
> 
> +       /* Ignore VMGEXIT at CPL>0 */
> +       if (!ghcb_cpl_is_valid(ghcb) || ghcb_get_cpl_if_valid(ghcb))
> +               return 1;
> +
>         /*
>          * Retrieve the exit code now even though is may not be marked valid
>          * as it could help with debugging.
> @@ -2142,8 +2146,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>                 }
>                 break;
>         case SVM_EXIT_VMMCALL:
> -               if (!ghcb_rax_is_valid(ghcb) ||
> -                   !ghcb_cpl_is_valid(ghcb))
> +               if (!ghcb_rax_is_valid(ghcb))
>                         goto vmgexit_err;
>                 break;
>         case SVM_EXIT_RDTSCP:
> 
>> I'm assuming it's meant to make live migration a bit easier?
> 
> Peter, I forget, were these changes necessary for your work, or was the sole root
> cause the emulated MMIO bug in our backport?
> 
> If KVM chugs along happily without these patches, I'd love to pivot and yank out
> all of the CR0/4/8 and EFER trapping/tracking, and then make KVM_GET_SREGS a nop
> as well.
>